The SunOS kernel dispatches processes by priority. The scheduler (also called the dispatcher) supports scheduling classes: real-time (RT), system (SYS), and time-sharing (TS). Each class has its own scheduling policy for dispatching the processes within it.
The kernel dispatches the highest-priority processes first. By default, real-time processes have precedence over SYS and TS processes, but administrators can configure systems so that TS and RT processes have overlapping priorities.
The following figure illustrates the concept of classes as viewed by the SunOS kernel.
Figure 9-4 Dispatch Priorities for Scheduling Classes
At highest priority are the hardware interrupts, which cannot be controlled by software. The interrupt processing routines are dispatched directly and immediately from interrupts, without regard to the priority of the current process.
Real-time processes have the highest default software priority. Processes in the RT class have a priority and time quantum value. RT processes are scheduled strictly on the basis of these parameters. As long as an RT process is ready to run, no SYS or TS process can run. Fixed-priority scheduling enables critical processes to run in a predetermined order until completion. These priorities never change unless an application changes them.
An RT class process inherits the parent's time quantum, whether finite or infinite. A process with a finite time quantum runs until the time quantum expires or the process terminates, blocks while waiting for an I/O event, or is pre-empted by a higher-priority runnable real-time process. A process with an infinite time quantum ceases execution only when it terminates, blocks, or is pre-empted.
The SYS class exists to schedule the execution of special system processes, such as paging, STREAMS, and the swapper. You cannot change the class of a process to the SYS class. The SYS class of processes has fixed priorities established by the kernel when the processes are started.
At lowest priority are the time-sharing (TS) processes. TS class processes are scheduled dynamically, with a few hundred milliseconds for each time slice. The TS scheduler switches context in round-robin fashion often enough to give every process an equal opportunity to run, depending upon:
Its time slice value
Its process history (when the process was last put to sleep)
Considerations for CPU utilization
A child process inherits the scheduling class and attributes of the parent process through fork(2). A process's scheduling class and attributes are unchanged by exec(2).
Different algorithms dispatch each scheduling class. Class-dependent routines are called by the kernel to make decisions about CPU process scheduling. The kernel is class-independent, and takes the highest priority process off its queue. Each class is responsible for calculating a process's priority value for its class. This value is placed into the dispatch priority variable of that process.
As the following figure illustrates, each class algorithm has its own method of nominating the highest priority process to place on the global run queue.
Figure 9-5 Kernel Dispatch Queue
Each class has a set of priority levels that apply to processes in that class. A class-specific mapping translates these priorities into a set of global priorities. The set of global priorities that a class maps to is not required to start at zero or to be contiguous.
By default, the priority values for time-sharing (TS) processes range from -20 to +20 and are mapped into the kernel as 0-40, with temporary assignments as high as 99. The default priorities for real-time (RT) processes range from 0-59 and are mapped into the kernel as 100-159. The kernel's class-independent code runs the process with the highest global priority on the queue.
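The default mappings just described can be sketched in C. This is only an illustrative model, not kernel code: the function names ts_to_global and rt_to_global are hypothetical, and the simple linear offset is an assumption drawn from the default ranges quoted above. It does not model administrator-configured mappings or the temporary TS assignments above 40.

```c
#include <assert.h>

/* Default ranges quoted in the text. */
enum {
    TS_CLASS_MIN  = -20, TS_CLASS_MAX = 20,  /* TS class priorities        */
    TS_GLOBAL_MIN = 0,                       /* mapped onto global 0..40   */
    RT_CLASS_MIN  = 0,   RT_CLASS_MAX = 59,  /* RT class priorities        */
    RT_GLOBAL_MIN = 100                      /* mapped onto global 100..159 */
};

/* Hypothetical helper: global priority for a TS class priority,
 * or -1 if ts_pri is outside the default range. Assumes a linear offset. */
int ts_to_global(int ts_pri)
{
    if (ts_pri < TS_CLASS_MIN || ts_pri > TS_CLASS_MAX)
        return -1;
    int g = TS_GLOBAL_MIN + (ts_pri - TS_CLASS_MIN);
    assert(g >= 0 && g <= 40);   /* sanity check on the default range */
    return g;
}

/* Hypothetical helper: global priority for an RT class priority,
 * or -1 if rt_pri is outside the default range. Assumes a linear offset. */
int rt_to_global(int rt_pri)
{
    if (rt_pri < RT_CLASS_MIN || rt_pri > RT_CLASS_MAX)
        return -1;
    int g = RT_GLOBAL_MIN + rt_pri;
    assert(g >= 100 && g <= 159);
    return g;
}
```

Under these assumptions, even the lowest RT priority maps to a higher global priority than the highest default TS priority, which is why a ready RT process always runs ahead of TS processes.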
The dispatch queue is a linear linked list of processes with the same global priority. Each process is invoked with class-specific information attached to it. A process is dispatched from the kernel dispatch table based upon its global priority.
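The class-independent selection step can be modeled as one linked list per global priority, scanned from the top down. The following is a minimal sketch under stated assumptions, not the kernel's data structure: struct proc, struct dispq, dq_enqueue, and dq_dispatch are hypothetical names, and the limit of 160 priorities simply follows the default ranges given earlier.

```c
#include <assert.h>
#include <stddef.h>

#define NGLOBPRI 160   /* assumption: global priorities 0..159 */

struct proc {                 /* hypothetical process record */
    int pid;
    int global_pri;           /* computed by the class-specific code */
    struct proc *next;
};

struct dispq {                /* one FIFO list per global priority */
    struct proc *head[NGLOBPRI];
    struct proc *tail[NGLOBPRI];
};

/* Append p to the list for its global priority. */
void dq_enqueue(struct dispq *dq, struct proc *p)
{
    assert(p->global_pri >= 0 && p->global_pri < NGLOBPRI);
    p->next = NULL;
    if (dq->tail[p->global_pri] != NULL)
        dq->tail[p->global_pri]->next = p;
    else
        dq->head[p->global_pri] = p;
    dq->tail[p->global_pri] = p;
}

/* Remove and return the first process on the highest nonempty list,
 * or NULL if nothing is runnable. No class-specific logic is needed here. */
struct proc *dq_dispatch(struct dispq *dq)
{
    for (int pri = NGLOBPRI - 1; pri >= 0; pri--) {
        struct proc *p = dq->head[pri];
        if (p != NULL) {
            dq->head[pri] = p->next;
            if (dq->head[pri] == NULL)
                dq->tail[pri] = NULL;
            p->next = NULL;
            return p;
        }
    }
    return NULL;
}
```

A zero-initialized struct dispq is an empty queue. Processes that share a global priority leave in the order they were enqueued, and any process at a higher global priority leaves before them.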
When a process is dispatched, the context of the process is mapped into memory along with its memory management information, its registers, and its stack. Then execution begins. Memory management information is in the form of hardware registers containing data needed to perform virtual memory translations for the currently running process.
When a higher-priority process becomes dispatchable, the kernel interrupts the computation in progress and forces a context switch, pre-empting the currently running process. A process can be pre-empted at any time if the kernel finds that a higher-priority process is now dispatchable.
For example, suppose that process A performs a read from a peripheral device. Process A is put into the sleep state by the kernel. The kernel then finds that a lower-priority process B is runnable, so process B is dispatched and begins execution. Eventually, the peripheral device sends an interrupt, and the driver of the device is entered. The device driver makes process A runnable and returns. Rather than returning to the interrupted process B, the kernel now pre-empts B from processing and resumes execution of the awakened process A.
Another interesting situation occurs when several processes contend for kernel resources. When a lower-priority process releases a resource for which a higher-priority real-time process is waiting, the kernel immediately pre-empts the lower-priority process and resumes execution of the higher-priority process.
Kernel Priority Inversion
Priority inversion occurs when a higher-priority process is blocked by one or more lower-priority processes for a long time. The use of synchronization primitives such as mutual-exclusion locks in the SunOS kernel can lead to priority inversion.
A process is blocked when it must wait for one or more processes to relinquish resources. Prolonged blocking can lead to missed deadlines, even for low levels of utilization.
The problem of priority inversion has been addressed for mutual-exclusion locks for the SunOS kernel by implementing a basic priority inheritance policy. The policy states that a lower-priority process inherits the priority of a higher-priority process when the lower-priority process blocks the execution of the higher-priority process. This places an upper bound on the amount of time a process can remain blocked. The policy is a property of the kernel's behavior, not a solution that a programmer institutes through system calls or interface execution. User-level processes can still exhibit priority inversion, however.
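The basic inheritance rule can be illustrated with a small model. This sketch is an assumption-laden simplification, not the kernel's implementation: struct task, struct pi_mutex, pi_lock, and pi_unlock are hypothetical names, there is a single lock with no waiter queue, and nested locks are not handled.

```c
#include <assert.h>
#include <stddef.h>

struct task {              /* hypothetical process record */
    int base_pri;          /* priority assigned by the scheduling class */
    int eff_pri;           /* priority the dispatcher actually uses */
};

struct pi_mutex {
    struct task *owner;    /* NULL when the lock is free */
};

/* Try to take the lock. Returns 1 if acquired, 0 if t must block.
 * When t blocks behind a lower-priority owner, the owner inherits
 * t's priority so that it can run and release the lock sooner. */
int pi_lock(struct pi_mutex *m, struct task *t)
{
    if (m->owner == NULL) {
        m->owner = t;
        return 1;
    }
    if (t->eff_pri > m->owner->eff_pri)
        m->owner->eff_pri = t->eff_pri;   /* priority inheritance */
    return 0;
}

/* Release the lock and drop any inherited priority. */
void pi_unlock(struct pi_mutex *m, struct task *t)
{
    assert(m->owner == t);
    t->eff_pri = t->base_pri;
    m->owner = NULL;
}
```

While the low-priority owner holds the boosted priority, a medium-priority process cannot pre-empt it, which is what bounds the time the high-priority process stays blocked.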
User Priority Inversion
The issue of user priority inversion and the means to deal with it are discussed in "Mutual Exclusion Lock Attributes" in the Multithreaded Programming Guide.
Interface Calls That Control Scheduling
The interface calls described below control process scheduling.
Control over scheduling of active classes is done with priocntl(2). Class attributes are inherited through fork(2) and exec(2), along with scheduling parameters and permissions required for priority control. This is true for both the RT and the TS classes.
priocntl(2) is the interface for specifying a real-time process, a set of processes, or a class to which the system call applies. priocntlset(2) provides a more general interface for specifying an entire set of processes to which the system call applies.
The command argument of priocntl(2) can be one of: PC_GETCID, PC_GETCLINFO, PC_GETPARMS, or PC_SETPARMS. The real or effective ID of the calling process must match that of the affected processes, or the calling process must have superuser privilege.
PC_GETCID
This command takes the name field of a structure that contains a recognizable class name (RT for real-time and TS for time-sharing). The class ID and an array of class attribute data are returned.
PC_GETCLINFO
This command takes the ID field of a structure that contains a recognizable class identifier. The class name and an array of class attribute data are returned.
PC_GETPARMS
This command returns the scheduling class identifier, the class-specific scheduling parameters, or both, of one of the specified processes. Even though idtype and id might specify a large set, PC_GETPARMS returns the parameters of only one process. The class selects the process.
PC_SETPARMS
This command sets the scheduling class, the class-specific scheduling parameters, or both, of the specified process or processes.
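As an illustration of the PC_GETCID command, the following sketch looks up the RT class by name and reads the maximum RT priority from the returned class attribute data. It is a hedged example rather than a definitive one: the function name rt_class_max_priority is hypothetical, and the structure and field names (pcinfo_t, pc_clname, pc_clinfo, rtinfo_t, rt_maxpri) are taken from the Solaris priocntl headers as best recalled and should be checked against the priocntl(2) man page on the target release. On platforms without priocntl(2), the function fails with ENOSYS.

```c
#include <errno.h>

#if defined(__sun)
#include <string.h>
#include <sys/types.h>
#include <sys/procset.h>
#include <sys/priocntl.h>
#include <sys/rtpriocntl.h>
#endif

/* Hypothetical helper: returns the maximum RT class priority reported
 * by PC_GETCID, or -1 with errno set if the lookup fails. */
int rt_class_max_priority(void)
{
#if defined(__sun)
    pcinfo_t pcinfo;
    rtinfo_t *rtinfo;

    /* PC_GETCID: supply the class name; the class ID and the class
     * attribute data are filled in on return. */
    (void) strcpy(pcinfo.pc_clname, "RT");
    if (priocntl(0, 0, PC_GETCID, (caddr_t)&pcinfo) == -1)
        return -1;

    /* For the RT class, the attribute data is laid out as an rtinfo_t. */
    rtinfo = (rtinfo_t *)pcinfo.pc_clinfo;
    return (int)rtinfo->rt_maxpri;
#else
    errno = ENOSYS;   /* assumption: priocntl(2) is unavailable here */
    return -1;
#endif
}
```

The class ID returned in the same structure is what a later PC_SETPARMS call would need in order to place a process in the RT class.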
Other interface calls
sched_get_priority_max
Returns the maximum values for the specified policy.
sched_get_priority_min
Returns the minimum values for the specified policy (see the sched_get_priority_max(3R) man page).
sched_rr_get_interval
Updates the specified timespec structure to the current execution time limit (see the sched_get_priority_max(3RT) man page).
sched_setparam, sched_getparam
Sets or gets the scheduling parameters of the specified process.
sched_yield
Blocks the calling process until it returns to the head of the process list.
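The descriptions above correspond to the POSIX scheduling calls declared in <sched.h>. The following sketch shows read-only use of them: it queries the priority range of a policy, prints the caller's current priority and round-robin time limit, and then yields the processor. The helper names policy_priority_range and show_and_yield are hypothetical, and the values printed depend on the system and on the caller's scheduling class.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: store the priority range for a policy such as
 * SCHED_RR or SCHED_FIFO. Returns 0 on success, -1 on error. */
int policy_priority_range(int policy, int *lo, int *hi)
{
    assert(lo != NULL && hi != NULL);
    errno = 0;
    int min = sched_get_priority_min(policy);
    int max = sched_get_priority_max(policy);
    if ((min == -1 || max == -1) && errno != 0)
        return -1;
    *lo = min;
    *hi = max;
    return 0;
}

/* Hypothetical helper: report the caller's scheduling parameters, then
 * yield. A pid of 0 means the calling process. Returns sched_yield()'s
 * result. */
int show_and_yield(void)
{
    struct sched_param sp;
    struct timespec ts;

    if (sched_getparam(0, &sp) == 0)
        printf("priority: %d\n", sp.sched_priority);
    if (sched_rr_get_interval(0, &ts) == 0)
        printf("time limit: %ld.%09ld s\n", (long)ts.tv_sec, (long)ts.tv_nsec);
    return sched_yield();
}
```

Changing the parameters with sched_setparam generally requires appropriate privilege, so the sketch only reads them.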
Utilities That Control Scheduling
The administrative utilities that control process scheduling are dispadmin(1M) and priocntl(1). Both of these utilities support the priocntl(2) system call with compatible options and loadable modules. These utilities provide system administration functions that control real-time process scheduling during runtime.
The priocntl(1) command sets and retrieves scheduler parameters for processes.
When invoked with the -l command line option, the dispadmin(1M) utility displays all process scheduling classes currently configured in the running system. Process scheduling parameters can also be changed for the class specified after the -c option, using RT as the argument for the real-time class.
The class options for dispadmin(1M) are: