Power Management Components
A device is power manageable if the power consumption of the device can be reduced when it is idle. Conceptually, a power manageable device consists of a number of power-manageable hardware units called components.
The device driver notifies the system of the existence of device components and the power levels that they support by creating a pm-components(9P) property in its attach(9E) entry point as part of driver initialization.
Most devices that are power manageable implement only a single component. An example of a single-component, power-manageable device is a disk whose spindle motor can be stopped to save power when the disk is idle.
If a device has multiple power-manageable units that are separately controllable, it should implement multiple components.
An example of a two-component, power-manageable device is a frame buffer card with a monitor connected to it. The frame buffer electronics are the first component (component 0); their power consumption can be reduced when they are not in use. The monitor is the second component (component 1), which can also enter a lower power mode when not in use. The system treats the combination of frame buffer electronics and monitor as one device with two components.
Multiple Power Management Components
To the power management framework, all components are considered equal and completely independent of each other. If this is not true for a particular device, the device driver must ensure that undesirable state combinations do not occur.
For example, for a frame buffer card with a monitor attached, each possible power state of the monitor (On, Standby, Suspend, Off) rules out some states of the frame buffer electronics (D0, D1, D2, D3) if the device is to work properly. If the monitor is On, the frame buffer must be at D0 (full on). So if the frame buffer driver gets a request to power up the monitor to On while the frame buffer is at D3, it must first ask the system to bring the frame buffer back up by calling pm_raise_power(9F), and only then set the monitor On. If the frame buffer driver gets a request from the system to lower the power of the frame buffer while the monitor is On, it must fail that request.
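The frame buffer and monitor rule can be sketched as a small predicate that a driver might consult before it applies a level change. This is only an illustration: the enum names, the function fb_check_request(), and the numbering of levels (0 = Off through 3 = On for both components, with On corresponding to D0 for the frame buffer) are assumptions, not part of any DDI interface. Only the constraint itself comes from the text.

```c
#include <assert.h>

/* Hypothetical component numbers and levels for this sketch. */
enum { COMP_FB = 0, COMP_MONITOR = 1 };
enum { LEVEL_OFF = 0, LEVEL_SUSPEND = 1, LEVEL_STANDBY = 2, LEVEL_ON = 3 };

/* What the driver should do with a requested level change. */
typedef enum {
	REQ_ALLOW,		/* safe to apply as requested */
	REQ_RAISE_FB_FIRST,	/* bring the frame buffer to full power first */
	REQ_FAIL		/* refuse the request */
} req_action_t;

/*
 * Decide how to handle a request to move `component` to `new_level`,
 * given the current levels of the frame buffer and the monitor.
 */
req_action_t
fb_check_request(int component, int new_level, int fb_level, int mon_level)
{
	assert(component == COMP_FB || component == COMP_MONITOR);
	assert(new_level >= LEVEL_OFF && new_level <= LEVEL_ON);

	if (component == COMP_MONITOR) {
		/* Monitor On requires the frame buffer at full power. */
		if (new_level == LEVEL_ON && fb_level != LEVEL_ON)
			return (REQ_RAISE_FB_FIRST);
		return (REQ_ALLOW);
	}

	/* Frame buffer: never drop below full power while the monitor is On. */
	if (mon_level == LEVEL_ON && new_level < LEVEL_ON)
		return (REQ_FAIL);
	return (REQ_ALLOW);
}
```

In a real driver, REQ_RAISE_FB_FIRST would presumably translate into a pm_raise_power(9F) call on component 0, and REQ_FAIL into a failure return from the routine that received the request.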
Power Management States
Each component of a device may be in one of two states: busy or idle. The device driver notifies the framework of changes in the device state by calling pm_busy_component(9F) and pm_idle_component(9F). When components are initially created, they are considered idle.
Power Levels
From the pm-components property exported by the device, the Device Power Management framework knows what power levels the device supports. Power level values must be non-negative integers. The device driver writer determines how power levels are interpreted, but they must be listed in monotonically increasing order in the pm-components property, and the framework interprets a power level of 0 to mean off. When the framework must power up a device because of a dependency, it brings each component to its highest power level.
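The ordering rule for power levels can be checked mechanically. The following is a minimal sketch of such a check; the function name xx_levels_valid() is hypothetical, and reading "monotonically increasing" as strictly increasing (no repeated values) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if levels[0..n-1] could be listed for one component:
 * no negative values, and each value greater than the one before it.
 * Return 0 otherwise (including for an empty list).
 */
int
xx_levels_valid(const int *levels, size_t n)
{
	size_t i;

	assert(levels != NULL);
	if (n == 0)
		return (0);
	for (i = 0; i < n; i++) {
		if (levels[i] < 0)
			return (0);
		if (i > 0 && levels[i] <= levels[i - 1])
			return (0);
	}
	return (1);
}
```

A driver writer could run a check like this over the level numbers before building the property strings, so that a misordered list is caught during development rather than at attach time.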
Example 9-1 is a pm-components entry from the .conf file of a driver that implements a single power-managed component consisting of a disk spindle motor. The disk spindle motor is component 0, and it supports two power levels, which represent stopped and spinning at full speed.
Example 9-1 Sample pm-components Entry
pm-components="NAME=Spindle Motor", "0=Stopped", "1=Full Speed";
Example 9-2 shows how Example 9-1 could be implemented in the attach() routine of the driver.
Example 9-2 attach(9E) Routine With pm-components Property
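The following is a hedged sketch of what such an attach routine might look like, not a reproduction of the original listing. It creates the property with ddi_prop_update_string_array(9F), as described later in this chapter. So that the sketch compiles outside the kernel, the dev_info_t layout, the constants, and the body of ddi_prop_update_string_array() are user-space stand-ins written for this example; a real driver gets them from <sys/ddi.h> and <sys/sunddi.h>. The names xxattach and xx_pmcomps are illustrative.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>	/* dev_t */

/* ---- User-space stand-ins (assumptions of this sketch) ---- */
typedef struct dev_info {
	char **pm_comps;	/* records the property for inspection */
	unsigned int pm_ncomps;
} dev_info_t;
typedef enum { DDI_ATTACH, DDI_RESUME } ddi_attach_cmd_t;
#define	DDI_SUCCESS		0
#define	DDI_FAILURE		(-1)
#define	DDI_PROP_SUCCESS	0
#define	DDI_DEV_T_NONE		((dev_t)-1)

/* Stand-in: remembers the array instead of creating a real property. */
static int
ddi_prop_update_string_array(dev_t dev, dev_info_t *dip, char *name,
    char **data, unsigned int nelements)
{
	(void) dev;
	if (strcmp(name, "pm-components") == 0) {
		dip->pm_comps = data;
		dip->pm_ncomps = nelements;
	}
	return (DDI_PROP_SUCCESS);
}

/* ---- Driver side ---- */
/* Same strings as the .conf entry in Example 9-1. */
static char *xx_pmcomps[] = {
	"NAME=Spindle Motor",
	"0=Stopped",
	"1=Full Speed"
};

static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
	/* Resume handling is omitted from this sketch. */
	if (cmd != DDI_ATTACH)
		return (DDI_FAILURE);

	/* Tell the framework about component 0 and its two levels. */
	if (ddi_prop_update_string_array(DDI_DEV_T_NONE, dip,
	    "pm-components", xx_pmcomps,
	    sizeof (xx_pmcomps) / sizeof (xx_pmcomps[0])) !=
	    DDI_PROP_SUCCESS)
		return (DDI_FAILURE);

	return (DDI_SUCCESS);
}
```

Creating the property in attach() rather than in the .conf file keeps the component description next to the code that implements it, at the cost of requiring a driver rebuild to change it.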
Example 9-3 shows a frame buffer that implements two components. Component 0 is the frame buffer electronics, which support four different power levels. Component 1 represents the power management state of the attached monitor.
Example 9-3 Multiple Component pm-components Entry
pm-components="NAME=Frame Buffer", "0=Off", "1=Suspend", \
    "2=Standby", "3=On", "NAME=Monitor", "0=Off", "1=Suspend", "2=Standby", "3=On";
When a device driver is first attached, the framework does not know the power level of the device. A power transition may occur when:
The driver calls pm_raise_power(9F) or pm_lower_power(9F).
The framework has lowered the power level of a component because the component has been idle longer than its threshold time.
Another device has changed power, and there is a dependency between the two devices. See "Power Management Dependencies".
Once a power transition has occurred or the driver has informed the framework of the power level, the framework tracks the current power level of each component of the device. The driver can inform the framework of a power level change by calling pm_power_has_changed(9F).
The system calculates a default threshold for each possible transition from one power level to the next lower level, based on the system idleness threshold. These default thresholds can be overridden using dtpower(1M) or power.conf(4). Another default threshold based on the system idleness threshold is used when the component power level is unknown.
Power Management Dependencies
Some devices should be powered down only when other devices are also powered down. For example, if removable-media devices such as CD-ROM drives or Zip drives are allowed to power down by themselves, functionality associated with their current state, such as the ability to eject a CD or to respond when a new Zip disk is inserted, may be lost.
One way to prevent a device from powering down independently is to make the device dependent on another device that is likely to remain powered on while its functionality is required. Typically, the device is made dependent upon a frame buffer, because a monitor is generally on whenever a user is using the system.
The power.conf(4) file specifies the dependencies among devices. (A parent node in the device tree implicitly depends upon its children. This dependency is handled automatically by the power management framework.) You can specify a particular dependency with a power.conf(4) entry of this form:
device-dependency dependent_phys_path phys_path
where dependent_phys_path is the device that is kept powered up (such as the CD-ROM drive) and phys_path is the device whose power state it depends on (such as the frame buffer).
Because it would be burdensome to add an entry to power.conf for every new device plugged into the system, another syntax enables you to indicate dependency in a more general fashion:
device-dependency-property property phys_path
Such an entry mandates that any device that exports the property named property will be dependent upon the device named by phys_path. Because this dependency applies especially to removable-media devices, /etc/power.conf includes the following line by default:
device-dependency-property removable-media /dev/fb
to signal that any device exporting the removable-media property will not be powered down unless the console frame buffer is also powered down.
For more information, see the power.conf(4) and removable-media(9P) man pages.
Automatic Power Management for Devices
If automatic power management is enabled by dtpower(1M) or power.conf(4), then all devices with a pm-components(9P) property are automatically power managed. After a component has been idle for a default period, it is automatically brought to its next lower power level. The power management framework calculates the default period so that the entire device reaches its lowest power state within the system idleness threshold.
Note - By default automatic power management is enabled on all SPARC desktop systems first shipped after July 1, 1999. This feature is disabled by default for all other systems. To determine if automatic power management is enabled on your machine, refer to the power.conf(4) man page for instructions.
dtpower(1M) or power.conf(4) may be used to override the defaults calculated by the framework.
Device Power Management Interfaces
A device driver that supports a device with power-manageable components must notify the system of the existence of these components and the power levels that they support by creating a pm-components(9P) property. This is typically done from the driver's attach(9E) entry point by calling ddi_prop_update_string_array(9F), but may be done from a driver.conf(4) file instead. See the pm-components(9P) man page for details.
Busy-Idle State Transitions
The driver must keep the framework informed of device state transitions from idle to busy or busy to idle. Where these transitions happen is entirely device-specific: it depends on the nature of the device and the abstraction represented by the specific component. For example, SCSI disk target drivers typically export a single component, which represents whether the SCSI target disk drive is spun up or not. It is marked busy whenever there is an outstanding request to the drive and idle when the last queued request finishes. Some components are created and never marked busy (components created by pm-components(9P) are created in an idle state).
The pm_busy_component(9F) and pm_idle_component(9F) interfaces notify the power management framework of busy-idle state transitions. The syntax for pm_busy_component(9F) is:
int pm_busy_component(dev_info_t *dip, int component);
pm_busy_component(9F) marks component as busy. While the component is busy, it will not be powered off. If the component is already powered off, marking it busy does not change its power level; to bring it back up, the driver must call pm_raise_power(9F). Calls to pm_busy_component(9F) are cumulative and require a corresponding number of calls to pm_idle_component(9F) to idle the component.
The syntax for pm_idle_component(9F) is:
int pm_idle_component(dev_info_t *dip, int component);
pm_idle_component(9F) marks component as idle. An idle component is subject to being powered off. pm_idle_component(9F) must be called once for each call to pm_busy_component(9F) in order to idle the component.
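The pairing requirement can be illustrated with a minimal sketch in which each queued request marks the component busy and each completion marks it idle. Because the calls are cumulative, the component becomes idle only when the last outstanding request finishes. The dev_info_t layout and the bodies of pm_busy_component() and pm_idle_component() below are user-space stand-ins that model the counting only; xx_state_t, xx_start_request(), and xx_request_done() are hypothetical driver names. A real driver would also protect its state with a mutex, which is omitted here.

```c
#include <assert.h>

/* ---- User-space stand-ins (assumptions of this sketch) ---- */
#define	DDI_SUCCESS	0
typedef struct dev_info {
	int busycnt;	/* toy model of the cumulative busy count */
} dev_info_t;

static int
pm_busy_component(dev_info_t *dip, int component)
{
	(void) component;
	dip->busycnt++;
	return (DDI_SUCCESS);
}

static int
pm_idle_component(dev_info_t *dip, int component)
{
	(void) component;
	if (dip->busycnt > 0)
		dip->busycnt--;
	return (DDI_SUCCESS);
}

/* ---- Driver side ---- */
#define	XX_COMPONENT	0

typedef struct xx_state {
	dev_info_t *dip;
	int outstanding;	/* requests queued but not yet completed */
} xx_state_t;

/* Called when a request is queued to the device. */
static void
xx_start_request(xx_state_t *sp)
{
	(void) pm_busy_component(sp->dip, XX_COMPONENT);
	sp->outstanding++;
}

/* Called when a request completes, for example from the interrupt path. */
static void
xx_request_done(xx_state_t *sp)
{
	assert(sp->outstanding > 0);
	sp->outstanding--;
	/* Exactly one idle call for each earlier busy call. */
	(void) pm_idle_component(sp->dip, XX_COMPONENT);
}
```

An alternative design marks the component busy only on the transition from zero to one outstanding request and idle on the transition back to zero; both approaches keep the calls balanced.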
Device Power State Transitions
A device driver can call pm_raise_power(9F) to request that a component be set to at least a given power level. This is necessary before using a component that has been powered off. For example, a SCSI disk target driver's read(9E) or write(9E) routine might need to spin up the disk before completing the read or write, if the disk has already been powered off. pm_raise_power(9F) requests the power management framework to initiate a device power state transition to a higher power level. Normally, reductions in component power levels are initiated by the framework. However, a device driver should call pm_lower_power(9F) when detaching, in order to reduce the power consumption of unused devices as much as possible.
Powering down can pose risks for some devices. For example, some tape drives damage tapes when power is removed; likewise, some disk drives have a limited tolerance for power cycles, since each cycle results in a head landing. Such devices should export the no-involuntary-power-cycles(9P) property to notify the system that all power cycles for the device must be under control of a device driver. This prevents power from being removed from a device while the device driver is detached, unless the device was powered off by a driver's call to pm_lower_power(9F).
pm_raise_power(9F) is called when the driver discovers that a component needed for some operation is at a power level lower than that operation requires. The syntax for pm_raise_power(9F) is:
int pm_raise_power(dev_info_t *dip, int component, int level);
This interface arranges for the driver to be called to raise the current power level of the component to at least the level specified in the request. All the devices that depend on this device are also brought back to full power by this call.
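A minimal sketch of this pattern follows: before starting I/O, the driver marks the component busy, checks the level it last recorded, and calls pm_raise_power(9F) if the disk is stopped. Everything under "stand-ins" is a user-space model written for this example, including the way pm_raise_power() calls back into the driver's xx_power() routine and the fail_raise test hook. The names xx_state, xx_power(), and xx_do_io() are hypothetical, and the transfer itself is elided.

```c
#include <assert.h>

#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
#define	XX_SPINDLE	0	/* component number */
#define	XX_STOPPED	0	/* power levels from Example 9-1 */
#define	XX_FULL_SPEED	1

struct xx_state {
	int spindle_level;	/* level last set by xx_power() */
	int ios_started;	/* transfers actually issued */
};

/* ---- User-space stand-ins (assumptions of this sketch) ---- */
typedef struct dev_info {
	struct xx_state *state;	/* real drivers use soft state instead */
	int busycnt;
	int fail_raise;		/* test hook: make pm_raise_power() fail */
} dev_info_t;

/* Driver routine that changes the level when asked to. */
static int
xx_power(dev_info_t *dip, int component, int level)
{
	if (component != XX_SPINDLE ||
	    level < XX_STOPPED || level > XX_FULL_SPEED)
		return (DDI_FAILURE);
	/* Hardware programming would go here. */
	dip->state->spindle_level = level;
	return (DDI_SUCCESS);
}

static int
pm_busy_component(dev_info_t *dip, int component)
{
	(void) component;
	dip->busycnt++;
	return (DDI_SUCCESS);
}

static int
pm_idle_component(dev_info_t *dip, int component)
{
	(void) component;
	assert(dip->busycnt > 0);
	dip->busycnt--;
	return (DDI_SUCCESS);
}

/* Stand-in: calls the driver back only if the level is too low. */
static int
pm_raise_power(dev_info_t *dip, int component, int level)
{
	if (dip->fail_raise)
		return (DDI_FAILURE);
	if (dip->state->spindle_level >= level)
		return (DDI_SUCCESS);
	return (xx_power(dip, component, level));
}

/* ---- Driver I/O path ---- */
static int
xx_do_io(dev_info_t *dip)
{
	struct xx_state *sp = dip->state;

	/* Busy first, so the component is not powered off underneath us. */
	(void) pm_busy_component(dip, XX_SPINDLE);
	if (sp->spindle_level < XX_FULL_SPEED &&
	    pm_raise_power(dip, XX_SPINDLE, XX_FULL_SPEED) != DDI_SUCCESS) {
		(void) pm_idle_component(dip, XX_SPINDLE);
		return (DDI_FAILURE);
	}
	sp->ios_started++;	/* the transfer itself is elided */
	(void) pm_idle_component(dip, XX_SPINDLE);
	return (DDI_SUCCESS);
}
```

Marking the component busy before the level check closes the window in which the framework could lower the power between the check and the start of the transfer.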
pm_lower_power(9F) is called when the device is detaching, once access to the device is no longer needed. It should be called for each component to set each component to its lowest power so that the device uses as little power as possible while it is not in use. The syntax for pm_lower_power(9F) is the same as that for pm_raise_power(9F).
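As a sketch of the detach-time usage, the loop below lowers each component of a two-component device to level 0. The dev_info_t layout, the constants, and the body of pm_lower_power() are user-space stand-ins for this example; xxdetach, XX_NCOMPONENTS, and XX_LOWEST_LEVEL are illustrative names. Ignoring a failed pm_lower_power() call and continuing the detach is an assumption of this sketch, not a documented requirement.

```c
#include <assert.h>

/* ---- User-space stand-ins (assumptions of this sketch) ---- */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
typedef enum { DDI_DETACH, DDI_SUSPEND } ddi_detach_cmd_t;

#define	XX_NCOMPONENTS	2	/* for example, frame buffer and monitor */
#define	XX_LOWEST_LEVEL	0	/* level 0 means off */

typedef struct dev_info {
	int level[XX_NCOMPONENTS];	/* current level per component */
} dev_info_t;

/* Stand-in: lowers the recorded level, never raises it. */
static int
pm_lower_power(dev_info_t *dip, int component, int level)
{
	assert(component >= 0 && component < XX_NCOMPONENTS);
	if (level < dip->level[component])
		dip->level[component] = level;
	return (DDI_SUCCESS);
}

/* ---- Driver side ---- */
static int
xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
	int c;

	/* Suspend handling is omitted from this sketch. */
	if (cmd != DDI_DETACH)
		return (DDI_FAILURE);

	/* Access to the device is no longer needed: power down each part. */
	for (c = 0; c < XX_NCOMPONENTS; c++)
		(void) pm_lower_power(dip, c, XX_LOWEST_LEVEL);

	/* Releasing other driver resources would follow here. */
	return (DDI_SUCCESS);
}
```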
pm_power_has_changed(9F) is called to notify the framework when a device has made a power transition on its own, or to inform the framework of the power level of a device, for example, after a suspend-resume operation. The syntax for pm_power_has_changed(9F) is the same as that for pm_raise_power(9F).
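The resume case might look like the following sketch, in which the driver reads the level the hardware is actually at and reports it for each component. The dev_info_t layout, the body of pm_power_has_changed(), and the XX_LEVEL_UNKNOWN marker are user-space stand-ins for this example; xx_read_hw_level() and xx_resume() are hypothetical driver routines.

```c
#include <assert.h>

/* ---- User-space stand-ins (assumptions of this sketch) ---- */
#define	DDI_SUCCESS		0
#define	DDI_FAILURE		(-1)
#define	XX_NCOMPONENTS		1
#define	XX_LEVEL_UNKNOWN	(-1)	/* framework has no level recorded */

typedef struct dev_info {
	int hw_level[XX_NCOMPONENTS];	/* what the hardware is really at */
	int fw_level[XX_NCOMPONENTS];	/* what the framework believes */
} dev_info_t;

/* Stand-in: records the level the driver reports. */
static int
pm_power_has_changed(dev_info_t *dip, int component, int level)
{
	assert(component >= 0 && component < XX_NCOMPONENTS);
	dip->fw_level[component] = level;
	return (DDI_SUCCESS);
}

/* ---- Driver side ---- */
/* Hypothetical helper: a real driver would read device registers. */
static int
xx_read_hw_level(dev_info_t *dip, int component)
{
	return (dip->hw_level[component]);
}

/* After resume, tell the framework where each component stands. */
static int
xx_resume(dev_info_t *dip)
{
	int c;
	int rv = DDI_SUCCESS;

	for (c = 0; c < XX_NCOMPONENTS; c++) {
		int level = xx_read_hw_level(dip, c);

		if (pm_power_has_changed(dip, c, level) != DDI_SUCCESS)
			rv = DDI_FAILURE;
	}
	return (rv);
}
```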