CHAPTER 1
DR on the Sun Enterprise 10000 System
This chapter describes what dynamic reconfiguration (DR) is and what it can do for you. Then it describes the two models of DR that are available on the Sun Enterprise 10000 system.
DR software is part of the Solaris operating environment. With the DR software you can dynamically reconfigure system boards so that you can safely remove them from, or install them into, a system. You perform DR operations while the Solaris operating environment is running, with minimal disruption to user processes that are running in the dynamic system domain (referred to simply as a domain in this document).
The DR software enables you to:
Minimize the interruption of system applications while installing or removing a board
Disable a failing device by removing it from the domain, before the failure can crash the operating system
Display the operational status of boards in a domain
Reconfigure a domain while the Solaris operating environment continues to run in the domain
If a system board is being used by a domain, you must detach it before you can power it off and remove it. After a new or upgraded system board is inserted and powered on, you can attach it to the domain.
You can perform DR operations from the system service processor (SSP) by using the Automated DR (ADR) commands: addboard(1M), moveboard(1M), deleteboard(1M), and showusage(1M).
This section contains descriptions of general DR concepts that pertain to the Sun Enterprise 10000.
For a device to be detachable, it must conform to the following items:
Critical resources must be redundant or accessible through an alternate pathway. CPUs and memory banks can be redundant critical resources. Disk drives are examples of critical resources that can be accessible through an alternate pathway.
Some boards cannot be detached because their resources cannot be moved. For example, if a domain has only one board, that board cannot be detached. A board is not detachable if it controls the boot drive.
If there is no alternate pathway for a board, you can:
Add a second path to the device through a second board so that the board can be detached without losing access to the secondary disk chain.
Note - If you are unsure whether a device is detachable, consult your Sun service representative.
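The detachability conditions described above can be sketched as a simple check. This is a minimal illustration, not part of the DR software: the Board class, its fields, and the board names (SB0, SB1, SB2) are hypothetical, and the sketch covers only the conditions stated in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Board:
    """Hypothetical model of a system board; not a Sun data structure."""
    name: str
    controls_boot_drive: bool = False
    # Critical resources on this board that are neither redundant nor
    # reachable through an alternate pathway on another board.
    unprotected_resources: list = field(default_factory=list)


def detach_blockers(board, domain_boards):
    """Return the reasons (possibly none) why the board cannot be detached."""
    reasons = []
    if len(domain_boards) <= 1:
        reasons.append("domain has only one board")
    if board.controls_boot_drive:
        reasons.append("board controls the boot drive")
    for resource in board.unprotected_resources:
        reasons.append(f"no redundancy or alternate pathway for {resource}")
    return reasons


# Hypothetical three-board domain used only for illustration.
sb0 = Board("SB0", controls_boot_drive=True)
sb1 = Board("SB1", unprotected_resources=["secondary disk chain"])
sb2 = Board("SB2")
domain = [sb0, sb1, sb2]

for b in domain:
    blockers = detach_blockers(b, domain)
    print(b.name, "detachable" if not blockers else f"blocked: {blockers}")
```

In this sketch, adding a second path to the secondary disk chain through another board would correspond to emptying the unprotected_resources list of SB1, after which the check reports no blockers for that board.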
During the unconfigure operation on a system board with permanent memory (OpenBoot PROM or kernel memory), the operating environment is briefly paused, which is known as operating environment quiescence. All operating environment and device activity on the domain must cease during this critical phase of the unconfigure operation.
Before it can achieve quiescence, the operating environment must temporarily suspend all processes, CPUs, and device activities. If the operating environment cannot achieve quiescence, it displays the reasons, which can include devices that cannot be paused by the operating environment. The conditions that cause processes to fail to suspend are generally temporary.
Execution threads and real-time processes do not affect quiescence.
When DR suspends the operating environment, all of the device drivers that are attached to the operating environment must also be suspended. If a driver cannot be suspended (or subsequently resumed), the DR operation fails.
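The all-or-nothing nature of quiescence can be sketched as follows. This is an illustrative model only: the Driver class and the driver names are hypothetical and do not correspond to any Solaris interface, and resuming already-suspended drivers before reporting the failure is an assumption of the sketch, not documented behavior.

```python
class Driver:
    """Hypothetical stand-in for a device driver; not a Solaris interface."""

    def __init__(self, name, can_suspend=True):
        self.name = name
        self.can_suspend = can_suspend
        self.suspended = False

    def suspend(self):
        # A driver that cannot be suspended leaves its state unchanged.
        if self.can_suspend:
            self.suspended = True
        return self.suspended

    def resume(self):
        self.suspended = False


def quiesce(drivers):
    """Try to suspend every driver; return (ok, reasons)."""
    done = []
    for drv in drivers:
        if drv.suspend():
            done.append(drv)
        else:
            # Assumption: drivers that were already suspended are resumed
            # in reverse order before the failure is reported.
            for prev in reversed(done):
                prev.resume()
            return False, [f"driver {drv.name} cannot be suspended"]
    return True, []


# Hypothetical driver names used only for illustration.
drivers = [Driver("drv_a"), Driver("drv_b", can_suspend=False)]
ok, reasons = quiesce(drivers)
print(ok, reasons)
```

The point of the sketch is that a single driver that cannot be suspended causes the whole operation to fail, and the reason identifies that driver.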
A suspend-safe device does not access memory or interrupt the system while the operating environment is in quiescence. A driver is suspend-safe if it supports operating environment quiescence (suspend/resume). A suspend-safe driver also guarantees that when a suspend request is successfully completed, the device that the driver manages will not attempt to access memory, even if the device is open when the suspend request is made. A suspend-unsafe device allows a memory access or a system interruption to occur while the operating environment is in quiescence.
DR 3.0 uses an unsafe driver list in the ngdr.conf file to prevent unsafe devices from accessing memory or interrupting the operating environment during a DR operation. The unsafe driver list is a property in the ngdr.conf file, with the following format:
DR reads this list when it prepares to suspend the operating environment so that it can unconfigure a memory component. If DR finds an active driver in the unsafe driver list, it aborts the DR operation and returns an error message that identifies the active, unsafe driver. You must then manually remove the usage of the device by performing one or more of the following tasks:
Killing the processes that are using the device
Unloading the driver by using the modunload(1M) command
Depending on the device, disconnecting the cables
You can retry the DR operation after you have removed the usage of the device.
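The check that DR performs against the unsafe driver list can be sketched as below. This is a hedged illustration, not the DR implementation: the function names and driver names are hypothetical, and the list is passed in as plain Python strings rather than parsed from the ngdr.conf file.

```python
def blocking_drivers(unsafe_driver_list, active_drivers):
    """Return the active drivers that appear in the unsafe driver list.

    The result keeps the order of the unsafe driver list.
    """
    active = set(active_drivers)
    return [name for name in unsafe_driver_list if name in active]


def check_before_suspend(unsafe_driver_list, active_drivers):
    """Abort (raise) if any active driver is on the unsafe driver list."""
    blockers = blocking_drivers(unsafe_driver_list, active_drivers)
    if blockers:
        raise RuntimeError(
            "DR operation aborted; active unsafe driver(s): "
            + ", ".join(blockers)
        )


# Hypothetical driver names used only for illustration.
unsafe = ["drv_x", "drv_y"]

try:
    check_before_suspend(unsafe, ["drv_y", "drv_z"])
except RuntimeError as err:
    print(err)

# After the usage of drv_y is removed, the retry passes the check.
check_before_suspend(unsafe, ["drv_z"])
```

The second call models the retry: once no active driver remains on the unsafe list, the check no longer blocks the operation.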
Note - If you are unsure whether a device is suspend-safe, contact your Sun service representative.
There are two models of DR available for the Sun Enterprise 10000 system. DR model 2.0 is sometimes referred to as "legacy DR," and DR model 3.0 is referred to as "next generation DR." Only DR 3.0 runs in a domain running version 9 of the Solaris operating environment. The following table shows the different versions of the Solaris operating environment and the SSP software that are used with DR models 2.0 and 3.0:
Only one model of DR can run within a domain at a time. To check which model of DR is running, use the domain_status -m command (available only with version 3.5 of the SSP software). Verify the DR model before you execute any DR commands. The following is an example of the domain_status(1M) output. The DR-MODEL column indicates which model is enabled.
According to this output, domain A is running Solaris version 8 software (OS 5.8) with DR model 2.0 enabled; domain B is running Solaris version 8 software with DR model 3.0 enabled; domain C is running Solaris version 7 software (OS 5.7) with DR model 2.0 enabled; and domain D is running Solaris version 9 software (OS 5.9) with DR model 3.0 enabled.
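The advice to verify the DR model before running DR commands can be sketched as a small lookup. The data below is transcribed from the description of the four domains above; it is not the literal domain_status(1M) output, and the dictionary layout and function names are hypothetical.

```python
# Domain name -> (OS version, DR model), transcribed from the text above.
DOMAINS = {
    "A": ("5.8", "2.0"),
    "B": ("5.8", "3.0"),
    "C": ("5.7", "2.0"),
    "D": ("5.9", "3.0"),
}


def dr_model(domain):
    """Return the DR model that is enabled in the given domain."""
    return DOMAINS[domain][1]


def domains_with_model(model):
    """Return the sorted names of the domains that run the given DR model."""
    return sorted(name for name, (_, m) in DOMAINS.items() if m == model)


def require_model(domain, model):
    """Raise if the domain does not run the DR model a procedure expects."""
    actual = dr_model(domain)
    if actual != model:
        raise ValueError(
            f"domain {domain} runs DR model {actual}, not {model}"
        )


print(domains_with_model("2.0"))
print(domains_with_model("3.0"))
require_model("D", "3.0")  # passes silently
```

A guard such as require_model could sit in front of any scripted DR 3.0 procedure so that the procedure stops early in a domain that has DR 2.0 enabled.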
Only certain commands are available in each model, and if you execute a command that is not supported, an error message appears on the console.
For more information about using DR 2.0, see the previous version of the Sun Enterprise 10000 Dynamic Reconfiguration (DR) User Guide (part number 806-7616-10). For more information about using DR 3.0, see the section DR 3.0 Procedures of this book.
The DR 3.0 model offers the following enhancements to DR 2.0:
DR 3.0 has a framework that offers better integration with applications, through the reconfiguration coordination manager (RCM).
DR 3.0 supports network multipathing using IPMP.
You execute DR operations from either of two places: from the system service processor (SSP), by using the SSP commands addboard(1M), moveboard(1M), deleteboard(1M), rcfgadm(1M), and showdevices(1M); or from the domain, by using the cfgadm(1M) command.
To use multipathing on DR model 3.0 domains, run IPMP (the IP multipathing software provided with the Solaris 8 operating environment) and the MPxIO software, which is included in Solaris Kernel Update Patches 111412-02, 111413-02, 111095-02, 111096-02, and 111097-02.
Copyright © 2002, Sun Microsystems, Inc. All rights reserved.