CHAPTER 1
Introduction to DR on Sun Fire 6800/4810/4800/3800 Systems
The dynamic reconfiguration (DR) features described in this user's guide are specific to Sun Fire 6800, 4810, 4800, and 3800 systems using the Solaris 8 2/02 or Solaris 9 operating environment.
DR software is part of the Solaris operating environment. With the DR software you can dynamically reconfigure system boards and safely remove them or install them into a system while the Solaris operating environment is running and with minimum disruption to user processes running in the domain.
You can use DR to do the following:
The DR software has a command line interface (CLI) using the cfgadm command, which is the configuration administration program. The DR agent also provides a remote interface to the Sun Management Center 3.0 software.
The optional Sun Management Center 3.0 Update 1 software (and later versions), which is designed for these systems, provides features such as domain management, as well as a graphical user interface (GUI) to the cfgadm command line interface. If you prefer a GUI, use the Sun Management Center 3.0 software instead of the command line interfaces of the system controller software and the DR software.
To use the Sun Management Center 3.0 software, you must attach the System Controller board to a network. With a network connection, you can view both the command line interface and the graphical user interface. For instructions on how to use the Sun Management Center 3.0 software, refer to the Sun Management Center 3.0 User's Guide, shipped with the Sun Management Center 3.0 software. For instructions on how to connect the system controller to a network connection on the System Controller board, refer to your system's installation documentation.
This section contains descriptions of general DR concepts that pertain to Sun Fire 6800/4810/4800/3800 domains.
For a device to be detachable, it must conform to the following items:
Some boards cannot be detached because their resources cannot be moved. For example, if a domain has only one CPU board, that CPU board cannot be detached. If the boot drive does not have the failover feature implemented, the I/O board connected to it is not detachable.
If there are not multiple pathways for an I/O board, you can:
During the unconfigure operation on a system board with permanent memory (OpenBoot PROM or kernel memory), the operating environment is briefly paused, which is known as operating environment quiescence. All operating environment and device activity on the centerplane must cease during a critical phase of the operation.
Before it can achieve quiescence, the operating environment must temporarily suspend all processes, CPUs, and device activities. If the operating environment cannot achieve quiescence, it displays the reasons, which may include the following:
The conditions that cause processes to fail to suspend are generally temporary. Examine the reasons for the failure. If the operating environment encountered a transient condition (for example, a failure to suspend a process), you can try the operation again.
When DR suspends the operating environment, all of the device drivers that are attached to the operating environment must also be suspended. If a driver cannot be suspended (or subsequently resumed), the DR operation fails.
A suspend-safe device does not access memory or interrupt the system while the operating environment is in quiescence. A driver is suspend-safe if it supports operating environment quiescence (suspend/resume). A suspend-safe driver also guarantees that when a suspend request is successfully completed, the device that the driver manages will not attempt to access memory, even if the device is open when the suspend request is made.
A suspend-unsafe device allows a memory access or a system interruption to occur while the operating environment is in quiescence.
An attachment point is a collective term for a board and its slot. DR can display the status of the slot, the board, and the attachment point. The DR definition of a board also includes the devices connected to it, so the term "occupant" refers to the combination of board and attached devices.
There are two formats used when referring to attachment points:
Physical attachment point: /devices/ssm@0,0:N0.SBx (for a CPU/Memory board) or /devices/ssm@0,0:N0.IBx (for an I/O assembly)
Logical attachment point: N0.SBx (for a CPU/Memory board) or N0.IBx (for an I/O assembly)
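The relationship between the two formats can be sketched as a pair of shell helpers. In cfgadm(1M) terminology the long /devices form is a physical attachment point ID and the short form is a logical one. The function names below are hypothetical and exist only for illustration, and the sketch assumes the node prefix is always N0 and the device path is always /devices/ssm@0,0, as in the formats shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the DR software) that convert between
# the short and long attachment point formats shown above.

# Short form -> long form. Accepts names that begin with N0.SB or N0.IB
# followed by a digit; anything else is rejected.
ap_logical_to_physical() {
    case $1 in
        N0.SB[0-9]*|N0.IB[0-9]*) printf '%s\n' "/devices/ssm@0,0:$1" ;;
        *) echo "unrecognized attachment point: $1" >&2; return 1 ;;
    esac
}

# Long form -> short form, by stripping the device path prefix.
ap_physical_to_logical() {
    case $1 in
        /devices/ssm@0,0:N0.*) printf '%s\n' "${1#/devices/ssm@0,0:}" ;;
        *) echo "unrecognized attachment point: $1" >&2; return 1 ;;
    esac
}

# Report the kind of board that a short-form name refers to.
ap_board_type() {
    case $1 in
        N0.SB[0-9]*) echo "CPU/Memory board" ;;
        N0.IB[0-9]*) echo "I/O assembly" ;;
        *) return 1 ;;
    esac
}
```

For example, ap_logical_to_physical N0.SB1 prints /devices/ssm@0,0:N0.SB1, and ap_board_type N0.IB7 prints I/O assembly.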
There are four main types of DR operations.
If a system board is in use, stop its use and disconnect it from the domain before you power it off. After a new or upgraded system board is inserted and powered on, connect its attachment point and configure it for use by the operating environment.
The cfgadm(1M) command can connect and configure (or unconfigure and disconnect) in a single command, but if necessary, each operation (connection, configuration, unconfiguration, or disconnection) can be performed separately.
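The separate operations can be sketched as a small dry-run script. The helper names attach_steps and detach_steps are hypothetical and exist only for illustration; the functions print the cfgadm(1M) invocations instead of running them, so the sketch can be tried on any system. The ordering (connect before configure, unconfigure before disconnect) follows the description above.

```shell
#!/bin/sh
# Hypothetical dry-run helpers: print, but do not execute, the individual
# cfgadm state-change commands for one attachment point.

# Steps that bring a newly inserted board into use.
attach_steps() {
    ap_id=$1
    echo "cfgadm -c connect $ap_id"
    echo "cfgadm -c configure $ap_id"
}

# Steps that take a board out of use before it is powered off.
detach_steps() {
    ap_id=$1
    echo "cfgadm -c unconfigure $ap_id"
    echo "cfgadm -c disconnect $ap_id"
}

attach_steps N0.SB1
detach_steps N0.SB1
```

Running each printed command by hand, one at a time, gives you a chance to check the board state between steps.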
Hot-plug boards and modules have special connectors that supply electrical power to the board or module before the data pins make contact. Boards and devices that have hot-plug connectors can be inserted or removed while the system is running.
I/O boards and CPU/Memory boards used in the Sun Fire 6800/4810/4800/3800 servers are hot-plug devices. Some devices, such as the peripheral power supply, are not hot-plug modules and cannot be removed while the system is running.
A state is the operational status of either a receptacle (slot) or an occupant (board). A condition is the operational status of an attachment point.
Before you attempt to perform any DR operation on a board or component from a domain, you must determine state and condition. Use the cfgadm(1M) command with the -la options to display the type, state, and condition of each component and the state and condition of each board slot in the domain. See the section Component Types for a list of the component types.
This section contains descriptions of the states and conditions of system boards (also known as system slots).
A board can have one of three receptacle states: empty, disconnected, or connected. Whenever you insert a board, the receptacle state changes from empty to disconnected. Whenever you remove a board, the receptacle state changes from disconnected to empty.
A board can have one of two occupant states: configured or unconfigured. The occupant state of a disconnected board is always unconfigured.
A board can be in one of four conditions: unknown, ok, failed, or unusable.
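The board state and condition rules above can be captured in a short validation sketch. The function names are hypothetical, and the check encodes only the rules stated in this section: the three receptacle states, the two occupant states, the four conditions, and the rule that a disconnected board is always unconfigured.

```shell
#!/bin/sh
# Hypothetical helpers that check board states and conditions against the
# rules described in this section. They do not query the system.

# Succeeds if the receptacle/occupant pair is consistent with the rules.
valid_board_state() {
    receptacle=$1
    occupant=$2
    case $receptacle in
        empty|disconnected|connected) ;;
        *) return 1 ;;
    esac
    case $occupant in
        configured|unconfigured) ;;
        *) return 1 ;;
    esac
    # The occupant state of a disconnected board is always unconfigured.
    if [ "$receptacle" = disconnected ] && [ "$occupant" = configured ]; then
        return 1
    fi
    return 0
}

# Succeeds if the argument is one of the four board conditions.
valid_board_condition() {
    case $1 in
        unknown|ok|failed|unusable) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, valid_board_state connected configured succeeds, whereas valid_board_state disconnected configured fails.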
This section contains descriptions of the states and conditions for components.
A component cannot be individually connected or disconnected. Thus, components can have only one receptacle state: connected.
A component can have one of two occupant states: configured or unconfigured.
configured - Component is available for use by the Solaris Operating Environment.
unconfigured - Component is not available for use by the Solaris Operating Environment.
A component can have one of three conditions: unknown, ok, or failed.
You can use DR to configure or to unconfigure several types of components.
The Sun Fire 6800, 4810, 4800, and 3800 servers can be divided into dynamic system domains, referred to as domains in this document. These domains are based on system board slots that are assigned to the domains. Each domain is electrically isolated into hardware partitions, which ensures that an arbitrary stop in one domain does not affect the other domains in the server.
The domain configuration is determined by the domain configuration table in the platform configuration database (PCD), which resides on the system controller (SC). The domain table controls how the system board slots are logically partitioned into domains. The domain configuration includes empty slots and populated slots.
The number of slots available to a given domain is controlled by an available component list that is maintained on the system controller. (Refer to the System Management Services (SMS) 1.2 Administrator Guide for more information about the available component list.) After a slot has been assigned to a domain, it becomes visible to that domain and unavailable and invisible to any other domain. Conversely, you must disconnect and unassign a slot from its domain before you can connect and assign it to another domain.
The logical domain is the set of slots that belong to the domain. The physical domain is the set of boards that are physically interconnected. A slot can be a member of a logical domain without having to be part of a physical domain. After the domain is booted, system boards and empty slots can be assigned to or unassigned from a logical domain; however, they are not allowed to become a part of the physical domain until the operating environment requests it.
System boards or slots that are not assigned to a domain are available to all domains if the board is in the available component list for each domain. These boards can be assigned to a domain by the platform administrator. However, an available component list can be set up on the SC to allow users with appropriate privileges to assign available boards to a domain.
You must use caution when you add or remove system boards with I/O devices. Before you can remove a board with I/O devices, all of its devices must be closed and all of its file systems must be unmounted.
If you need to remove a board with I/O devices from a domain temporarily and then re-add it before any other boards with I/O devices are added or removed, reconfiguration is not necessary. In this case, device paths to the board devices remain unchanged.
All I/O devices must be closed before they are unconfigured. If you encounter a problem with an I/O device, the following list can help you to overcome the problem.
Caution - Unmounting file systems may affect NFS client systems.
Before you can delete a board, the operating environment must vacate the memory on that board. Vacating a board means flushing its nonpermanent memory to swap space and copying its permanent memory (that is, kernel and OpenBoot PROM memory) to another memory board. To relocate permanent memory, the operating environment on a domain must be temporarily suspended, or quiesced. The length of the suspension depends on the domain I/O configuration and the running workloads.
Detaching a board with permanent memory is the only time when the operating environment is suspended; therefore, you should know where permanent memory resides so that you can avoid significantly impacting the operation of the domain. You can display the permanent memory by using the cfgadm(1M) command with the -v option. When permanent memory is on the board, the operating environment must find another memory component of adequate size to receive the permanent memory.
When permanent memory is removed, DR chooses a target memory area to receive a copy of the memory. The DR software automatically checks for total adherence. It does not allow the DR memory operation to continue if it cannot verify total adherence. A DR memory operation can be disallowed because the domain does not have enough available memory to hold the permanent memory.
DR lets you disconnect and then reconnect system boards without bringing the system down. You can use DR to add or remove system resources while the system continues to operate.
As an example reconfiguration of system resources, consider the following Sun Fire system configuration, as depicted in the diagram that follows: domain A contains system boards 0 and 2, and I/O board 7. Domain B contains system boards 1 and 3, and I/O board 8.
Note - Before performing DR operations, always make sure that the system complies with the constraints set forth in Limitations.
To re-assign system board 1 from domain B to domain A, you can use the Sun Management Center software GUI. Or you can perform the following steps manually on the CLI in each domain:
1. As superuser, enter the following command on the command line in domain B to disconnect system board 1:
# cfgadm -c disconnect -o unassign N0.SB1
2. Then, enter the following command on the command line in domain A to assign, connect, and configure system board 1 in domain A:
# cfgadm -c configure N0.SB1
The following system configuration is the result. Notice that only the way in which the boards are connected has changed; the physical layout of the boards within the cabinet has not.
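The two-step procedure above can also be wrapped in a small script, sketched here under stated assumptions. The names run, release_board, claim_board, and the DRY_RUN variable are hypothetical and not part of the DR software. DRY_RUN defaults to 1, so the script prints the cfgadm commands from the procedure instead of executing them; set DRY_RUN=0 as superuser in the appropriate domain to run them for real.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands used in the procedure above.

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run in the domain that currently owns the board (domain B in the example).
release_board() {
    run cfgadm -c disconnect -o unassign "$1"
}

# Run in the domain that receives the board (domain A in the example).
claim_board() {
    run cfgadm -c configure "$1"
}

release_board N0.SB1
claim_board N0.SB1
```

Because each function must run in a different domain, a real script would call only one of them per domain. Both calls appear together here only to show the order of the steps.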
For late-breaking news and patch information, visit the Solaris 8 web page at:
http://sunsolve2.Sun.COM/sunsolve/Enterprise-dr
The web site is updated periodically.
If you do not have access to this web site, ask your Sun service provider for assistance in obtaining the latest information.
System boards cannot be dynamically reconfigured if system memory is interleaved across multiple CPU/Memory boards.
Conversely, CompactPCI cards and I/O boards can be dynamically reconfigured whether memory is interleaved or not.
When a CPU/Memory board containing non-relocatable (permanent) memory is dynamically reconfigured out of the system, a short pause in all domain activity is required, which may delay application response. Typically, this condition applies to one CPU/Memory board in the system. The memory on the board is identified by a non-zero permanent memory size in the status display produced by the cfgadm -av command.
DR supports reconfiguration of permanent memory from one system board to another only if one of the following conditions is met:
Copyright © 2002, Sun Microsystems, Inc. All rights reserved.