CHAPTER 2

Using DR 3.0 Model

This chapter contains information about using DR model 3.0 on a Sun Enterprise 10000 system that is running version 3.5 of the SSP software and one of the following versions of the Solaris operating environment: Solaris 8 10/01, Solaris 8 02/02, or Solaris 9.

DR model 3.0 uses the domain configuration server, dcs(1M), to control DR operations. DR 3.0 includes Automated DR (ADR) commands such as addboard(1M), deleteboard(1M), and moveboard(1M). DR 3.0 also includes the following commands:


Automatic DR

Note - For more information about using DR model 2.0, see the previous version of the Sun Enterprise 10000 Dynamic Reconfiguration (DR) User Guide (part number 806-7617-10).



Automatic DR enables an application to perform DR operations without requiring user interaction. This ability is provided by an enhanced DR framework that includes the reconfiguration coordination manager (RCM) and a system event facility called sysevent. The RCM enables application-specific loadable modules to register with it for callbacks. The callbacks perform preparatory tasks before a DR operation, error recovery during the operation, and clean-up after it. The sysevent facility enables applications to register for notification of system events. The automatic DR framework interfaces with the RCM and with sysevent to notify applications so that they give up resources automatically before the resources are unconfigured, and capture new resources as they are configured into the domain.
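The ordering of the callback phases can be pictured with a toy model. The following Python sketch is not the RCM interface. The class and method names (ToyCoordinator, register, remove) are hypothetical and serve only to show when each kind of callback would run: preparation before the operation, recovery if the operation fails, and clean-up after it succeeds.

```python
# Toy model of callback ordering around a DR operation.
# All names here are hypothetical; this is not the RCM API.

class ToyCoordinator:
    def __init__(self):
        self.modules = []   # registered (name, prepare, recover, cleanup) entries
        self.log = []       # callback invocations, in the order they ran

    def register(self, name, prepare, recover, cleanup):
        self.modules.append((name, prepare, recover, cleanup))

    def remove(self, resource, operation):
        """Run operation(resource), surrounded by the registered callbacks."""
        # Phase 1: every module prepares to give up the resource.
        for name, prepare, _, _ in self.modules:
            prepare(resource)
            self.log.append((name, "prepare"))
        try:
            operation(resource)
        except Exception:
            # Phase 2: the operation failed, so modules undo their preparation.
            for name, _, recover, _ in self.modules:
                recover(resource)
                self.log.append((name, "recover"))
            return False
        # Phase 3: the operation succeeded, so modules clean up.
        for name, _, _, cleanup in self.modules:
            cleanup(resource)
            self.log.append((name, "cleanup"))
        return True


def noop(resource):
    pass


def failing_operation(resource):
    raise RuntimeError("simulated unconfigure failure")


coordinator = ToyCoordinator()
coordinator.register("app", noop, noop, noop)
ok = coordinator.remove("SB2", noop)
failed = coordinator.remove("SB2", failing_operation)
print(ok, failed, coordinator.log)
```

The model keeps a log so that the order of the phases can be inspected after a successful and a failed operation.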



Note - Automatic DR is a different feature from Automated DR (ADR).



For more information about RCM, refer to the Solaris 8 System Administration Supplement (part number 806-7502-10) in the Solaris 8 10/01 Update Collection.


Enhanced System Availability

The DR feature enables you to hot-swap system boards without bringing the system down. DR is used to unconfigure the resources on a faulty system board from a domain so that the system board can be removed from the system. The repaired (or replacement) board can be inserted into the domain while the Solaris operating environment is running. DR then configures the resources on the board into the domain.


DR and I/O Boards

You must use caution when you add or remove system boards with I/O devices. Before you can remove a board with I/O devices, all its devices must be closed, and all its file systems must be unmounted.

If you need to remove a board with I/O devices from a domain temporarily and then re-add it before any other boards with I/O devices are added, reconfiguration is unnecessary. In this case, device paths to the board devices remain unchanged. However, if you add another board with I/O devices after the first was removed, then re-add the first board, reconfiguration is required because the paths to the devices on the first board have changed.


Sun Enterprise 10000 Domains

The Sun Enterprise 10000 system can be divided into domains. A domain contains system boards and the components that are connected to them, such as CPUs, memory chips, and CompactPCI cards. Each domain is an electrically isolated hardware partition, which ensures that a hardware or software failure in one domain does not affect the other domains in the system.


DR 3.0 Procedures

This section contains procedures that describe how to use the DR 3.0 commands to show platform and device information and to add, delete, move, and physically replace system boards.

Showing Platform Information

Before you attempt to add, move, or delete a board to or from a specific domain, use the domain_status(1M) command to determine the domain name and board number.


To Show Platform Information

1. Use the domain_status(1M) command to obtain the domain information.

% domain_status -m

The domain_status command with the -m option (in SSP version 3.5 only) displays the domain name, the DR model, and the numbers of the system boards in the domain, as in the following example.

% domain_status -m
DOMAIN     TYPE                     PLATFORM   DR-MODEL   OS   SYSBDS
A          Ultra-Enterprise-10000   all-A      2.0        5.8  2
B          Ultra-Enterprise-10000   all-A      3.0        5.8  3 4
C          Ultra-Enterprise-10000   all-A      2.0        5.7  5 6
D          Ultra-Enterprise-10000   all-A      3.0        5.9  7 
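When several domains are listed, it can help to pick out the DR model 3.0 domains and their boards programmatically. The following Python sketch parses the example output shown above. It assumes that the columns are separated by white space and appear in the order shown in the example; the function name parse_domain_status is hypothetical and is not part of the SSP software.

```python
# Sample text copied from the example output above.
STATUS_SAMPLE = """\
DOMAIN     TYPE                     PLATFORM   DR-MODEL   OS   SYSBDS
A          Ultra-Enterprise-10000   all-A      2.0        5.8  2
B          Ultra-Enterprise-10000   all-A      3.0        5.8  3 4
C          Ultra-Enterprise-10000   all-A      2.0        5.7  5 6
D          Ultra-Enterprise-10000   all-A      3.0        5.9  7
"""


def parse_domain_status(text):
    """Return {domain: {"dr_model": str, "os": str, "boards": [int, ...]}}."""
    domains = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for line in lines[1:]:                      # skip the header row
        fields = line.split()
        name, _type, _platform, dr_model, os_version = fields[:5]
        boards = [int(b) for b in fields[5:]]   # SYSBDS can list several boards
        domains[name] = {"dr_model": dr_model, "os": os_version, "boards": boards}
    return domains


domains = parse_domain_status(STATUS_SAMPLE)
dr3_domains = sorted(d for d, info in domains.items() if info["dr_model"] == "3.0")
print(dr3_domains)
print(domains["B"]["boards"])
```

In the example data, only the domains that report DR model 3.0 are candidates for the procedures in this chapter.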

Showing Device Information

Before you attempt to perform any DR operation, especially one that removes devices, use the showdevices(1M) command to display the device information.


To Show Device Information

1. Use the showdevices(1M) command to display the device information for a domain.

% showdevices -v -d A

The above command displays the device information for all of the CPUs in domain A. Refer to the showdevices(1M) man page to learn how to display device-specific information.

CPU
---
domain    board id      state    speed  ecache usage
A         SB10   40     online   400    4
A         SB10   41     online   400    4
A         SB10   42     online   400    4
A         SB10   43     online   400    4
A         SB14   56     online   400    4
A         SB14   57     online   400    4
A         SB14   58     online   400    4
A         SB14   59     online   400    4

The following is an example of the memory output for the showdevices(1M) command.

Memory
drain in progress:
-----------------
                 board   perm    base          domain  target  deleted  remaining
domain   board   memMB   memMB   address       memMB   board   memMB    memMB
A        SB10    2048    933     0x800000000   4096    SB14    512      1536
A        SB14    2048    0       0x400000000   4096
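The memory listing can be checked in two ways before a board is deleted or moved: which boards hold permanent memory, and how far a drain has progressed. The following Python sketch uses the values from the example rows above. The relationship remaining = board memory - deleted memory is an assumption inferred from the sample numbers, not a documented rule, and the function names are hypothetical.

```python
# Values copied from the sample "drain in progress" rows above.
# Fields: board, board_mb, perm_mb, target_board, deleted_mb, remaining_mb
memory_rows = [
    ("SB10", 2048, 933, "SB14", 512, 1536),
    ("SB14", 2048, 0, None, None, None),
]


def boards_with_permanent_memory(rows):
    """Return the boards whose permanent-memory column is nonzero."""
    return [board for board, _board_mb, perm_mb, *_rest in rows if perm_mb > 0]


def drain_is_consistent(board_mb, deleted_mb, remaining_mb):
    # Assumption: the remaining amount is the board memory minus the amount
    # that has already been deleted from the board.
    return board_mb - deleted_mb == remaining_mb


perm_boards = boards_with_permanent_memory(memory_rows)
consistent = drain_is_consistent(2048, 512, 1536)
percent_drained = 100 * 512 / 2048
print(perm_boards, consistent, percent_drained)
```

In the sample, SB10 hosts permanent memory and is being drained to SB14, so it is the board that needs attention before a delete or move operation.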

The following is an example of the I/O devices output for the showdevices(1M) command.

IO Devices
----------
domain    board  device     resource            usage
A         SB14   sd0
A         SB14   sd1
A         SB14   sd2
A         SB14   sd3        /dev/dsk/c0t3d0s0   mounted filesystem "/"
A         SB14   sd3        /dev/dsk/c0t3d0s1   dump device (swap)
A         SB14   sd3        /dev/dsk/c0t3d0s1   swap area
A         SB14   sd3        /dev/dsk/c0t3d0s3   mounted filesystem "/var"
A         SB14   sd3        /var/run            mounted filesystem "/var/run"
A         SB14   sd4
A         SB14   sd5

Refer to the showdevices(1M) man page for a complete list of the options and arguments for this command.
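Because a board cannot be removed while its devices are in use, the I/O listing is worth scanning for rows that report a resource and a usage. The following Python sketch does this for the example I/O output shown earlier in this section. It assumes white-space-separated columns in the order domain, board, device, resource, usage; the function name busy_devices is hypothetical.

```python
# Sample text copied from the I/O devices example output.
IO_SAMPLE = """\
domain    board  device     resource            usage
A         SB14   sd0
A         SB14   sd1
A         SB14   sd2
A         SB14   sd3        /dev/dsk/c0t3d0s0   mounted filesystem "/"
A         SB14   sd3        /dev/dsk/c0t3d0s1   dump device (swap)
A         SB14   sd3        /dev/dsk/c0t3d0s1   swap area
A         SB14   sd3        /dev/dsk/c0t3d0s3   mounted filesystem "/var"
A         SB14   sd3        /var/run            mounted filesystem "/var/run"
A         SB14   sd4
A         SB14   sd5
"""


def busy_devices(text, board):
    """Map each in-use device on `board` to its (resource, usage) entries."""
    busy = {}
    for line in text.splitlines()[1:]:      # skip the header row
        # Split into at most five fields so the usage text keeps its spaces.
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[1] == board:
            device, resource, usage = fields[2], fields[3], fields[4].strip()
            busy.setdefault(device, []).append((resource, usage))
    return busy


busy = busy_devices(IO_SAMPLE, "SB14")
for device, entries in busy.items():
    for resource, usage in entries:
        print(device, resource, usage)
```

Rows that list only a device name are treated as idle. In the sample, sd3 is the only device on SB14 with resources in use.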

Adding Boards

Adding a board to a domain moves the board through several state changes. First the board is connected to the domain, and then it is configured into the Solaris operating environment. After the board is connected, it is considered to be part of the physical domain and available for use by the operating system.


To Add a Board to a Domain

1. Use the addboard(1M) command to add the board to the domain.

The following example shows how the addboard(1M) command adds system board 2 to the domain specified by domain_id. Two retries are performed, if necessary, with a wait time of 10 minutes (600 seconds) between retries.

% addboard -d domain_id -r 2 -t 600 SB2
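The retry and timeout values determine how long the command can wait in the worst case. The following Python sketch assembles the argument list in the form used in the example above and computes that total. The helper names are hypothetical, and the sketch assumes that one wait precedes each retry, so that two retries with a 600-second wait add up to 1200 seconds of waiting.

```python
def build_addboard_argv(domain_id, board, retries=0, timeout_s=0):
    """Assemble an addboard argument list in the form shown in the example."""
    argv = ["addboard", "-d", domain_id]
    if retries > 0:
        # The example passes the retry count and the wait time together.
        argv += ["-r", str(retries), "-t", str(timeout_s)]
    argv.append(board)
    return argv


def worst_case_wait_s(retries, timeout_s):
    # Assumption: one wait precedes each retry.
    return retries * timeout_s


argv = build_addboard_argv("domain_id", "SB2", retries=2, timeout_s=600)
total_wait = worst_case_wait_s(2, 600)
print(" ".join(argv))
print(total_wait)
```

The sketch only builds and prints the command line; it does not run the command.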

Deleting Boards

Deleting a board from a domain removes the board from the domain.

Always check the usage of the components on a board before you delete it from a domain. If the board hosts permanent memory, the memory is moved to another board within the same domain before the board is deleted from the domain. Likewise, if any busy devices are present, you must wait or ensure that the device is no longer being used by the system before you attempt to remove the board.



Caution - You must use the power command to power off the board before physically removing it from the system. The deleteboard(1M) command does not power off the board. Refer to the power(1M) man page for information about the power command. Also see the section To Physically Replace a System Board. In addition, the Sun Enterprise 10000 Systems Service Manual contains complete information about physically removing and replacing boards.




To Delete a Board From a Domain

1. Use the deleteboard(1M) command to delete the board from the domain.

The following example of the deleteboard(1M) command deletes system board 2 from its current domain. Two retries are performed, if necessary, with a wait time of 15 minutes (900 seconds) between retries.

% deleteboard -r 2 -t 900 SB2

Moving Boards

Moving a board from one domain to another domain involves removing the board from the first domain, and then connecting and configuring it into the target domain.

Always check memory usage on a board, and the devices that are connected to it, before moving it out of a domain. If the board hosts permanent memory, the memory must be moved to another board within the same domain before the board can be moved to another domain. Likewise, if a busy device is present, you must wait until the device is no longer being used by the system before you attempt to move the board.


To Move a Board

1. Use the moveboard(1M) command to move the board from one domain to another domain.

The following example of the moveboard(1M) command moves system board 2 from its current domain to the domain specified by domain_id. Two retries are performed, if necessary, with a wait time of 15 minutes (900 seconds) between retries.

% moveboard -d domain_id -r 2 -t 900 SB2

Replacing System Boards

This section describes how to physically replace a board in a domain by using the commands described in this chapter.


To Physically Replace a System Board

In the following steps, system board 2 is removed from its current domain and replaced by system board 3. Two retries are performed, if necessary, with a wait time of 15 minutes (900 seconds) between retries.

1. Delete the board from the domain.

% deleteboard -r 2 -t 900 SB2

2. Power off system board 2.

Refer to the power(1M) man page for information about the power command.

% power -off -sb 2



Caution - For complete information about physically removing and replacing boards, refer to the Sun Enterprise 10000 Systems Service Manual. Failure to follow the procedures described therein can cause damage to system boards and other components.



3. Physically remove system board 2 and replace it with system board 3.

4. Power on system board 3.

% power -on -sb 3

5. Add system board 3 to the domain.

% addboard -d domain_id -r 2 -t 900 SB3
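The order of the steps matters: the board is deleted and powered off before it is removed, and the replacement is powered on before it is added. The following Python sketch lists the commands for the procedure in order, as a dry run that only prints them. The function name replacement_plan is hypothetical, and the -d domain_id argument on addboard follows the form shown in To Add a Board to a Domain.

```python
def replacement_plan(domain_id, old_board, new_board, retries=2, timeout_s=900):
    """Return the replacement steps, in order, as printable command lines."""
    retry_args = f"-r {retries} -t {timeout_s}"
    return [
        f"deleteboard {retry_args} SB{old_board}",
        f"power -off -sb {old_board}",
        # Manual step; follow the service manual for the physical swap.
        f"# physically replace system board {old_board} with system board {new_board}",
        f"power -on -sb {new_board}",
        f"addboard -d {domain_id} {retry_args} SB{new_board}",
    ]


plan = replacement_plan("domain_id", 2, 3)
for step in plan:
    print(step)
```

Printing the plan before starting makes it easier to confirm the board numbers and the target domain with whoever performs the physical replacement.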