CHAPTER 2

Disk Drive Hot-Plug Procedures

The Sun Enterprise 250 server supports "hot-plugging" of internal disk drives. This hot-plug feature enables you to install a new disk drive, or remove and replace a failed disk drive, without shutting down the operating system or powering off the system. The hot-plug procedure involves software commands for preparing the system prior to removing a disk drive and for reconfiguring the operating environment after installing a new drive.




Caution - Do not pull drives out at random. If a drive is active, you must stop that activity before removing the drive. You can do so without bringing down the operating system or powering off the system. The system supports hot-plugging, but there are software considerations that must be taken into account. Follow the procedures in this chapter when removing, replacing, and adding drives.




Overview

Hot-plug operations cannot be performed on an active disk drive: all disk access activity must be stopped before a drive is removed or replaced.

In general, hot-plug reconfiguration operations involve three separate stages:

  1. Preparing for hot-plug reconfiguration

  2. Adding, replacing, or removing a disk drive

  3. Reconfiguring the operating environment.

The hot-plug feature is useful in three specific cases: adding a new drive, replacing a faulty drive, and removing a drive without replacing it. The sections that follow cover each case.


Adding a Hot-Pluggable Disk Drive

This section contains information on how to configure your system when you add a disk drive while the power is on and the operating system is running.

The way in which you add a disk drive depends on the application you are using. Each application requires that you decide where to install the new disk drive, add the drive, and then reconfigure the operating environment.

In all cases, you must select a slot, physically install the disk drive, and configure the Solaris environment to recognize the drive. Then you must configure your application to accept the new disk drive.

1. Select a slot for the new disk drive.

The Sun Enterprise 250 server's internal disk array can accommodate up to six UltraSCSI disk drives. The figure below shows the system's six internal disk slots. Disk slots are numbered from 0 to 5. Select any available slot for the new disk drive.

FIGURE 2-1 Slot Numbers for Internal Disk Array

2. Insert the new disk drive into the selected slot.

Refer to the Sun Enterprise 250 Server Owner's Guide for drive installation instructions.

3. Use the drvconfig command to create a new device entry for the drive in the /devices hierarchy:

# drvconfig

4. Determine the raw physical device name for the slot that you selected.

Consult the following table.

TABLE 2-1 Slot Physical Device Names

Disk Slot Number    Raw Physical Device Name
0                   /devices/pci@1f,4000/scsi@3/sd@0,0:c,raw
1                   /devices/pci@1f,4000/scsi@3/sd@8,0:c,raw
2                   /devices/pci@1f,4000/scsi@3/sd@9,0:c,raw
3                   /devices/pci@1f,4000/scsi@3/sd@a,0:c,raw
4                   /devices/pci@1f,4000/scsi@3/sd@b,0:c,raw
5                   /devices/pci@1f,4000/scsi@3/sd@c,0:c,raw


5. Use the ssaadm insert_device command to add the new device:

# ssaadm insert_device physical_device_name
ssaadm: warning: can't quiesce "/devices/pci@1f,4000/scsi@3/sd@b,0:c,raw": I/O error
Bus is ready for the insertion of device(s)
Insert device(s) and reconfigure bus as needed
Press RETURN when ready to continue

Here, physical_device_name is the full physical device name determined in Step 4.

You can safely ignore the warning message since the Sun Enterprise 250 SCSI bus does not require quiescing.

6. Press Return to complete the hot-plug operation.

The ssaadm command creates a new device entry for the drive in the /dev/dsk and /dev/rdsk hierarchies. The new drive is assigned a logical device name of the form cwtxdysz, where:

  w corresponds to the SCSI controller for the disk drive
  x corresponds to the SCSI target for the disk drive
  y is the logical unit number for the disk drive (always 0)
  z is the slice (or partition) on the disk

The logical device name assigned to the drive depends on the disk slot number where the drive is installed.
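As a sketch of how the cwtxdysz form decomposes in practice, using the example name c0t11d0 (with slice 2, the whole-disk slice used elsewhere in this chapter) and POSIX parameter expansion:

```shell
# Split a logical device name of the form cwtxdysz into its parts.
# The name c0t11d0s2 is the example used later in this chapter.
name=c0t11d0s2
controller=${name%%t*}; controller=${controller#c}   # w: SCSI controller
rest=${name#*t}
target=${rest%%d*}                                   # x: SCSI target
rest=${rest#*d}
lun=${rest%%s*}                                      # y: logical unit number
slice=${rest#*s}                                     # z: slice
echo "controller=$controller target=$target lun=$lun slice=$slice"
```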

7. To verify that the new disk has been created, type:

# ls -lt /dev/dsk | more
lrwxrwxrwx   1 root     root          41 Jan 30 09:07 c0t11d0s0 -> ../../devices/pci@1f,4000/scsi@3/sd@b,0:a
lrwxrwxrwx   1 root     root          41 Jan 30 09:07 c0t11d0s1 -> ../../devices/pci@1f,4000/scsi@3/sd@b,0:b
lrwxrwxrwx   1 root     root          41 Jan 30 09:07 c0t11d0s2 -> ../../devices/pci@1f,4000/scsi@3/sd@b,0:c
lrwxrwxrwx   1 root     root          41 Jan 30 09:07 c0t11d0s3 -> ../../devices/pci@1f,4000/scsi@3/sd@b,0:d
lrwxrwxrwx   1 root     root          41 Jan 30 09:07 c0t11d0s4 -> ../../devices/pci@1f,4000/scsi@3/sd@b,0:e
lrwxrwxrwx   1 root     root          41 Jan 30 09:07 c0t11d0s5 -> ../../devices/pci@1f,4000/scsi@3/sd@b,0:f
lrwxrwxrwx   1 root     root          41 Jan 30 09:07 c0t11d0s6 -> ../../devices/pci@1f,4000/scsi@3/sd@b,0:g
lrwxrwxrwx   1 root     root          41 Jan 30 09:07 c0t11d0s7 -> ../../devices/pci@1f,4000/scsi@3/sd@b,0:h
--More--(13%)

The new disk and its logical device name appear at the top of the list. Check the file creation date to make sure it matches the current time and date. In the example above, the logical device name for the new disk is c0t11d0 .

Configuring the New Disk Drive Within Your Application

Configure the new disk drive by following the instructions for your specific application:




Caution - These procedures should be performed only by a qualified system administrator. Performing hot-plug operations on an active disk drive may result in data loss if performed incorrectly.



Configuring the New Disk Drive for a UNIX File System (UFS)

Use the following procedure to configure a slice (a single physical partition) on a disk to be used with a UFS file system. For instructions about adding a file system to a Solstice™ DiskSuite™ (SDS) logical disk, refer to the documentation that came with your application.

1. Verify that the device label meets your requirements.

You can use the prtvtoc command to inspect the label for your disk. To modify the label, use the format command. Refer to the prtvtoc(1M) and format(1M) man pages for more information.

2. Select a disk slice for your UFS file system and create a new file system on the slice:

# newfs /dev/rdsk/cwtxdysz

For example: newfs /dev/rdsk/c0t11d0s2

Refer to the newfs(1M) man page for more information.

3. If necessary, create a mount point for the new file system:

# mkdir mount_point

where mount_point is a fully qualified path name. Refer to the mount(1M) man page for more information.

4. After the file system and mount point have been created, modify the /etc/vfstab file to reflect the new file system.

See the vfstab(4) man page for more details.
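For example, an entry for the new file system might look like the following; the mount point /export/home1 and device c0t11d0s2 are taken from this chapter's examples, and the fsck pass number is an assumption for illustration:

```
#device to mount    device to fsck       mount point    FS type  fsck pass  mount at boot  mount options
/dev/dsk/c0t11d0s2  /dev/rdsk/c0t11d0s2  /export/home1  ufs      2          yes            -
```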

5. Mount the new file system using the mount command:

# mount mount_point

where mount_point is the directory you created.

The file system is ready to be used.

Adding a Disk to a Solstice DiskSuite Disk Set

Any disk you add to the system can be used for new or existing Solstice DiskSuite (SDS) metadevices.

Refer to the Solstice DiskSuite documentation for information on configuring the disk drive.


Replacing a Faulty Hot-Pluggable Disk Drive

This section contains information on configuring your system to replace a disk drive while the power is on and the operating system is running.

The way in which you replace a faulty disk drive depends on the application you are using. Each application is different, but requires that you:

  1. Determine which disk drive is failing or has failed

  2. Remove the disk

  3. Add the replacement drive

  4. Reconfigure the operating environment.

In all cases you must stop any activity or applications on the disk; unmount it; physically remove the old drive and install the new one; and configure the Solaris environment to recognize the drive. Then you must configure your application to accept the new disk drive.

Prepare Spare Drives

If possible, prepare replacement disk drives in advance. Each replacement disk drive should be formatted, labeled, and partitioned the same as the disk it will replace. See the documentation for your application for instructions on how to format and partition the disk, and add that disk to your application.

Identifying the Faulty Disk Drive

Disk errors may be reported in a number of different ways. Often you can find messages about failing or failed disks in your system console. This information is also logged in the /usr/adm/messages file(s). These error messages typically refer to a failed disk drive by its physical device name (such as /devices/pci@1f,4000/scsi@3/sd@b,0 ) and its UNIX device instance name (such as sd11 ). In some cases, a faulty disk may be identified by its logical device name (such as c0t11d0 ). In addition, some applications may report a disk slot number (0 through 5) or activate an LED located next to the disk drive itself (see following figure).

FIGURE 2-2 Disk Slot Numbers and LED Locations

In order to perform a disk hot-plug procedure, you need to know the slot number of the faulty disk (0 through 5) and its logical device name (for example, c0t11d0 ). If you know the disk slot number, it is possible to determine the logical device name, and vice versa. It is also possible to determine both the disk slot number and the logical device name from a physical device name (such as /devices/pci@1f,4000/scsi@3/sd@b,0 ).

To make the necessary translation from one form of disk identifier to another, see Chapter 3 . Once you have determined both the disk slot number and logical device name, you are ready to continue with this procedure.
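The translation can also be reasoned out directly: the SCSI target in the physical name is hexadecimal (sd@b,0), while the logical name uses decimal (t11). A shell sketch of the slot-to-logical-name mapping, assuming the controller is c0 and the targets follow Table 2-1:

```shell
# Map an internal disk slot number (0-5) to its logical disk name.
# Targets per Table 2-1; controller c0 is assumed, as in this
# chapter's examples. Note the hex target (0xb) becomes decimal (11).
slot_to_logical() {
    case $1 in
        0) target=0x0 ;;
        1) target=0x8 ;;
        2) target=0x9 ;;
        3) target=0xa ;;
        4) target=0xb ;;
        5) target=0xc ;;
        *) echo "slot must be 0-5" >&2; return 1 ;;
    esac
    printf 'c0t%dd0\n' "$target"
}

slot_to_logical 4    # slot 4 holds SCSI target 0xb, i.e. c0t11d0
```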

Replacing the Disk Drive Within Your Application

Continue the disk replacement by following the instructions for your specific application.

UNIX File System (UFS)

The following procedure describes how to deconfigure a disk being used by one or more UFS file systems.




Caution - These procedures should be performed only by a qualified system administrator. Performing hot-plug operations on an active disk drive can result in data loss if performed incorrectly.



1. Type su and your superuser password.

2. Identify activities or applications attached to the device you plan to remove.

Commands to use are mount , showmount -a , and ps -ef . See the mount(1M) , showmount(1M) , and ps(1) man pages for more details.

For example, where the controller number is 0 and the target ID is 11 :

# mount | grep c0t11
/export/home1 on /dev/dsk/c0t11d0s2 setuid/read/write on
# showmount -a | grep /export/home1
cinnamon:/export/home1/archive
austin:/export/home1
swlab1:/export/home1/doc
# ps -ef | grep c0t11
root  1225   450   4 13:09:58  pts/2   0:00 grep c0t11

In this example, the file system /export/home1 on the faulty disk is being remotely mounted by three different systems: cinnamon, austin, and swlab1. The only process running is grep itself, which has already finished.

3. Stop any activity or application processes on the file systems to be deconfigured.

4. Back up your system.

5. Determine what file system(s) are on the disk:

# mount | grep cwtx

For example, if the device to be removed is c0t11d0 , enter the following:

# mount | grep c0t11
/export/home   (/dev/dsk/c0t11d0s7):   98892 blocks   142713 files
/export/home1  (/dev/dsk/c0t11d0s5):  153424 blocks   112107 files

6. Determine and save the partition table for the disk.

If the replacement disk is the same type as the faulty disk, you can use the format command to save the partition table of the disk. Use the save command in format to save a copy of the partition table to the /etc/format.dat file. This will allow you to configure the replacement disk so that its layout matches the current disk.

Refer to the format(1M) man page for more information.

7. Unmount any file systems on the disk.

For each file system returned, type:

# umount file_system

where file_system is the first field for each line returned in Step 5.

For example:

# umount /export/home
# umount /export/home1
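The umount commands above follow mechanically from the mount output in Step 5; a sketch of that derivation, using a copy of the sample output (on a live system you would pipe mount itself):

```shell
# Derive the file systems to unmount from mount(1M)-style output.
# The sample text mirrors the Step 5 example; the first field of
# each line is the mount point to pass to umount.
sample_mount_output='/export/home   (/dev/dsk/c0t11d0s7):   98892 blocks   142713 files
/export/home1  (/dev/dsk/c0t11d0s5):  153424 blocks   112107 files'

echo "$sample_mount_output" | awk '{print $1}' | while read fs; do
    echo "umount $fs"    # on a live system: umount "$fs"
done
```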



Note - If the file system(s) are on a disk that is failing or has failed, the umount operation may not complete. A large number of error messages may be displayed in the system console and in the /var directory during the umount operation. If the umount operation does not complete, you may have to restart the system.



8. Use the ssaadm replace_device command to take the device offline:

# ssaadm replace_device logical_device_name
ssaadm: warning: can't quiesce "/dev/rdsk/c0t11d0s2": I/O error
Bus is ready for the replacement of device
Replace device and reconfigure bus as needed
Press RETURN when ready to continue

Here, logical_device_name is the full logical device name of the drive to be removed ( /dev/rdsk/c0t11d0s2 ). You must specify slice 2, which represents the entire disk. Note that this command also accepts a physical device name as an alternative.

You can safely ignore the warning message since the Sun Enterprise 250 SCSI bus does not require quiescing.
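Because slice 2 conventionally represents the entire disk, the argument to ssaadm can be built mechanically from the logical disk name; a trivial sketch:

```shell
# Build the whole-disk (slice 2) raw device path from a logical
# disk name, as required by ssaadm replace_device.
disk=c0t11d0
whole_disk=/dev/rdsk/${disk}s2
echo "$whole_disk"
```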

9. Remove the faulty disk drive and insert the replacement drive in its place.

Refer to the Sun Enterprise 250 Server Owner's Guide for drive removal and replacement instructions.

10. Press Return to complete the hot-plug operation.

The ssaadm command brings the replacement drive back online.

11. Verify that the device's partition table satisfies the requirements of the file system(s) you intend to re-create.

You can use the prtvtoc command to inspect the label for your device. If you need to modify the label, use the format command. Refer to the prtvtoc(1M) and format(1M) man pages for more information.
For example:

# prtvtoc /dev/rdsk/cwtxdysz

If you have saved a disk partition table using the format utility and the replacement disk type matches the old disk type, then you can use the format utility's partition section to configure the partition table of the replacement disk. See the select and label commands in the partition section.

If the replacement disk is of a different type than the disk it replaced, you can use the partition size information from the previous disk to set the partition table for the replacement disk. Refer to the prtvtoc(1M) and format(1M) man pages for more information.

You have defined your disk's partition table and have identified the disk slice on which to build your UFS file system.

12. Once you have selected a disk slice for your UFS file system, check and/or create a file system on the slice:

# fsck /dev/rdsk/cwtxdysz
# newfs /dev/rdsk/cwtxdysz

13. Mount the new file system using the mount command:

# mount mount_point

where mount_point is the directory on which the faulty disk was mounted.

The new disk is ready to be used. You can now restore data from your backups.

Solstice DiskSuite

The following procedure describes how to replace a disk in use by Solstice DiskSuite. Refer to the Solstice DiskSuite documentation for more information.




Caution - These procedures should be performed only by a qualified system administrator. Performing hot-plug operations on an active disk drive can result in data loss if performed incorrectly.



1. Back up your system.

2. Type su and your superuser password.

3. If possible, save the partition table for the disk you intend to replace.

If the disk label can still be read, save the disk partitioning at this time.



Note - Save all the disk partitioning information immediately after configuring metadevices or file systems, for use when recovering from device failure later.



Use the prtvtoc command to save the partition information.

# prtvtoc /dev/rdsk/cwtxdys0 > saved_partition_table_file

For example:

# prtvtoc /dev/rdsk/c0t11d0s0 > /etc/c0t11d0s0.vtoc

4. Identify metadevices or applications using the device you plan to remove.

For example:

# metadb | grep c0t11d0
# metastat | grep c0t11d0
# mount | grep c0t11d0

Save the output of the commands to reconstruct the metadevice configuration after you replace the disk.

5. Delete database replicas.

If there are database replicas on the disk, these must be deleted. First record the size and number of database replicas on each slice; then delete them.

# metadb -d cwtxdysz

For example:

# metadb -d c0t11d0s0

6. Detach submirrors.

If any slices of the disk are used by submirrors, the submirrors should be detached. For example:

# metadetach d20 d21

7. Delete hotspares.

If any slices are used by hotspare pools, remove them. Record the hotspare pools containing the slices; then delete them. For example:

# metahs -d all c0t11d0s1

8. Terminate all other metadevice activity on the disk.

Check metastat output for other slices of the disk used by metadevices that cannot be detached (stripes not in mirrors, etc.). These metadevices must be unmounted if they contain file systems, or they must otherwise be taken off line.

Refer to the prtvtoc(1M) man page for more information.

9. Unmount any file systems on the disk.


Note - If the file system(s) are on a disk that is failing or has failed, the umount operation may not complete. A large number of error messages may be displayed in the system console and in the /var directory during the umount operation. If the umount operation does not complete, you may have to restart the system.



For each file system returned, type:

# umount file_system

where file_system is the first field for each line returned in Step 4.

For example:

# umount /export/home
# umount /export/home1

10. Use the ssaadm replace_device command to take the device offline:

# ssaadm replace_device logical_device_name
ssaadm: warning: can't quiesce "/dev/rdsk/c0t11d0s2": I/O error
Bus is ready for the replacement of device
Replace device and reconfigure bus as needed
Press RETURN when ready to continue

Here, logical_device_name is the full logical device name of the drive to be removed ( /dev/rdsk/c0t11d0s2 ). You must specify slice 2, which represents the entire disk. Note that this command also accepts a physical device name as an alternative.

You can safely ignore the warning message since the Sun Enterprise 250 SCSI bus does not require quiescing.

11. Remove the faulty disk drive and insert the replacement drive in its place.

Refer to the Sun Enterprise 250 Server Owner's Guide for drive removal and replacement instructions.

12. Press Return to complete the hot-plug operation.

The ssaadm command brings the replacement drive back online.

13. Restore the disk partitioning.

If you have saved the disk partitioning to a file, you may restore it with fmthard . For example:

# fmthard -s /etc/c0t11d0s0.vtoc  /dev/rdsk/c0t11d0s0

If you have not saved the disk partitioning, use the format(1M) or fmthard(1M) command to repartition the disk.

14. Replace any database replicas.

For example:

# metadb -a -l 2000 -c 2 c0t11d0s0

15. Reattach any submirrors.

For example:

# metattach d20 d21

16. Re-create hot spares for each hot spare pool that contained a slice on the new disk.

For example:

# metahs -a hsp001 c0t11d0s1

17. Fix any broken metadevices, using slices from the new disk.

If the disk to be replaced had caused any metadevices to go into the maintenance state, these metadevices can be repaired by re-enabling the slices.

# metareplace -e mirror_or_RAID5_metadevice cwtxdysz

18. Remount any file systems and restart any applications that were using metadevices that could not be taken off line.

# mount file_system

Refer to the Solstice DiskSuite documentation for more information.


Removing a Hot-Pluggable Disk Drive

This section contains information on how to configure your system to remove a disk drive while the power is on and the operating system is running. Use the procedures in this chapter if you do not intend to replace the disk drive.

The way in which you remove a disk drive depends on the application you are using. Each application is different, but requires that you:

  1. Select the disk drive

  2. Remove the disk

  3. Reconfigure the operating environment.

In all cases you must select the disk and stop any activity or applications on it, unmount it, physically remove the drive, and configure the Solaris environment to recognize that the drive is no longer there. Then you must configure your application to operate without this device in place.

Identifying the Faulty Disk Drive

Disk errors may be reported in a number of different ways. Often you can find messages about failing or failed disks in your system console. This information is also logged in the /usr/adm/messages file(s). These error messages typically refer to a failed disk drive by its UNIX physical device name (such as /devices/pci@1f,4000/scsi@3/sd@b,0 ) and its UNIX device instance name (such as sd11 ). In some cases, a faulty disk may be identified by its UNIX logical device name, such as c0t11d0 . In addition, some applications may report a disk slot number (0 through 5) or activate an LED located next to the disk drive itself (see the following figure ).

FIGURE 2-3 Disk Slot Numbers and LED Locations

In order to perform a disk hot-plug procedure, you need to know the slot number of the faulty disk (0 through 5) and its logical device name (for example, c0t11d0). If you know the disk slot number, it is possible to determine the logical device name, and vice versa. It is also possible to determine both the disk slot number and the logical device name from a physical device name (such as /devices/pci@1f,4000/scsi@3/sd@b,0).

To make the necessary translation from one form of disk identifier to another, see Chapter 3 . Once you have determined both the disk slot number and logical device name, you are ready to continue with this procedure.

Removing a Disk Drive From Your Application

Continue the hot disk removal by following the instructions for your specific application:

UNIX File System (UFS)

The following procedure describes how to remove a disk being used by one or more UFS file systems.

1. Type su and your superuser password.

2. Identify activities or applications attached to the device you plan to remove.

Commands to use are mount , showmount -a , and ps -ef . See the mount(1M) , showmount(1M) , and ps(1) man pages for more details.

For example, where the controller number is 0 and the target ID is 11 :

# mount | grep c0t11
/export/home1 on /dev/dsk/c0t11d0s2 setuid/read/write on
# showmount -a | grep /export/home1
cinnamon:/export/home1/archive
austin:/export/home1
swlab1:/export/home1/doc
# ps -ef | grep c0t11
root  1225   450   4 13:09:58  pts/2   0:00 grep c0t11

In this example, the file system /export/home1 on the faulty disk is being remotely mounted by three different systems: cinnamon, austin, and swlab1. The only process running is grep itself, which has already finished.

3. Stop any activity or application processes on the file systems to be deconfigured.

4. Back up your system.

5. Determine what file system(s) are on the disk:

# mount | grep cwtx

6. Unmount any file systems on the disk.


Note - If the file system(s) are on a disk that is failing or has failed, the umount operation may not complete. A large number of error messages may be displayed in the system console and in the /var directory during the umount operation. If the umount operation does not complete, you may have to restart the system.



For each file system returned, type:

# umount file_system


where file_system is the first field for each line returned in Step 5.

For example:

# umount /export/home
# umount /export/home1

7. Use the ssaadm remove_device command to take the device offline:

# ssaadm remove_device logical_device_name
ssaadm: warning: can't quiesce "/dev/rdsk/c0t11d0s2": I/O error
Bus is ready for the removal of device
Remove device and reconfigure bus as needed
Press RETURN when ready to continue

Here, logical_device_name is the full logical device name for the drive to be removed ( /dev/rdsk/c0t11d0s2 , for example). You must specify slice 2, which represents the entire disk. Note that this command also accepts a physical device name as an alternative.

You can safely ignore the warning message since the Sun Enterprise 250 SCSI bus does not require quiescing.

8. Remove the disk drive from its slot.

Refer to the Sun Enterprise 250 Server Owner's Guide for drive removal instructions.

9. Press Return to complete the hot-plug operation.

The ssaadm command deletes the symbolic links for the device in the /dev/dsk and /dev/rdsk hierarchies.

Solstice DiskSuite

The following procedure describes how to deconfigure a disk in use by Solstice DiskSuite software. For more information, refer to the Solstice DiskSuite documentation.

1. Back up your system.

2. Type su and your superuser password.

3. Identify metadevices or applications using the device you plan to remove.

For example:

# metadb | grep c0t11d0
# metastat | grep c0t11d0
# mount | grep c0t11d0

4. Delete database replicas.

If there are database replicas on the disk, these must be deleted. For example:

# metadb -d c0t11d0s0

5. Replace slices or clear metadevices.

If any slices of the disk are in use by submirrors or within RAID metadevices, they can be replaced by other available slices. For example:

# metareplace d20 c0t11d0s1 c0t8d0s1

If there are no replacement slices available, the metadevices must be cleared. For example:

# metaclear d21

6. Replace slices or clear hotspares.

If any slices of the disk are used by hotspare pools, they can be replaced by other available slices. For example:

# metahs -r all c0t11d0s1 c0t8d0s1

7. Unmount any file systems on the disk.


Note - If the file system(s) are on a disk that is failing or has failed, the umount operation may not complete. A large number of error messages may be displayed in the system console and in the /var directory during the umount operation. If the umount operation does not complete, you may have to restart the system.



For each file system, type:

# umount file_system

For example:

# umount /export/home
# umount /export/home1

Refer to the Solstice DiskSuite documentation for more information.

8. Use the ssaadm remove_device command to take the device offline:

# ssaadm remove_device logical_device_name
ssaadm: warning: can't quiesce "/dev/rdsk/c0t11d0s2": I/O error
Bus is ready for the removal of device
Remove device and reconfigure bus as needed
Press RETURN when ready to continue

Here, logical_device_name is the full logical device name for the drive to be removed ( /dev/rdsk/c0t11d0s2 , for example). You must specify slice 2, which represents the entire disk. Note that this command also accepts a physical device name as an alternative.

You can safely ignore the warning message since the Sun Enterprise 250 SCSI bus does not require quiescing.

9. Remove the disk drive from its slot.

Refer to the Sun Enterprise 250 Server Owner's Guide for drive removal instructions.

10. Press Return to complete the hot-plug operation.

The ssaadm command deletes the symbolic links for the device in the /dev/dsk and /dev/rdsk hierarchies.