Repairing Transactional Volumes

Because a transactional volume is a "layered" volume, consisting of a master device and logging device, and because the logging device can be shared among file systems, repairing a failed transactional volume requires special recovery tasks.

Any device errors or panics must be managed by using the command line utilities.

Panics

If a file system detects any internal inconsistencies while it is in use, it will panic the system. If the file system is configured for logging, it notifies the transactional volume that it needs to be checked at reboot. The transactional volume transitions itself to the "Hard Error" state. All other transactional volumes that share the same log device also go into the "Hard Error" state.

At reboot, fsck checks and repairs the file system and transitions the file system back to the "Okay" state. fsck completes this process for all transactional volumes listed in the /etc/vfstab file for the affected log device.
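
The panic and reboot sequence described above can be pictured as a small state model. The following Python sketch is an illustration only, not Solaris Volume Manager code: the TransVolume class, the function names, and the volume and log device names are all hypothetical. It shows only the rule that every transactional volume sharing a log device changes state together.

```python
from dataclasses import dataclass


@dataclass
class TransVolume:
    """Hypothetical stand-in for a transactional volume and its log device."""
    name: str
    log_device: str
    state: str = "Okay"


def panic(volumes, panicked_name):
    """Put the panicked volume, and every volume sharing its log, in "Hard Error"."""
    log = next(v.log_device for v in volumes if v.name == panicked_name)
    for v in volumes:
        if v.log_device == log:
            v.state = "Hard Error"
    return log


def fsck_at_reboot(volumes, log_device):
    """Model fsck returning every volume on the affected log device to "Okay"."""
    for v in volumes:
        if v.log_device == log_device:
            v.state = "Okay"


# Hypothetical configuration: d1 and d2 share a log device, d3 does not.
volumes = [
    TransVolume("d1", "c0t0d0s6"),
    TransVolume("d2", "c0t0d0s6"),
    TransVolume("d3", "c0t1d0s6"),
]
affected_log = panic(volumes, "d1")
print([(v.name, v.state) for v in volumes])
fsck_at_reboot(volumes, affected_log)
print([(v.name, v.state) for v in volumes])
```

In this model, d3 is unaffected by the panic because it uses a different log device.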

Transactional Volume Errors

If a device error occurs on either the master device or the log device while the transactional volume is processing logged data, the device transitions from the "Okay" state to the "Hard Error" state. A device in either the "Hard Error" or "Error" state indicates that a device error or a panic has occurred.

Any devices sharing the failed log device also go into the "Error" state.

Recovering From Soft Partition Problems

The following sections show how to recover configuration information for soft partitions. You should only use these techniques if all of your state database replicas have been lost and you do not have a current or accurate copy of metastat -p output, the md.cf file, or an up-to-date md.tab file.

How to Recover Configuration Data for a Soft Partition

At the beginning of each soft partition extent, a sector is used to mark the beginning of the soft partition extent. These hidden sectors are called extent headers and do not appear to the user of the soft partition. If all Solaris Volume Manager configuration is lost, the disk can be scanned in an attempt to generate the configuration data.

This procedure is a last option to recover lost soft partition configuration information. The metarecover command should only be used when you have lost both your metadb and your md.cf files, and your md.tab is lost or out of date.

Note - This procedure recovers soft partition information only. It does not help you recover other lost configurations or configuration information for other Solaris Volume Manager volumes.

Note - If your configuration included other Solaris Volume Manager volumes that were built on top of soft partitions, you should recover the soft partitions before attempting to recover the other volumes.

Configuration information about your soft partitions is stored on your devices and in your state database. Since either of these sources could be corrupt, you must tell the metarecover command which source is reliable.

First, use the metarecover command to determine whether the two sources agree. If they do agree, the metarecover command cannot be used to make any changes. If the metarecover command reports an inconsistency, however, you must examine its output carefully to determine whether the disk or the state database is corrupt. Then use the metarecover command to rebuild the configuration based on the appropriate source.

Read the "Background Information About Soft Partitions".

Review the soft partition recovery information by using the metarecover command.
# metarecover component -p -d
In this case, component is the c*t*d*s* name of the raw component. The -d option indicates to scan the physical slice for extent headers of soft partitions.

For more information, see the metarecover(1M) man page.

Example--Recovering Soft Partitions from On-Disk Extent Headers

# metarecover c1t1d0s1 -p -d
The following soft partitions were found and will be added to
your metadevice configuration.
 Name            Size     No. of Extents
    d10           10240         1
    d11           10240         1
    d12           10240         1
WARNING: You are about to add one or more soft partition
metadevices to your metadevice configuration.  If there
appears to be an error in the soft partition(s) displayed
above, do NOT proceed with this recovery operation.
Are you sure you want to do this (yes/no)?yes
c1t1d0s1: Soft Partitions recovered from device.
bash-2.05# metastat
d10: Soft Partition
    Device: c1t1d0s1
    State: Okay
    Size: 10240 blocks
        Device              Start Block  Dbase Reloc
        c1t1d0s1                   0     No    Yes

        Extent              Start Block              Block count
             0                        1                    10240

d11: Soft Partition
    Device: c1t1d0s1
    State: Okay
    Size: 10240 blocks
        Device              Start Block  Dbase Reloc
        c1t1d0s1                   0     No    Yes

        Extent              Start Block              Block count
             0                    10242                    10240

d12: Soft Partition
    Device: c1t1d0s1
    State: Okay
    Size: 10240 blocks
        Device              Start Block  Dbase Reloc
        c1t1d0s1                   0     No    Yes

        Extent              Start Block              Block count
             0                    20483                    10240

This example recovers three soft partitions from disk, after the state database replicas were accidentally deleted.
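
The start blocks in the metastat output above (1, 10242, and 20483) are consistent with one extent header sector preceding each 10240-block extent. The following Python sketch reproduces that arithmetic. It assumes that the extents are laid out back to back from block 0 of the slice and that each header occupies exactly one sector. Both assumptions are inferred from this example only; a real configuration can have gaps between extents. The function name is hypothetical.

```python
def extent_start_blocks(sizes, first_header=0, header_sectors=1):
    """Return the start block of each extent, assuming contiguous layout.

    sizes          -- extent sizes in blocks, in on-disk order
    first_header   -- block number of the first extent header (assumed 0)
    header_sectors -- sectors used by each extent header (assumed 1)
    """
    starts = []
    block = first_header
    for size in sizes:
        block += header_sectors  # skip the hidden extent header
        starts.append(block)
        block += size            # skip the extent's data blocks
    return starts


print(extent_start_blocks([10240, 10240, 10240]))  # [1, 10242, 20483]
```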

Recovering Configuration From a Different System

You can recover a Solaris Volume Manager configuration, even onto a different system from the original. For example, assume you have a system with an external Multipack of six disks in it, and a Solaris Volume Manager configuration, including at least one state database replica, on some of those disks. If you experience a system failure, you can attach the Multipack to a different system and recover the complete configuration from the local disk set.

Note - Only recover a Solaris Volume Manager configuration onto a system with no preexisting Solaris Volume Manager configuration. Otherwise, you risk replacing a logical volume on your system with a logical volume that you are recovering, and possibly corrupting your system.

Note - This process only works to recover volumes from the local disk set.

How to Recover a Configuration

Attach the disk or disks that contain the Solaris Volume Manager configuration to a system with no preexisting Solaris Volume Manager configuration.

Do a reconfiguration reboot to ensure that the system recognizes the newly added disks.
# reboot -- -r

Determine the major/minor number for a slice containing a state database replica on the newly added disks.
Use ls -lL, and note the two numbers between the group name and the date. Those are the major/minor numbers for this slice.
# ls -Ll /dev/dsk/c1t9d0s7
brw-r----- 1 root sys 32, 71 Dec 5 10:05 /dev/dsk/c1t9d0s7
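
If you capture the ls output in a script, the major and minor numbers can be extracted with a short parser. This Python sketch is an illustration; the function name is hypothetical, and it assumes the output format shown in the example above, where the two numbers appear as "major, minor".

```python
import re


def parse_major_minor(ls_line):
    """Extract (major, minor) from one line of `ls -lL` output for a device node."""
    match = re.search(r"(\d+),\s*(\d+)", ls_line)
    if match is None:
        raise ValueError("no 'major, minor' pair found in: " + ls_line)
    return int(match.group(1)), int(match.group(2))


line = "brw-r----- 1 root sys 32, 71 Dec 5 10:05 /dev/dsk/c1t9d0s7"
print(parse_major_minor(line))  # (32, 71)
```

On the live system, Python's os.stat, os.major, and os.minor functions can read the same numbers directly from the device node instead of parsing text.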

If necessary, determine the major name corresponding with the major number by looking up the major number in /etc/name_to_major.
# grep " 32" /etc/name_to_major
sd 32
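
Note that a pattern such as " 32" also matches larger numbers that begin with 32, such as 320. A scripted lookup can compare the number field exactly instead. The following Python sketch assumes that each line of /etc/name_to_major has the form "name number", as in the grep output above. The function name and every sample entry other than "sd 32" are hypothetical.

```python
def major_name(name_to_major_text, major):
    """Return the driver name whose major number equals `major`, or None."""
    for line in name_to_major_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit() and int(fields[1]) == major:
            return fields[0]
    return None


# Sample content; only the "sd 32" entry comes from the example above.
sample = "example_a 3\nsd 32\nexample_b 320\n"
print(major_name(sample, 32))  # sd
```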

Update the /kernel/drv/md.conf file with two commands: one command to tell Solaris Volume Manager where to find a valid state database replica on the new disks, and one command to tell it to trust the new replica and ignore any conflicting device ID information on the system.

In the line in this example that begins with mddb_bootlist1, replace the sd in the example with the major name you found in the previous step. Replace 71 in the example with the minor number you identified in Step 3.
#pragma ident "@(#)md.conf 2.1 00/07/07 SMI"
#
# Copyright (c) 1992-1999 by Sun Microsystems, Inc.
# All rights reserved.
#
name="md" parent="pseudo" nmd=128 md_nsets=4;
# Begin MDD database info (do not edit)
mddb_bootlist1="sd:71:16:id0"; md_devid_destroy=1;
# End MDD database info (do not edit)
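
The mddb_bootlist1 entry can be assembled from the values found in the earlier steps. The following Python sketch is an illustration with a hypothetical function name. The source states only that the first field is the major name and the second is the minor number. The third field (16) and the trailing id0 are copied unchanged from the example; treating 16 as a replica block offset is an assumption.

```python
def bootlist_entry(major_name, minor, block=16):
    """Format an mddb_bootlist1 line in the style of the example md.conf.

    `block` defaults to the value used in the example; its meaning is assumed.
    """
    return f'mddb_bootlist1="{major_name}:{minor}:{block}:id0";'


print(bootlist_entry("sd", 71))  # mddb_bootlist1="sd:71:16:id0";
```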

Reboot to force Solaris Volume Manager to reload your configuration.

You will see messages similar to the following displayed to the console.
volume management starting.
Dec 5 10:11:53 lexicon metadevadm: Disk movement detected
Dec 5 10:11:53 lexicon metadevadm: Updating device names in Solaris Volume Manager
The system is ready.

Verify your configuration by using the metadb and metastat commands.
# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t9d0s7
     a       luo        16              8192            /dev/dsk/c1t10d0s7
     a       luo        16              8192            /dev/dsk/c1t11d0s7
     a       luo        16              8192            /dev/dsk/c1t12d0s7
     a       luo        16              8192            /dev/dsk/c1t13d0s7
# metastat
d12: RAID
    State: Okay
    Interlace: 32 blocks
    Size: 125685 blocks
Original device:
    Size: 128576 blocks
        Device              Start Block  Dbase State        Reloc  Hot Spare
        c1t11d0s3                330     No    Okay         Yes
        c1t12d0s3                330     No    Okay         Yes
        c1t13d0s3                330     No    Okay         Yes

d20: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent              Start Block              Block count
             0                     3592                     8192

d21: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent              Start Block              Block count
             0                    11785                     8192

d22: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent              Start Block              Block count
             0                    19978                     8192

d10: Mirror
    Submirror 0: d0
      State: Okay
    Submirror 1: d1
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 82593 blocks

d0: Submirror of d10
    State: Okay
    Size: 118503 blocks
    Stripe 0: (interlace: 32 blocks)
        Device              Start Block  Dbase State        Reloc  Hot Spare
        c1t9d0s0                   0     No    Okay         Yes
        c1t10d0s0               3591     No    Okay         Yes

d1: Submirror of d10
    State: Okay
    Size: 82593 blocks
    Stripe 0: (interlace: 32 blocks)
        Device              Start Block  Dbase State        Reloc  Hot Spare
        c1t9d0s1                   0     No    Okay         Yes
        c1t10d0s1                  0     No    Okay         Yes

Device Relocation Information:
Device       Reloc    Device ID
c1t9d0       Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3487980000U00907AZ
c1t10d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3397070000W0090A8Q
c1t11d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3449660000U00904NZ
c1t12d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS32655400007010H04J
c1t13d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3461190000701001T0
# metastat -p
d12 -r c1t11d0s3 c1t12d0s3 c1t13d0s3 -k -i 32b
d20 -p d10 -o 3592 -b 8192
d21 -p d10 -o 11785 -b 8192
d22 -p d10 -o 19978 -b 8192
d10 -m d0 d1 1
d0 1 2 c1t9d0s0 c1t10d0s0 -i 32b
d1 1 2 c1t9d0s1 c1t10d0s1 -i 32b
#
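
To compare the recovered configuration against your own records, you can pull the soft partition entries out of saved metastat -p output. The following Python sketch is an illustration with a hypothetical function name. It assumes the entry layout seen in the output above (name, -p, underlying device, then -o offset and -b size pairs) and ignores every other volume type.

```python
def soft_partitions(metastat_p_text):
    """Map each soft partition name to (underlying device, [(offset, size), ...])."""
    result = {}
    for line in metastat_p_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[1] != "-p":
            continue  # not a soft partition entry
        name, device, opts = fields[0], fields[2], fields[3:]
        offsets = [int(opts[i + 1]) for i, o in enumerate(opts[:-1]) if o == "-o"]
        sizes = [int(opts[i + 1]) for i, o in enumerate(opts[:-1]) if o == "-b"]
        result[name] = (device, list(zip(offsets, sizes)))
    return result


saved_output = """\
d12 -r c1t11d0s3 c1t12d0s3 c1t13d0s3 -k -i 32b
d20 -p d10 -o 3592 -b 8192
d21 -p d10 -o 11785 -b 8192
d22 -p d10 -o 19978 -b 8192
d10 -m d0 d1 1
"""
for name, (device, extents) in soft_partitions(saved_output).items():
    print(name, device, extents)
```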

