Handling State Database Replica Errors
How does Solaris Volume Manager handle failed replicas? | The system continues to run as long as at least half of the
state database replicas are available. The system panics when fewer than half of the replicas
are available.
The system can reboot into multiuser mode when at least one more than half of
the replicas are available. If fewer than a majority of the replicas are available,
you must reboot into single-user mode and delete the unavailable replicas
(by using the metadb command).
For example, assume you have four replicas. The system will stay running
as long as two replicas (half the total number) are available. However, to
reboot the system, three replicas (half the total plus one) must be available.
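The two thresholds above can be sketched as a pair of small Python helpers (the function names are my own for illustration and are not part of any Solaris API; the integer comparisons avoid fractional "half" values for odd replica counts):

```python
def can_stay_running(available, total):
    # The system keeps running while at least half the replicas respond.
    return 2 * available >= total

def can_reboot_multiuser(available, total):
    # Rebooting into multiuser mode requires a majority: more than half.
    return 2 * available > total

# The four-replica example from the text:
assert can_stay_running(2, 4)           # half available: system stays up
assert not can_reboot_multiuser(2, 4)   # but no majority: no multiuser reboot
assert can_reboot_multiuser(3, 4)       # half plus one: reboot succeeds
```

Note that for an even total the two thresholds differ by one replica, which is exactly the gap the four-replica example describes.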
In a two-disk configuration, you should always create at least two replicas
on each disk. For example, assume a configuration with two disks on which
you create only three replicas (two replicas on the first disk and one
replica on the second disk). If the disk with two replicas fails, the system
panics because the remaining disk has only one replica, which is fewer
than half the total number of replicas.
Note - If you create two replicas on each disk in a two-disk configuration, Solaris Volume Manager
will still function if one disk fails. But because one more
than half of the total replicas must be available for the system to reboot, you will
be unable to reboot into multiuser mode.
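The arithmetic behind both two-disk layouts can be checked directly (a plain Python sketch using the rules stated above, not any Solaris interface):

```python
# Recommended layout: two replicas per disk, four in total.
total = 4
remaining = 2                       # one disk (and its two replicas) fails
assert 2 * remaining >= total       # half remain: the system keeps running
assert not (2 * remaining > total)  # no majority: no multiuser reboot

# Three-replica layout: two on the first disk, one on the second.
total = 3
remaining = 1                       # the disk holding two replicas fails
assert 2 * remaining < total        # fewer than half remain: the system panics
```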
|
What happens if a slice that contains a state database
replica fails? | The rest of your configuration should remain in operation. Solaris Volume Manager
finds a valid state database during boot (as long as at least half
plus one of the state database replicas are valid).
|
What happens when state database replicas are repaired? | When you manually repair or enable state database replicas, Solaris Volume Manager
updates them with valid data.
|
Scenario--State Database Replicas
State database replicas provide redundant data about the overall Solaris Volume Manager
configuration. The following example, drawing on the sample system provided
in Chapter 4, Configuring and Using Solaris Volume Manager (Scenario), describes how state database replicas
can be distributed to provide adequate redundancy.
The sample system has one internal IDE controller and drive, plus two
SCSI controllers, each with six disks attached. With three controllers,
the system can be configured to avoid any single point of failure. Any
system with only two controllers cannot avoid a single point of failure relative
to Solaris Volume Manager. By distributing replicas evenly across all three controllers
and across at least one disk on each controller (across two disks, if possible),
the system can withstand any single hardware failure.
A minimal configuration could put a single state database replica on
slice 7 of the root disk, plus an additional replica on slice 7 of one disk
on each of the other two controllers. To help protect against the admittedly
remote possibility of media failure, a better configuration places two replicas on the root disk and
two replicas (on two different disks) on each controller, for a total of
six replicas, which provides more than adequate protection.
To round out the total, add 2 additional replicas for each of the 6
mirrors, on different disks than the mirrors. This configuration results in
a total of 18 replicas with 2 on the root disk and 8 on each of the SCSI controllers,
distributed across the disks on each controller.
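The replica totals in this scenario can be verified with a quick arithmetic check (a sketch only; the counts come from the scenario text above):

```python
root_replicas = 2        # two replicas on the root disk
base_per_scsi = 2        # two replicas on each of the two SCSI controllers
mirrors = 6
extra_per_mirror = 2     # two additional replicas for each mirror

total = root_replicas + 2 * base_per_scsi + mirrors * extra_per_mirror
assert total == 18       # matches: 2 on the root disk, 8 per SCSI controller
assert base_per_scsi + (mirrors * extra_per_mirror) // 2 == 8
```

The last line checks the per-controller count: two base replicas plus half of the twelve mirror replicas on each SCSI controller.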