SRDB ID:  17623
Synopsis: Alternative procedure to set up and fail over to a spare SSP.
Date:     27 Aug 1998
The currently supported procedure for setting up, and failing over to,
a spare SSP is described in the "E10000 System Hardware Installation
and De-installation Guide".
Many sites use this procedure without problems; however, a number
of customers have great difficulty following it.
The problems are:
* The concept is not intuitive. Customers are
asking for something simpler.
* The procedure requires the customer to make changes on the
domains when an SSP is failed over.
* There is a risk of package corruption, because only certain
directories are copied across. Most patches update files in
/opt/SUNWssp, but some update files in
/var/opt/SUNWssp. As a result, pkginfo files may become
inconsistent with the actual file structure when
patches are applied. To avoid this problem customers would have
to patch the main and the spare SSP at the same time. I think
that this is asking too much of many customers.
SOLUTION SUMMARY:
This article describes an alternative procedure which has the
following advantages:-
* Simpler concept.
* Lower risk of package inconsistency.
* SSP failover is transparent to the domains. No config changes
are required.
The "E10000 System Hardware Installation and De-installation Guide"
is currently under review by HES-CTE, and the SSP failover procedure
will be changed in the next release (Jan 1999). It is expected that
the new procedure will be based on the procedures described in this
article.
Concept
--------
The complete SSP root file system is backed up to tape using ufsdump.
The tape is restored onto a dedicated spare SSP to produce a
replica SSP.
Once this is done, failover can be performed by shutting down
the active SSP and booting up the replica in its place. The
replica will seamlessly take over the support of the platform.
This procedure will only work for a dedicated spare SSP with a
hardware configuration identical to the main SSP. The SSP disk
must be formatted in an identical slice layout.
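The identical slice layout requirement can be checked before any
restore is attempted. The following is a minimal sketch, not part of
the supported procedure. It assumes the slice table of each SSP disk
has been saved to a file with prtvtoc; the file names main.vtoc and
spare.vtoc, the use of slice 2 as the whole-disk slice, and both
function names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: compare the slice tables of the main and spare SSP disks.
# Assumption: the two input files were produced beforehand, e.g.
#   prtvtoc /dev/rdsk/c0t3d0s2 > main.vtoc     (on the main SSP)
#   prtvtoc /dev/rdsk/c0t3d0s2 > spare.vtoc    (on the spare SSP)
# The file names and the s2 whole-disk slice are hypothetical.

# Print only the partition rows; prtvtoc comment lines start with "*".
slice_rows() {
    grep -v '^\*' "$1"
}

# Return 0 if both files describe the same slice layout,
# 1 if they differ, 2 if either file cannot be read.
same_slice_layout() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a=$(slice_rows "$1")
    b=$(slice_rows "$2")
    [ "$a" = "$b" ]
}

# Usage:
#   if same_slice_layout main.vtoc spare.vtoc; then
#       echo "slice layouts match"
#   fi
```

Comment lines are ignored because they carry disk geometry notes and
the device name, which may legitimately differ in wording while the
partition rows themselves are the same.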
I would recommend a low cost local tape drive for the SSP as
a worthwhile investment; however, this procedure will also work
with a tape drive on a remote host.
Procedure
---------
* When installing a Starfire, do not install the spare SSP until
near the end of the installation when the final config of the
platform and the SSP are settled. Cable up the hardware as
described in the installation documentation.
* BACKUP THE SSP
    * Shut down the SSP to single user mode.
    * Back up the SSP disk:
          ufsdump 0f /dev/rmt/XX /
    * If the tape drive is on a remote host "tapehost":
        * On tapehost, enter the SSP hostname in /.rhosts.
        * Back up the SSP disk to the remote drive:
              ufsdump 0f tapehost:/dev/rmt/XX /
      If you have shut down from multi-user to single user mode
      you should not have to configure the network; however, refer
      to SRDB 13634 if you need assistance.
    * When the backup is complete, reboot the SSP to
      multi-user mode.
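The local and remote forms of the dump command differ only in the
tape argument. As a minimal sketch, the helper below prints the
command to run for either case. The function name dump_command is an
assumption made for illustration, and /dev/rmt/XX remains the
placeholder used in this article; substitute the real device.

```shell
#!/bin/sh
# Sketch: build the level 0 ufsdump command for a local or remote
# tape drive. Nothing is executed; the command is only printed.
# dump_command is a hypothetical helper name.

dump_command() {
    tape_dev=$1          # tape device, e.g. /dev/rmt/XX (placeholder)
    tape_host=${2:-}     # empty for a locally attached drive
    if [ -n "$tape_host" ]; then
        # Remote form: host:device, needs the /.rhosts entry on tapehost.
        echo "ufsdump 0f ${tape_host}:${tape_dev} /"
    else
        echo "ufsdump 0f ${tape_dev} /"
    fi
}

# Usage (review the output, then run it in single user mode):
#   dump_command /dev/rmt/XX
#   dump_command /dev/rmt/XX tapehost
```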
* CREATE THE REPLICA SSP
    * If the tape drive is local:
        * boot cdrom -s (from a Solaris 2.X CD)
        * newfs /dev/rdsk/c0t3d0s0
        * fsck /dev/rdsk/c0t3d0s0
        * mount /dev/dsk/c0t3d0s0 /a
        * cd /a
        * ufsrestore rf /dev/rmt/XX
        * rm restoresymtable
        * If the disk is new and has never been booted,
          install a boot block (this should not be
          required on a factory shipped SSP):
            * cd /usr/platform/sun4m/lib/fs/ufs
            * /usr/sbin/installboot ./bootblk /dev/rdsk/c0t3d0s0
    * If the tape drive is remote:
        * Use SRDB 13634 to configure the network.
          NOTE: For the purpose of restoring, assign a
          hostname and IP address to the spare SSP that
          are different from those of the main SSP.
    * When complete, shut down the spare SSP ready for failover.
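The restore steps must be typed in a fixed order at the single user
prompt, so it can help to print them as a checklist first. The sketch
below only prints the commands and runs nothing. It assumes the root
slice c0t3d0s0 and the sun4m platform directory used in this article.
The function name restore_steps, its "yes" flag for a new disk, and
the use of a tapehost:/dev/rmt/XX argument for a remote drive are
assumptions made for illustration, not steps stated by the procedure.

```shell
#!/bin/sh
# Sketch: print the replica restore steps in order, for review.
# Run "boot cdrom -s" first; the commands below follow it.
# restore_steps is a hypothetical helper name.

restore_steps() {
    tape=$1              # e.g. /dev/rmt/XX, or host:device (assumed form)
    new_disk=${2:-no}    # "yes" appends the boot block steps
    disk=c0t3d0s0        # root slice used in this article

    echo "newfs /dev/rdsk/$disk"
    echo "fsck /dev/rdsk/$disk"
    echo "mount /dev/dsk/$disk /a"
    echo "cd /a"
    echo "ufsrestore rf $tape"
    echo "rm restoresymtable"
    if [ "$new_disk" = "yes" ]; then
        # Only needed for a disk that has never been booted.
        echo "cd /usr/platform/sun4m/lib/fs/ufs"
        echo "/usr/sbin/installboot ./bootblk /dev/rdsk/$disk"
    fi
}

# Usage:
#   restore_steps /dev/rmt/XX          # factory shipped SSP disk
#   restore_steps /dev/rmt/XX yes      # new, never booted disk
```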
* FAILOVER PROCEDURE
    * Shut down the acting SSP.
      (The fans on the Starfire will go to high speed.)
    * Boot up the replica SSP.
    * Wait for the fans to return to normal speed.
      The replica is now the acting SSP. No action
      is required on the domains.
    * NOTE: It is important that the two SSPs are never booted
      up together while connected to the network, as they
      have identical hostnames and IP addresses. When not
      in use, keep the replica powered off or disconnected
      from the networks, to avoid accidental booting.
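Because the two SSPs share a hostname and IP address, it may be worth
confirming from a third host that the acting SSP has stopped answering
before the replica is booted. This is a minimal sketch and not part of
the procedure. It assumes the Solaris form "ping host timeout"; the
function name safe_to_boot_replica and the PING_CMD override are
assumptions made for illustration. A silent address is only an
indication that the SSP is down, not proof, so check the machine
itself as well.

```shell
#!/bin/sh
# Sketch: run on a third host on the SSP network, never on the replica.
# Assumes Solaris ping syntax: ping host [timeout-in-seconds].
# PING_CMD is a hypothetical override so the probe can be swapped out.

safe_to_boot_replica() {
    ssp_host=$1          # hostname shared by the main and replica SSP
    timeout=${2:-5}      # seconds to wait for an answer
    if ${PING_CMD:-ping} "$ssp_host" "$timeout" >/dev/null 2>&1; then
        echo "$ssp_host still answers: do NOT boot the replica"
        return 1
    fi
    echo "$ssp_host is silent: the replica can be booted"
    return 0
}

# Usage (the hostname is a placeholder):
#   safe_to_boot_replica my-ssp && echo "proceed with failover"
```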
* MAINTAINING CONSISTENCY BETWEEN SSPs
    * Repeat the whole backup/restore procedure on a regular
      schedule, or after any major change to the SSP software.
APPLIES TO: Hardware/Ultra Enterprise/Servers/Enterprise 10000, Operating Systems/Solaris/Solaris 2.5.1
ATTACHMENTS:
Copyright (c) 1997-2003 Sun Microsystems, Inc.