SRDB ID:   17623
Synopsis:  Alternative procedure to set up and fail over to a spare SSP.
Date:      27 Aug 1998
Status:    Issued

Description
The currently supported procedure for setting up, and failing over to,
a spare SSP is described in the "E10000 System Hardware Installation
and De-installation Guide".

Many sites use this procedure without problems, but a number of
customers have great difficulty following it.

The problems are:-

* The concept is not intuitive. Customers are asking for
  something simpler.
  
* The procedure requires that the customer makes changes on the
  domains when an SSP is failed over.  
   
* There is a risk of package corruption, because only certain
  directories are copied across.  Most patches update files in
  /opt/SUNWssp, but there are some which update files in 
  /var/opt/SUNWssp.  As a result, there is a chance that pkginfo
  files may become inconsistent with the actual file structure when
  patches are applied. To avoid this problem, customers would have 
  to patch the main and the spare SSP at the same time. I think 
  that this is asking too much of many customers.

SOLUTION SUMMARY:
This article describes an alternative procedure which has the
following advantages:-

* Simpler concept.

* Lower risk of package inconsistency.

* SSP failover is transparent to the domains. No config changes
  are required.
  
The "E10000 System Hardware Installation and De-installation Guide"
is currently under review by HES-CTE, and the SSP failover procedure 
will be changed in the next release (Jan 1999). It is expected that
the new procedure will be based on the procedures described in this
article.

Concept
--------
The complete SSP root file system is backed up to tape using ufsdump. 
The tape is restored onto a dedicated spare SSP to produce a 
replica SSP.
Once this is done, failover can be performed by shutting down 
the active SSP and booting up the replica in its place. The 
replica will seamlessly take over the support of the platform.

This procedure will only work for a dedicated spare SSP with a
hardware configuration identical to the main SSP. The SSP disk
must be formatted in an identical slice layout.
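
One way to check the slice layout requirement is to compare the
partition tables of the two disks. The following is a minimal sketch,
not part of the supported procedure. It assumes that the output of
prtvtoc for the whole-disk slice (for example "prtvtoc
/dev/rdsk/c0t3d0s2", where the s2 slice is an assumption) has been
saved to a file on each SSP. The function name same_layout and the
file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the slice layout of the main and spare SSP disks
# from two saved prtvtoc listings (file names are hypothetical).

# same_layout FILE1 FILE2
# Succeeds when the partition tables match. Lines starting with "*"
# are prtvtoc comment lines (device name, geometry notes) and are
# ignored in the comparison.
same_layout() {
    grep -v '^\*' "$1" > /tmp/vtoc.a.$$
    grep -v '^\*' "$2" > /tmp/vtoc.b.$$
    cmp -s /tmp/vtoc.a.$$ /tmp/vtoc.b.$$
    rc=$?
    rm -f /tmp/vtoc.a.$$ /tmp/vtoc.b.$$
    return $rc
}
```

Usage, with the two listings copied to one host:
"same_layout main.vtoc spare.vtoc && echo layouts match".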

I would recommend a low cost local tape drive for the SSP as
a worthwhile investment; however, this procedure will also work
with a tape drive on a remote host.

Procedure
---------
* When installing a Starfire, do not install the spare SSP until 
  near the end of the installation when the final config of the 
  platform and the SSP are settled. Cable up the hardware as
  described in the installation documentation.

* BACKUP THE SSP

	* Shut down the SSP to single user mode.
	
	* Back up the SSP disk:
	
	 	ufsdump 0f /dev/rmt/XX /
	 	
	* If the tape drive is on a remote host "tapehost":
	
		On tapehost, enter the SSP hostname
		in /.rhosts.
		
		ufsdump 0f tapehost:/dev/rmt/XX /

		If you have shut down from multi-user to
		single user mode you should not have to configure
		the network; however, refer to SRDB 13634 if
		you need assistance.
	        
	 * When the backup is complete reboot the SSP to
	   multi-user mode.
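
The local and remote backup commands above differ only in the tape
device argument. As a small sketch under that assumption, the helper
below builds the command line for either case and prints it for
review instead of running it. The function name dump_command is
hypothetical, and /dev/rmt/XX is kept as the placeholder used in
this article.

```shell
#!/bin/sh
# Sketch: build the ufsdump command line for a local or remote tape.

# dump_command DEVICE [TAPEHOST]
# Prints the level 0 dump command for the root file system. When a
# tape host is given, the device is written as TAPEHOST:DEVICE.
dump_command() {
    if [ -n "${2:-}" ]; then
        echo "ufsdump 0f $2:$1 /"
    else
        echo "ufsdump 0f $1 /"
    fi
}
```

For example, "dump_command /dev/rmt/XX tapehost" prints the remote
form of the command shown above.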
	   
	         
* CREATE THE REPLICA SSP

	* If the tape drive is local:
	
		* boot cdrom -s (from a Solaris 2.X CD)
		
		* newfs /dev/rdsk/c0t3d0s0
		
		* fsck /dev/rdsk/c0t3d0s0
		
		* mount /dev/dsk/c0t3d0s0 /a
	
		* cd /a
		
		* ufsrestore rf /dev/rmt/XX
		
		* rm restoresymtable
		
		* If the disk is new and has never been booted,
		  install a boot block (this should not be
		  required on a factory-shipped SSP):
		  	
			* cd /usr/platform/sun4m/lib/fs/ufs
	 		
			* /usr/sbin/installboot ./bootblk /dev/rdsk/c0t3d0s0

	 		
	 
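	 * If the tape drive is remote:
	 
	 	* Use SRDB 13634 to configure the network.
	 	  NOTE: For the purpose of restoring, assign a 
	 	  hostname and IP address to the spare SSP that 
	 	  are different from those of the main SSP.
	 	  
	 	* Then follow the same steps as for a local tape
	 	  drive, restoring from tapehost:/dev/rmt/XX.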
	 	        	
	 	        	
	 * When complete, shut down the spare SSP ready for failover.
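
The local-tape restore steps can also be collected into one sequence
that stops at the first failure. This is only a sketch: the names
DISK, TAPE, DRY_RUN, run and restore_replica are hypothetical, the
disk defaults to the c0t3d0s0 slice used in this article, and the
tape device is left as the /dev/rmt/XX placeholder. By default the
commands are only printed, so the sequence can be checked before
anything is written to the disk.

```shell
#!/bin/sh
# Sketch: the restore steps as one sequence, intended to be run
# after "boot cdrom -s". Adjust DISK and TAPE for the site.
DISK=${DISK:-c0t3d0s0}
TAPE=${TAPE:-/dev/rmt/XX}
DRY_RUN=${DRY_RUN:-1}     # 1 = only print the commands

# run CMD...: print the command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 1 ]; then
        return 0
    fi
    "$@"
}

# Each step runs only if the previous one succeeded.
restore_replica() {
    run newfs /dev/rdsk/$DISK     &&
    run fsck /dev/rdsk/$DISK      &&
    run mount /dev/dsk/$DISK /a   &&
    run cd /a                     &&
    run ufsrestore rf $TAPE       &&
    run rm restoresymtable
}
```

Calling restore_replica with DRY_RUN=1 lists the six commands;
setting DRY_RUN=0 executes them in order.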
	
	
* FAILOVER PROCEDURE

	* Shut down the acting SSP.
	  (Fans on the Starfire will go into high speed.)
	
	* Boot up the replica SSP.
	
	* Wait for the fans to go to normal speed.
	  The replica is now the acting SSP. No action
	  is required on the domains. 
	
	* NOTE: It is important that the two SSPs are never booted
	  up together while connected to the network, as they
	  have identical hostnames and IP addresses.  When not
	  in use, keep the replica powered off or disconnected
	  from the networks, to avoid accidental booting.

* MAINTAINING CONSISTENCY BETWEEN SSPs

         * Repeat the whole backup/restore procedure on a regular 
           schedule, or after any major change to the SSP software.

APPLIES TO: Hardware/Ultra Enterprise/Servers/Enterprise 10000, Operating Systems/Solaris/Solaris 2.5.1
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.