SRDB ID   Synopsis   Date
44350   Sun Enterprise[TM] 10000: Reinstalling SSP software on SSP 3.4 and/or 3.5   28 May 2002

Status Issued

Description

Situations can arise where both the main and spare SSPs are misconfigured to the point that they can no longer properly administer their E10000 platform. When troubleshooting has been exhausted without resolving the issues, reinstalling the SSPs becomes an attractive way to fix the problem. This process, however, doesn't have to be as painful as it sounds.

Sometimes the problems with the SSPs are the result of improper installation. Sometimes an issue arises, such as an improperly installed patch, that completely breaks the SSP's function as platform administrator. A third cause may be as simple as security hardening that no longer allows the SSP to communicate with the platform.

Assuming all standard troubleshooting resources have been exhausted, it is probably a good time to think about either an OS and SSP reload, or just an SSP reload.

SOLUTION SUMMARY:

So, how do you completely re-install the SSP configuration? You can follow the steps below to help with the reload, but they are only a guideline. Each case will be different and each SSP configuration will have problems of varying degree, so work with what you have, and use this as a reference.


1) Collect explorer data from both SSPs as they are configured right now (broken). The explorer output contains data that can be reused during the reinstall to speed up the process. Of course, depending on what specific issues the broken configuration is having, the data should be used only if it is known to be good.

2) Bring down both of the SSPs, but install only one of them for the time being. (It is assumed that neither SSP is actually administering the platform at this time, so leaving one or the other up at this time is a moot point.) Choose to start this procedure on the one SSP which is to be the main at the end of the procedure.

3) Install a fresh OS onto the first SSP. (NOTE: only required if you believe that the OS is at the root of the SSP problems. If the OS isn't suspect, but only the SSP software is, proceed to step 4 after first removing the SSP patches and packages.) Next, copy the /etc/hosts, /etc/nsswitch.conf, and /etc/ethers files from the explorer data you collected at the beginning of this process into /etc on the SSP. Of course, this step requires that those files are absolutely correct and not the reason the SSPs were broken in the first place.
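The file copy in step 3 can be sketched as a small shell function. This is an illustration only, assuming a POSIX-style shell: the function name restore_etc_files is hypothetical, and the directory inside the explorer output that holds the saved /etc files is something you have to locate yourself. The sketch refuses to copy anything if one of the three saved files is missing or empty, and it keeps the freshly installed copies with a .orig suffix.

```shell
# Sketch only: copy the three name-service files from saved explorer data.
# restore_etc_files is a hypothetical helper, not part of the SSP software.
restore_etc_files() {
    src=$1      # directory holding the saved hosts, nsswitch.conf, ethers
    dest=$2     # normally /etc on the freshly installed SSP

    # Refuse to start unless all three saved files exist and are non-empty.
    for f in hosts nsswitch.conf ethers; do
        if [ ! -s "$src/$f" ]; then
            echo "missing or empty: $src/$f" >&2
            return 1
        fi
    done

    for f in hosts nsswitch.conf ethers; do
        # Keep the installed copy in case the saved file turns out to be bad.
        if [ -f "$dest/$f" ]; then
            cp -p "$dest/$f" "$dest/$f.orig" || return 1
        fi
        cp -p "$src/$f" "$dest/$f" || return 1
    done
}
```

It would be called as root with the explorer directory as the first argument and /etc as the second. Review the saved files by hand first; the function cannot tell whether their contents are correct.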

4) Insert the SSP CD into the SSP's CD-ROM drive and cd to the Tools directory on the CD. (Should be something like: /cdrom/cdrom0/s0/System_Service_Processor_3.4/Tools) Issue the command ./ssp_install ../Product and type y when asked to install the SUNWsspfp package. If the installation process issues warnings about a conflicting file, and asks whether it should still be installed, answer yes.

5) Run /opt/SUNWssp/bin/ssp_config and confirm that the information for the control boards, SSPs, floating SSP name, etc., is all entered correctly. If the /etc/nsswitch.conf, /etc/hosts and /etc/ethers files have been configured properly before this stage, in step 3, all the data should be populated as you run through ssp_config and you should just have to confirm the information.

6) Reboot the SSP. After it comes back up, log in as user ssp and give the ssp user a new password. Enter the name of the platform when asked for the SUNW_HOSTNAME. Then run tail -f $SSPLOGGER/messages. Wait for the message "SSP startup complete" to appear and make sure that the SSP's role is now MAIN.
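If you would rather not watch the tail output in step 6, the wait can be sketched as a polling loop. This is a minimal sketch, assuming a POSIX-style shell; wait_for_message is a hypothetical helper name, and the default timeout of 600 seconds is an arbitrary choice, not a documented SSP value.

```shell
# Sketch only: poll a log file until a message appears or a timeout expires.
# wait_for_message is a hypothetical helper, not part of the SSP software.
wait_for_message() {
    logfile=$1
    message=$2
    timeout=${3:-600}   # seconds; arbitrary default, adjust as needed
    waited=0
    while :; do
        # Succeed as soon as the message is present anywhere in the file.
        if grep "$message" "$logfile" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

As user ssp it could be called as: wait_for_message "$SSPLOGGER/messages" "SSP startup complete" 900. Note that the function searches the whole file, so a message left over from an earlier startup would also match. On a freshly installed SSP the log should be new, but check the timestamp of the matching line if in doubt. The SSP's role still has to be confirmed separately.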

7) Add the necessary SSP 3.4 patches, making sure to follow all the special install instructions. Make sure that communication to the Control Boards works properly, and all SSP functions appear normal.

8) Configure the individual domains. Issue the command domain_create -d <domain_name> -b <space separated board list> -o <OS> -p <platform_name> for each domain that is to be created. The explorer information is handy for this stage. It should contain domain_status output listing the domains with their boards and OS versions. Assuming that information was correct, it makes this domain creation stage easy.
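When there are several domains, the domain_create commands from step 8 can be generated from a short hand-written list instead of being typed one by one. The sketch below only prints the commands for review and does not run them. It assumes a POSIX-style shell; the function name print_domain_create and the colon-separated input format (domain name, space separated board list, OS version) are hypothetical conventions made up for this example, not the domain_status output format.

```shell
# Sketch only: print one domain_create command per domain for review.
# Input on stdin, one domain per line, in a made-up format:
#   <domain_name>:<space separated board list>:<OS>
# Blank lines and lines starting with # are skipped.
print_domain_create() {
    platform=$1
    while IFS=: read -r domain boards os; do
        case $domain in
            ''|'#'*) continue ;;
        esac
        echo "domain_create -d $domain -b $boards -o $os -p $platform"
    done
}
```

For example, feeding it a file containing the line "doma:0 1 2:5.8" with platform name plat1 (both placeholder values) prints "domain_create -d doma -b 0 1 2 -o 5.8 -p plat1". Compare each printed command against the saved domain_status output before running it as user ssp.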

9) Create the base eeprom.image files for each of the individual domains. To do this, the customer should provide you with the platform serial number, which you can enter into the Starfire Domain Keys Generator to produce the hostid and key list to be used in this process. It's a two-step process:

  1. ssp% domain_switch <domain_name>
  2. ssp% sys_id -f /var/opt/SUNWssp/.ssp_private/eeprom_save/eeprom.image.<domain_name> -h 0x<hostid> -k <key>

10) Take the eeprom.image files that are backed up in the SSP explorer that you collected in step 1 and copy each one to its domain's /var/opt/SUNWssp/etc/<platform>/<domain>/eeprom.image file. This ensures that you have the correct nvram settings for each domain in the SSP configuration. Also, make sure to copy each domain's eeprom.image file to the SSP's /var/opt/SUNWssp/.ssp_private/eeprom_save/eeprom.image.<domain_name> file. This is a reserve copy of the eeprom.image files. IT IS IMPORTANT TO NOTE THAT THE FILE SIZE OF THESE IMAGE FILES IS 8192!
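The size requirement in step 10 is easy to check before and after each copy. The following is a minimal sketch, assuming a POSIX-style shell and assuming that the 8192 figure is a size in bytes as reported by ls -l or wc -c. The function name check_eeprom_size is hypothetical.

```shell
# Sketch only: confirm that an eeprom.image file has the expected size.
# check_eeprom_size is a hypothetical helper; 8192 is assumed to be bytes.
check_eeprom_size() {
    file=$1
    expected=${2:-8192}
    if [ ! -f "$file" ]; then
        echo "not found: $file" >&2
        return 1
    fi
    # wc -c may pad its output with blanks on some systems; strip them.
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -ne "$expected" ]; then
        echo "unexpected size for $file: $size (expected $expected)" >&2
        return 1
    fi
    return 0
}
```

Run it against the explorer copy before copying, then against both destination files afterwards. A file of any other size should not be copied into place.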

11) Assuming the CBs are responding properly, arrange for platform downtime to run a platform-wide autoconfig in order to update the new SSP's MIB database. After the platform autoconfig is run, you will need to reboot the SSP. Allow the SSP to configure back into the Main role and begin bringing up the domains one at a time, making sure to configure the centerplane on the first domain.

12) Assuming all of that works and the main SSP is functioning properly, run an ssp_backup of the main SSP. This will create the SSP configuration backup cpio file, which you will later restore on the new spare SSP.

13) Install the OS on the second SSP, if necessary. Configure the SSP software on the second SSP using the same steps as above (Steps 3-5, 7). After running through those steps on the spare SSP, obtain the ssp_backup.cpio file from the main and use ssp_restore to dump the main's config onto the spare.
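Before running ssp_restore in step 13, it is worth confirming that the backup file arrived on the spare unchanged. This is a minimal sketch using the standard cksum utility, assuming a POSIX-style shell; backup_fingerprint is a hypothetical helper name, and the check is an extra precaution rather than part of the documented SSP procedure.

```shell
# Sketch only: print a checksum and byte count for a file so the copies
# on the main and the spare can be compared.
# backup_fingerprint is a hypothetical helper, not part of the SSP software.
backup_fingerprint() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
    # Reading from stdin keeps the file name out of the cksum output, so
    # the result is the same regardless of where the file is stored.
    cksum < "$1"
}
```

Run it on the ssp_backup.cpio file on the main, run it again on the copy transferred to the spare, and proceed with ssp_restore only if the two output lines are identical.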

Both SSPs should be configured properly now, and better able to manage the platform.

Keywords: E10K, starfire, SSP, E10000

INTERNAL SUMMARY:

SUBMITTER: Joshua Freeman APPLIES TO: Hardware/Ultra Enterprise/Servers/Enterprise 10000, AFO Vertical Team Docs/HAS ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.