SRDB ID:   48183
Synopsis:  Sun Fire[TM] 12K/15K: SMS 1.2 does NOT recover automatically after a catastrophic event
Date:      30 Oct 2002

Status:    Issued

Description
- Problem Statement:

SMS 1.2 does NOT recover automatically after a catastrophic event

- Symptoms:

After a catastrophic event such as an over-temperature condition, SMS 1.2
does NOT recover automatically; manual intervention is required.
After esmd shuts down the platform because of the over-temperature
condition, the breakers must be physically turned back on. Once the System
Controllers have booted and SMS is running, SMS still holds the domain
configurations and component states as they were last recorded: powered off.

SOLUTION SUMMARY:
- Troubleshooting:

At this point, "showboards" reports the state recorded when the platform
shut down (components powered off), and "showplatform" hangs. A forced
failover also fails, because SMS still records the CSBs as powered off.

- Resolution:

This can be easily resolved by restarting SMS:

/etc/init.d/sms stop
/etc/init.d/sms start

After the restart, SMS should reflect the actual state of the platform:
"showboards" shows the components powered on, and "setkeyswitch" and
"showplatform" work normally.

- Summary of part numbers and patch IDs

None.

- References and bug IDs

BugId 4721713. This bug is a duplicate of BugId 4620694, which addresses
this issue and is fixed in SMS 1.3. There are plans to backport the fix to
SMS 1.2, but no official GA date has been set yet. Until then, the
workaround is to recycle (stop and restart) SMS.

- Additional background information:

None.

- Meta-Data/Problem categorization:

Product/Platform: SF12K/SF15K
Category:

- Keywords

SMS, recover, catastrophic            

INTERNAL SUMMARY:

SUBMITTER: Bruce Belisle
BUG REPORT ID: 4721713, 4620694
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.