SRDB ID | Synopsis | Date |
48183 | Sun Fire[TM] 12K/15K: SMS 1.2 does NOT recover automatically after a catastrophic event | 30 Oct 2002 |
Status | Issued |
Description |
- Problem Statement: SMS 1.2 does NOT recover automatically after a catastrophic event.
- Symptoms: SMS 1.2 does not recover from a catastrophic event, such as an over-temp condition, without manual intervention. After esmd shuts down the platform because of the over-temp condition, the breakers must be physically turned back on. Once the System Controllers have booted and SMS is running, SMS still holds the domain configurations and components in their last known state: powered off.
SOLUTION SUMMARY:
- Troubleshooting: At this point, "showboards" reports the last known state from the time the platform shut down (components powered off), and "showplatform" hangs. A forced failover also fails because SMS still records the CSBs as powered off.
- Resolution: Restart SMS:
    /etc/init.d/sms stop
    /etc/init.d/sms start
  After the restart, SMS should contain the proper state of the platform: "showboards" shows components powered on, and "setkeyswitch" and "showplatform" work as expected.
- Summary of part numbers and patch IDs: None.
- References and bug IDs: BugId4721713. This is a duplicate of BugId4620694, which is fixed in SMS 1.3 and addresses this particular issue. There are plans to backport the fix to SMS 1.2, but there is no official GA date yet. Until then, the workaround is to restart SMS.
- Additional background information: None.
- Meta-Data/Problem categorization:
  Product/Platform: SF12K/SF15K
  Category:
  Keywords: SMS, recover, catastrophic
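The restart described under Resolution can be wrapped in a small shell function so that the start is attempted only if the stop succeeds. This is a minimal sketch, not part of the original article: the function name restart_sms and the SMS_INIT override variable are hypothetical, and it assumes the function is run on the System Controller with sufficient privileges to invoke the init script.

```shell
#!/bin/sh
# Sketch only: restart SMS using the init script named in this article.
# SMS_INIT is a hypothetical override, useful for dry runs against a stub;
# it defaults to the path given in the Resolution.
SMS_INIT="${SMS_INIT:-/etc/init.d/sms}"

restart_sms() {
    # Stop SMS first; do not attempt a start if the stop fails.
    "$SMS_INIT" stop || { echo "sms stop failed" >&2; return 1; }

    # Start SMS again so it re-reads the actual state of the platform.
    "$SMS_INIT" start || { echo "sms start failed" >&2; return 1; }

    echo "SMS restarted"
}
```

Calling restart_sms performs the two steps in order. Afterward, "showboards" and "showplatform" can be run by hand to confirm that SMS reports the components as powered on.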
INTERNAL SUMMARY:
SUBMITTER: Bruce Belisle
BUG REPORT ID: 4721713, 4620694
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS: