InfoDoc ID | Synopsis | Date |
26286 | Sun Fire[TM] 3800-6800: System Controller failover functionality | 18 Dec 2002 |
Status | Issued |
Description |
The Sun Fire[TM] System Controller (SSC) provides management functionality and the clock to the Sun Fire platform. When the master SSC (SSC0) fails and the system includes two SSCs, clock and management functionality must fail over to the spare SC.
For Firmware 5.13.X
Starting with firmware 5.13.X, Sun Fire 6800/4810/4800/3800 systems can be configured with two system controllers for high availability. In a high-availability system controller (SC) configuration, one SC serves as the main SC, which manages all the system resources, while the other SC serves as a spare. When certain conditions cause the main SC to fail, a switchover or failover from the main SC to the spare is triggered automatically, without operator intervention. The spare SC assumes the role of the main and takes over all system controller responsibilities.

New commands have been added to manage this functionality: setfailover and showfailover.

setfailover -- set automatic/manual SC failover

Usage:
    setfailover [-y|-n] off|on|force
    setfailover -h

    off   -- Prevents a failover until the failover feature is re-enabled.
    on    -- Enables failover for systems that previously had failover disabled due to a failover or an operator request.
    force -- Causes a forced failover to the spare SC.
    -y    -- Do not prompt for confirmation.
    -n    -- Do not execute the command if confirmation is requested.
    -h    -- Display the help message for this command.

This command enables you to control automatic or manual SC failover. Be aware that if you force a failover using this command, SC failover is disabled after the manual failover occurs and must be re-enabled manually using the command setfailover on.
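As an illustration of the setfailover usage line, the following Python sketch assembles a command string and rejects combinations that the usage text does not allow. The function name build_setfailover and its parameters are hypothetical and are not part of the SC firmware. The sketch only builds a string and does not contact a system controller.

```python
# Hypothetical helper: builds a command line matching the documented usage
#   setfailover [-y|-n] off|on|force
# It only returns a string; it does not talk to a system controller.

VALID_ACTIONS = ("off", "on", "force")
VALID_CONFIRM = (None, "-y", "-n")


def build_setfailover(action, confirm=None):
    """Return a setfailover command line, or raise ValueError if invalid."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    if confirm not in VALID_CONFIRM:
        raise ValueError("confirm must be None, '-y' or '-n'")
    parts = ["setfailover"]
    if confirm is not None:
        parts.append(confirm)  # -y and -n are mutually exclusive, so at most one
    parts.append(action)
    return " ".join(parts)
```

For example, build_setfailover("force", "-y") yields the string for a forced failover without a confirmation prompt, while an unknown action such as "toggle" raises ValueError.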
showfailover -- Enables you to monitor the state of the SC and clock failover.

Usage:
    showfailover [-v]
    showfailover -h

    -v -- Verbose mode. Displays all available command information, which includes both SC and clock failover status.
    -h -- Display this help message.

The SC failover state can be one of the following:
    enabled and active     - SC failover is enabled and functioning normally.
    disabled               - SC failover has been disabled due to an operator request (setfailover off) or because a failover has occurred.
    enabled but not active - SC failover is enabled, but certain components, such as the spare SC or the centerplane between the main and spare, are not in a failover-ready state (available and responding).

The clock failover state can be one of the following:
    enabled  - Clock failover is enabled.
    disabled - Clock failover has been automatically disabled due to a hardware problem.
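The states listed above can be recognized mechanically in captured console text. The Python sketch below is one possible way to do so. The line formats ("SC Failover: <state>" and "Clock failover <state>.") are taken from the console output quoted in this document. The function names and the short readiness labels ("ready", "not ready", "off") are assumptions made for illustration only.

```python
# Hypothetical helpers that interpret showfailover text.
# State strings are the ones listed in this document; the readiness
# labels on the right are illustrative assumptions.

SC_STATES = {
    "enabled and active": "ready",
    "enabled but not active": "not ready",
    "disabled": "off",
}
CLOCK_STATES = ("enabled", "disabled")


def parse_sc_failover(line):
    """Return (state, readiness) for a line like 'SC Failover: disabled',
    or None if the line does not report a known SC failover state."""
    prefix = "SC Failover:"
    text = line.strip()
    if not text.startswith(prefix):
        return None
    rest = text[len(prefix):].strip()
    for state, readiness in SC_STATES.items():
        if rest.startswith(state):  # tolerates a trailing period or annotation
            return state, readiness
    return None


def parse_clock_failover(line):
    """Return 'enabled' or 'disabled' for a line like
    'Clock failover enabled.', or None for any other line."""
    prefix = "Clock failover"
    text = line.strip()
    if not text.startswith(prefix):
        return None
    rest = text[len(prefix):].strip()
    for state in CLOCK_STATES:
        if rest.startswith(state):
            return state
    return None
```

Lines that report something else, such as "SC Failover: becoming main SC ...", return None rather than a guessed state.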
Example of how to force a manual failover:

1. Connect to the System Controller (failover can be initiated from either SC).

    System Controller 'sunfire12-sc1':
    Type 0 for Platform Shell
    Input: 0
    Platform Shell - Spare System Controller
    sunfire12-sc1:sc>

2. Verify that failover is enabled and active, and enable it if necessary.

    sunfire12-sc1:sc> showfailover -v
    SC: SSC1
    Spare System Controller
    SC Failover: disabled        <---failover is disabled, so we must enable it.
    sunfire12-sc1:sc> setfailover on
    Dec 12 16:06:51 sunfire12-sc1 Platform.SC: SC Failover: enabled but not active.
    SC Failover: enabled but not active.
    sunfire12-sc1:sc> Dec 12 16:07:09 sunfire12-sc1 Platform.SC: SC Failover: enabled and active.
    sunfire12-sc1:sc> showfailover
    SC Failover: enabled and active.        <---now failover is enabled and active.

3. Manually force the Spare System Controller to become the Main System Controller.

    sunfire12-sc1:sc> setfailover force
    SC: SSC1
    Spare System Controller
    SC Failover: enabled and active.
    Clock failover enabled.
    This will abruptly interrupt operations on the other System Controller.
    This System Controller will become the main System Controller.
    Do you want to continue? [no] yes
    Dec 12 16:10:18 sunfire12-sc1 Platform.SC: SC Failover: becoming main SC ...
    sunfire12-sc1:sc> Dec 12 16:10:19 sunfire12-sc1 Platform.SC: SC Failover: disabled
    Dec 12 16:10:26 sunfire12-sc1 Platform.SC: Chassis is in single partition mode.

4. Verify the status of the system controller and re-enable failover.

    sunfire12-sc1:SC> showfailover
    SC Failover: disabled        <---failover is disabled after a failover has occurred and must be re-enabled.
    sunfire12-sc1:SC> setfailover on
    Dec 12 16:14:14 sunfire12-sc1 Platform.SC: SC Failover: enabled but not active.
    SC Failover: enabled but not active.
    sunfire12-sc1:SC> Dec 12 16:14:18 sunfire12-sc1 Platform.SC: SC Failover: enabled and active.
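The timestamped console messages in the example show how long the SC takes to move from "enabled but not active" to "enabled and active" after setfailover on. The Python sketch below extracts those messages from captured console text and computes that interval. The function names are hypothetical, and the year is an assumption because the console lines do not carry one. Month names are parsed with %b, which assumes an English locale.

```python
import re
from datetime import datetime

# Matches console lines such as:
#   Dec 12 16:06:51 sunfire12-sc1 Platform.SC: SC Failover: enabled but not active.
# A leading shell prompt on the same line is tolerated because search() is used.
LOG_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<source>\S+): SC Failover: (?P<state>.+?)\.?\s*$"
)

ASSUMED_YEAR = 2002  # assumption: the log lines themselves contain no year


def failover_events(lines):
    """Return a list of (datetime, state) for every SC Failover log line."""
    events = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            when = datetime.strptime(
                f"{ASSUMED_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S"
            )
            events.append((when, m["state"]))
    return events


def seconds_until_active(events):
    """Seconds between the first 'enabled but not active' event and the
    next 'enabled and active' event, or None if no such pair exists."""
    start = None
    for when, state in events:
        if state == "enabled but not active" and start is None:
            start = when
        elif state == "enabled and active" and start is not None:
            return (when - start).total_seconds()
    return None
```

Applied to the two log lines from step 2 (16:06:51 and 16:07:09), the interval is 18 seconds. The lines from step 4 (16:14:14 and 16:14:18) give 4 seconds.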
For Firmware 5.11.X
In ScApp Release 5.11.4, management functionality failover is NOT implemented. This means that if SSC0 fails, you cannot perform tasks that require ScApp, such as powering off system boards, checking the temperature, or changing system board assignments.

During the boot sequence of the SSC, you will get a message indicating whether clock failover is enabled or disabled:

    Apr 02 05:38:27 SunFire45 Chassis-Port.SC: Clock failover disabled.
    Apr 02 05:38:50 SunFire45 Chassis-Port.SC: Clock failover enabled.

So when you reboot an SSC or an SSC fails, you will see a message on the console of the other SSC saying that clock failover is disabled. Once the other SSC is available again (finished rebooting) and the second clock source is available again, you should see the following message on the console of the SSC:

    Apr 02 05:38:50 SunFire45 Chassis-Port.SC: Clock failover enabled.

If you have two SSCs in your Sun Fire system with ScApp version 5.11.4 and the master SSC0 fails, you CANNOT replace the failing SSC0 without taking the entire platform down. Adding SSC1 can be done during normal platform operation, and there is no need to take any domain down.

INTERNAL SUMMARY:
Check the ReadMe of Patch
For more information about System Controller failover in 5.13.X firmware, check the following resources:
Sun Fire 6800/4810/4800/3800 Systems Platform Administration Manual (Firmware Version 5.13.0), http://docs-pdf.sun.com/816-2970-10/816-2970-10.pdf
Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual (Firmware Version 5.13.0), http://docs-pdf.sun.com/816-2971-10/816-2971-10.pdf
SUBMITTER: Peter Gonscherowski
PATCH ID: 800054
APPLIES TO: Hardware/Sun Fire
ATTACHMENTS: