Document fins/I0850-1
FIN #: I0850-1
SYNOPSIS: Sun Fire 3800/4800/4810/6800 5.13.x firmware issues
DATE: Jul/29/02
KEYWORDS: Sun Fire 3800/4800/4810/6800 5.13.x firmware issues
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: Sun Fire 3800/4800/4810/6800 5.13.x firmware issues.
SunAlert: No
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: Firmware 5.13.x
PRODUCT CATEGORY: Server / SW Admin
PRODUCTS AFFECTED:
Systems Affected:
-----------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- S8 - Sun Fire 3800 -
- S12 - Sun Fire 4800 -
- S12i - Sun Fire 4810 -
- S24 - Sun Fire 6800 -
X-Options Affected:
-------------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- - - - -
PART NUMBERS AFFECTED:
Part Number Description Model
----------- ----------- -----
- - -
REFERENCES:
PatchId: 112494-xx: Hardware/PROM: Sun Fire 6800/4810/4800/3800 Systems
Firmware Update.
URL:
http://cpre-amer.west/esg/msg/techinfo/platform/sun_fire/firmware-matrix/update-issues2.html
PROBLEM DESCRIPTION:
New firmware, revision 5.13.x, for Sun Fire 3800/4800/4810/6800 systems
provides full System Controller (SC) failover functionality, as well as
fixing various outstanding bugs. It is important to note that there
are certain configuration procedures which must be followed when
installing or upgrading to this firmware version on a Sun Fire system.
Failure to follow these procedures may cause loss of availability for
Sun Fire domains.
This issue affects any Sun Fire 3800/4800/4810/6800 system where the
firmware is being upgraded to version 5.13.x or where replacement FRUs
with conflicting revisions of firmware are being installed.
To determine the firmware version for a System Controller, run the
following command on the SC:
schostname:SC> showsc
To determine firmware versions for other system boards:
schostname:SC> showboards -v -p version
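When versions from several SCs need to be compared, captured showsc output can be checked by script. The following is a minimal Python sketch, not a Sun-supplied tool. The function name parse_showsc is hypothetical, and the sample text is the showsc output reproduced in the Corrective Action section of this FIN.

```python
import re


def parse_showsc(text):
    """Extract the ScApp and RTOS versions from captured `showsc` output."""
    versions = {}
    for line in text.splitlines():
        # Lines of interest look like "ScApp version: 5.11.9"
        match = re.match(r"\s*(ScApp|RTOS) version:\s*(\S+)", line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


# Sample copied from the SSC1 hot-plug example in this FIN.
sample = """SC: SSC1
SC date: Fri May 31 03:39:56 PDT 2002
SC uptime: 25 seconds
ScApp version: 5.11.9
RTOS version: 18
"""

print(parse_showsc(sample))
```

The ScApp value is the firmware revision that matters for the 5.11.x / 5.12.x / 5.13.x distinctions made in this FIN.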
The Field needs to be aware of the following configuration issues with
5.13.x firmware:
. Failure to upgrade SSC1 first may result in problems such as crashed
domains, lost configuration information, and inaccessible domains.
. An SC with firmware at 5.11.x or 5.12.x will not boot if hot-plugged
into slot SSC0.
. Resolving SC clock failover issues.
. Resolving SC communications issues after SC failover.
In addition, this FIN gives advice on the following:
. Hot-plugging SCs with 5.13.x into a platform with older
revisions of firmware.
. Replacing System Boards and I/O Boards with different
revisions of firmware.
The above issues can be avoided or resolved by following procedures
given in the Corrective Action section below. Information for these
and other Sun Fire firmware issues can be viewed at the Sun Fire
3800/4800/4810/6800 Firmware Update Matrix.
http://cpre-amer.west/esg/msg/techinfo/platform/sun_fire/firmware-matrix/
IMPLEMENTATION:
---
| | MANDATORY (Fully Proactive)
---
---
| X | CONTROLLED PROACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
CORRECTIVE ACTION:
The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives who may encounter the above
mentioned situation.
Upgrading to 5.13.0 firmware from 5.11.x and 5.12.x
==================================================
Before starting, read the Install.info and release_notes files in
PatchId 112494.
Essential - Always upgrade SSC1 first
Failure to follow this instruction will result in problems such as crashed
domains, lost configuration information, and inaccessible domains.
What to do if SSC0 is upgraded first
=====================================
Here's an example of what happens if SSC0 is updated first.
4800-sc0:SC> flashupdate -f ftp://172.29.3.44/pub/112494-01 all
As part of this update, the system controller will automatically reboot.
RTOS will be upgraded automatically during the next boot.
ScApp will be upgraded automatically during the next boot.
After this update you must reboot each active domain that was upgraded.
Do you want to continue? [no] yes
.
. <TEXT DELETED>
.
The date is Thursday, May 23, 2002, 11:29:30 AM GMT+01:00.
May 23 11:29:31 4800-sc0 Platform.SC: Boot: ScApp 5.13.0, RTOS 23
May 23 11:29:36 4800-sc0 Platform.SC: Clock Source: 75MHz
May 23 11:29:38 4800-sc0 Platform.SC: SC Failover Monitor: enabled
May 23 11:30:08 4800-sc0 Platform.SC: Spare System Controller
May 23 11:30:08 4800-sc0 Platform.SC: SC Failover: enabled but not active.
System Controller '4800-sc0':
Type 0 for Platform Shell
Input: 0
Platform Shell - Spare System Controller
4800-sc0:sc>
If you then connect to SC1, you will find that it is also a spare:
# telnet 4800-sc1
System Controller '4800-sc1':
Type 0 for Platform Shell
Input: 0
Platform Shell - Slave System Controller
4800-sc1:SC>
------------------------------------------------------------------------------
Since SSC0 was upgraded first, the result is that there are now two spare
system controllers.
WARNING: DO NOT try to recover by pressing reset buttons, or re-flashing.
You will almost certainly crash any running domains on your
platform.
If SSC0 is upgraded first, there is a recovery procedure.
Engage CPRE if you find yourself in this situation.
Hot-Plugging SCs with old revs of firmware into a 5.13.x platform:
==================================================================
5.13 firmware on an SC does not mix with 5.11 and 5.12 firmware
on an SC.
If SC1 is to be replaced in a platform running 5.13.x, and the
replacement has 5.11 or 5.12 firmware loaded, recovery is simple
and outlined below.
If SC0 is to be replaced in a platform running 5.13.x, and the
replacement has 5.11 or 5.12 firmware loaded, the replacement will
not boot, as outlined below. Recovery is to remove it and install an
SC that is already at 5.13.x.
If SC0 is to be replaced in a 5.13 platform, ensure the replacement
has 5.13.x firmware loaded on it. Double check with the control
room that this is the case.
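The slot and firmware rules above can be summarized as a small lookup. This is an illustrative Python sketch only; the function name and the returned strings are hypothetical, and it covers just the revisions this FIN discusses (5.11, 5.12 and 5.13) for a platform already running 5.13.x.

```python
def hotplug_old_sc_outcome(slot, replacement_version):
    """Expected result of hot-plugging an SC into a platform running 5.13.x.

    Only the cases described in this FIN are covered; any other slot or
    firmware revision raises ValueError rather than guessing.
    """
    if slot not in ("SSC0", "SSC1"):
        raise ValueError("unknown slot: " + slot)
    # Compare on the first two fields only, so "5.12.6" and "5.12.x" match.
    major, minor = (int(part) for part in replacement_version.split(".")[:2])
    if (major, minor) == (5, 13):
        return "compatible: replacement matches the platform firmware"
    if (major, minor) in ((5, 11), (5, 12)):
        if slot == "SSC1":
            return "boots: flashupdate the replacement to 5.13.x"
        return "will not boot: replace with an SC already at 5.13.x"
    raise ValueError("version not covered by this FIN: " + replacement_version)


print(hotplug_old_sc_outcome("SSC1", "5.11.9"))
print(hotplug_old_sc_outcome("SSC0", "5.12.6"))
```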
Example - Hot-Plugging SC with old rev of firmware in slot SSC1
================================================================
Output from SSC0
sc0-4800a:SC> poweroff ssc1
SSC1: powered off
sc0-4800a:SC>
May 31 10:34:45 sc0-4800a Platform.SC: Clock failover disabled.
May 31 10:37:07 sc0-4800a Platform.SC: SSC1 removed
May 31 10:37:37 sc0-4800a Platform.SC: SSC1 inserted
sc0-4800a:SC>
sc0-4800a:SC>
May 31 10:39:57 sc0-4800a Platform.SC: SC Failover: the other SC is
running an old version of firmware which is not compatible with
failover. You need to upgrade this firmware as soon as possible.
sc0-4800a:SC>
sc0-4800a:SC>
Output from SSC1
Hardware Reset...
@(#) SYSTEM CONTROLLER(SC) POST 18 2001/06/14 11:20
PSR = 0x044010e5
PCR = 0x04004000
SelfTest running at DiagLevel:0x20
SC Boot PROM Test
BootPROM CheckSum Test
.
.
.
Console Bus Hub Test
CBH Register Access Test
POST Complete.
ERI Device Present
Getting MAC address for SSC1
MAC address is 8:0:20:d8:ab:64
Using DHCP to configure network interface
Attached TCP/IP interface to eri unit 0
Attaching interface lo0...done
interrupt: 100 Mbps full duplex link up
Initiating DHCP negotiations for eri0
dhcpcBind() failed: errno = 0xd0003
Adding 2851 symbols for standalone.
Copyright 2001 Sun Microsystems, Inc. All rights reserved.
RTOS version: 18
ScApp version: 5.11.9
SC POST diag level: min
The date is Friday, May 31, 2002, 3:39:42 AM PDT.
SbbcAsic.showResetReason: SBBC reset status=0160 POR
PowerOn or Invalid magic: Initializing the SC SRAM
May 31 03:39:46 noname Chassis-Port.SC: Backing up Static ID Info to NVCI
May 31 03:39:46 noname Chassis-Port.SC: Clock source: 75MHz
May 31 03:39:48 noname Chassis-Port.SC: Starting Slave Thread
System Controller 'noname.example.com':
Type 0 for Platform Shell
Input: 0
Platform Shell
noname:SC> showsc
SC: SSC1
SC date: Fri May 31 03:39:56 PDT 2002
SC uptime: 25 seconds
ScApp version: 5.11.9
RTOS version: 18
noname:SC>
To recover, flashupdate SC1 to 5.13.x.
Example - Hot-Plugging SC with old rev of firmware in slot SSC0
================================================================
Output from SSC1
sc1-4800a:SC> poweroff ssc0
SSC0: powered off
sc1-4800a:SC>
May 31 10:48:28 sc1-4800a Platform.SC: SSC0 removed
May 31 10:49:02 sc1-4800a Platform.SC: SSC0 inserted
sc1-4800a:SC>
sc1-4800a:SC>
May 31 10:50:25 sc1-4800a Platform.SC: SC Failover: the other SC is
running an old version of firmware. It cannot be booted on this
platform. Contact your support organization.
sc1-4800a:SC>
sc1-4800a:SC>
sc1-4800a:SC>
----------------------------------------------------------------------------
Output from SSC0
Hardware Reset...
@(#) SYSTEM CONTROLLER(SC) POST 18 2001/06/14 11:20
PSR = 0x044010e5
PCR = 0x04004000
SelfTest running at DiagLevel:0x20
SC Boot PROM Test
BootPROM CheckSum Test
.
.
.
Console Bus Hub Test
CBH Register Access Test
POST Complete.
ERI Device Present
Getting MAC address for SSC0
MAC address is 8:0:20:d8:ab:63
Using DHCP to configure network interface
Attached TCP/IP interface to eri unit 0
Attaching interface lo0...done
Timeout waiting for network driver (flags=0x8062)
Adding 2851 symbols for standalone.
SC0 is un-usable at this point. There is no recovery possible, apart
from removing SC0 and replacing it with an SC at revision 5.13.x.
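The two hot-plug examples produce different "SC Failover" messages on the surviving SC, and telling them apart quickly is useful when reading captured console logs. The following Python sketch shows one way to do so. The function name and return labels are hypothetical, and the matching relies only on the message wording quoted in this FIN. The appropriate action still depends on the scenario, since the "cannot be booted" wording also appears when a 5.13.x SC is plugged into an older platform.

```python
def classify_failover_message(log_text):
    """Classify an 'SC Failover' console message quoted in this FIN."""
    # Console messages wrap across lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    if "It cannot be booted on this platform" in flat:
        return "other_sc_cannot_boot"
    if "not compatible with failover" in flat:
        return "other_sc_needs_upgrade"
    return "no_failover_warning"


msg_old_sc_in_ssc1 = """Platform.SC: SC Failover: the other SC is
running an old version of firmware which is not compatible with
failover. You need to upgrade this firmware as soon as possible."""

msg_old_sc_in_ssc0 = """Platform.SC: SC Failover: the other SC is
running an old version of firmware. It cannot be booted on this
platform. Contact your support organization."""

print(classify_failover_message(msg_old_sc_in_ssc1))
print(classify_failover_message(msg_old_sc_in_ssc0))
```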
Hot-Plugging SCs with 5.13.x into a platform with older revs of firmware
========================================================================
1) Plugging an SC with 5.13.x firmware into a 5.12.x platform, slot SSC0
=========================================================================
Remember, the platform will have had to be powered off to effect this
FRU replacement. The state the system controllers end up in depends on
which one boots first, which is largely down to SCPOST levels and the
SC network settings. For example, an SC from logistics should be at
default settings, which means SCPOST level min and the network
configured for DHCP.
If SSC1 boots first, it will put out a heartbeat (since it is at 5.12.x)
and this will cause the SSC0 to assume the role of spare.
System Controller 'noname.example.com':
Type 0 for Platform Shell
Input: 0
Platform Shell - Spare System Controller
noname:sc>
This is not a problem. If SSC0 boots first, the SC may become confused.
Ignore this.
Flashupdate SSC0 with 5.12.x firmware, and power-cycle the platform.
2) Plugging an SC with 5.13.x firmware into a 5.12.6 platform, slot SSC1
=====================================================================
Again, the platform will have had to be powered off to effect this FRU
replacement.
If SSC0 boots first, it will be the main and SSC1 the spare. Flashupdate
SSC1 with 5.12.6 firmware, and power-cycle the platform.
If SSC1 boots first, you will get the following message on SSC1:
Platform.SC: SC Failover: the other SC is running an old version of
firmware. It cannot be booted on this platform. Contact your support
organization.
SSC0 will be hung, at the point the RTOS finishes loading. Ignore SSC0,
flashupdate SSC1 with 5.12.x firmware, and power-cycle the platform.
The platform will then come back with SSC0 as main and SSC1 as spare.
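The four combinations of slot and boot order described in this section can be tabulated as follows. This is a Python sketch based on one reading of the text above; the table and function names are hypothetical. In every case the recovery is the same: flashupdate the newly installed SC to the platform's firmware revision and power-cycle the platform.

```python
# Key:   (slot holding the new 5.13.x SC, SC that boots first)
# Value: (state described in this FIN, recovery action)
OUTCOMES = {
    ("SSC0", "SSC1"): ("SSC0 assumes the role of spare",
                       "flashupdate SSC0 to the platform revision, power-cycle"),
    ("SSC0", "SSC0"): ("the SC may become confused; ignore this",
                       "flashupdate SSC0 to the platform revision, power-cycle"),
    ("SSC1", "SSC0"): ("SSC0 is main and SSC1 is spare",
                       "flashupdate SSC1 to the platform revision, power-cycle"),
    ("SSC1", "SSC1"): ("SSC1 logs an SC Failover message and SSC0 hangs",
                       "flashupdate SSC1 to the platform revision, power-cycle"),
}


def recovery_for(new_sc_slot, first_to_boot):
    """Return (state, action) for a 5.13.x SC placed in an older platform."""
    return OUTCOMES[(new_sc_slot, first_to_boot)]


state, action = recovery_for("SSC1", "SSC1")
print(state)
print(action)
```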
Replacing SBs and IBs with different revs of firmware
====================================================
If you are going to replace a System Board or I/O assembly, be
aware that the replacement board firmware must be compatible with
the system controller firmware. To check the firmware compatibility
for each board, use the 'showboards' command with the "-p version"
or "-v" option.
SBs and IBs with 5.12.x are compatible with 5.13.x. SBs and IBs
with 5.11.9 are NOT compatible with 5.13.x.
If the firmware of the replacement board is not compatible with the
firmware for the system controller, please upgrade or downgrade the
firmware on the replacement board accordingly, using 'flashupdate
-c'. It is recommended that replacement boards run the same revision
of firmware as the other boards in the domain.
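The board compatibility statement can be expressed as a simple check. This is a hedged Python sketch with a hypothetical function name. It encodes only what this FIN states (5.12.x compatible, 5.11.9 not compatible), plus the assumption that a board already at 5.13.x matches a 5.13.x SC. For any other revision it returns None instead of guessing.

```python
def board_compatible_with_513(board_version):
    """Is SB/IB firmware compatible with a 5.13.x system controller?

    Returns True or False for revisions covered by this FIN, and None
    for revisions the FIN does not mention.
    """
    family = tuple(int(part) for part in board_version.split(".")[:2])
    if family == (5, 12):
        return True   # stated in this FIN
    if family == (5, 13):
        return True   # assumption: same revision as the SC
    if board_version == "5.11.9":
        return False  # stated in this FIN
    return None       # not covered; consult the firmware matrix


for version in ("5.12.6", "5.11.9", "5.11.3"):
    print(version, board_compatible_with_513(version))
```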
SC Clock Failover Issues
========================
The SC clock failover mechanism is separate from the SC failover
mechanism, and the two do not happen at the same time. When the
system is up and running with no problems, all the boards use the
clock signal from the main system controller. When SC failover
occurs, the main SC and the spare SC swap roles, but the boards
within the system continue to use the same clock they were using
prior to the failover, which is now supplied by the spare SC.
Workaround
===========
From the SC whose clock the boards should use, power off the other
system controller. The "poweroff sscX" command will automatically
attempt to switch all the boards over to the clock supplied by "this"
SC (i.e. the SC that is not being powered off). Note that
"poweroff sscX" powers off the "other" system controller, not the one
where the command is being typed.
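The distinction between SC role and board clock source can be illustrated with a toy state model. This Python sketch is purely illustrative and does not reflect how ScApp is implemented. The class and method names are hypothetical, and it assumes the clock switch attempted by "poweroff sscX" succeeds.

```python
class PlatformModel:
    """Toy model: SC roles and the board clock source change independently."""

    def __init__(self):
        self.main = "SSC0"
        self.spare = "SSC1"
        self.board_clock = "SSC0"  # boards start on the main SC's clock

    def sc_failover(self):
        # Roles swap, but the boards keep the clock they were using.
        self.main, self.spare = self.spare, self.main

    def poweroff_other(self, issued_from):
        # "poweroff sscX" typed on `issued_from` powers off the other SC
        # and moves the boards to the clock of the SC that stays up.
        self.board_clock = issued_from


platform = PlatformModel()
platform.sc_failover()
print(platform.main, platform.board_clock)   # roles swapped, clock unchanged
platform.poweroff_other(issued_from=platform.main)
print(platform.main, platform.board_clock)   # clock now follows the main SC
```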
SC Communication Issues After SC Failover
=========================================
When the system is running normally and failover is enabled, the
spare SC and the main SC communicate status and configuration
changes with each other. If a failover occurs and the main SC
transfers its responsibilities to the spare SC, failover between the
two SCs becomes disabled. With failover disabled, no data is shared
between the two SCs, and the most up-to-date configuration and
status information is not passed between the two SCs. Failover must
be manually re-enabled.
If the chassis of the system is then power-cycled, the roles of the
main SC and the spare SC may not necessarily be the same as they
were prior to the power cycle. It is possible for the system to boot
using the previously spare SC (with a possibly outdated state
configuration) as the new main SC.
Workaround
============
If failover becomes disabled, manually re-enable failover as soon as
possible so the configurations can be re-synchronized.
If this is not possible, do a dumpconfig as outlined in the Sun Fire
3800 - 6800 Platform Administration Guide. Then, if the power is
cycled and SSC0 assumes the role of main, you can restore the setup
to SSC0 using restoreconfig. Note that you will have to copy
<sc1_hostname>.tod and <sc1_hostname>.nvci to <sc0_hostname>.tod and
<sc0_hostname>.nvci for this workaround.
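The file copy in this workaround can be scripted on the server that holds the dump files. The following is a minimal Python sketch, assuming the dumpconfig output is available in a local directory on that server. The function name is hypothetical, and the hostnames are taken from the examples earlier in this FIN. The demonstration uses a temporary directory with placeholder files, not real dump data.

```python
import shutil
import tempfile
from pathlib import Path


def copy_dump_for_sc0(dump_dir, sc1_hostname, sc0_hostname):
    """Copy <sc1_hostname>.tod/.nvci to <sc0_hostname>.tod/.nvci."""
    dump_dir = Path(dump_dir)
    copied = []
    for suffix in (".tod", ".nvci"):
        src = dump_dir / (sc1_hostname + suffix)
        dst = dump_dir / (sc0_hostname + suffix)
        shutil.copy2(src, dst)  # raises FileNotFoundError if src is missing
        copied.append(dst.name)
    return copied


# Demonstration against placeholder files in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sc1-4800a.tod", "sc1-4800a.nvci"):
        (Path(tmp) / name).write_text("placeholder")
    copied = copy_dump_for_sc0(tmp, "sc1-4800a", "sc0-4800a")

print(copied)
```

The originals are copied and left in place, so the SC1 dump stays intact if the restore has to be repeated.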
COMMENTS:
None.
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@sdpsweb.EBay
--------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.