FIN #: I0850-1

SYNOPSIS: Sun Fire 3800/4800/4810/6800 5.13.x firmware issues

DATE: Jul/29/02

KEYWORDS: Sun Fire 3800/4800/4810/6800 5.13.x firmware issues


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)

           

SYNOPSIS: Sun Fire 3800/4800/4810/6800 5.13.x firmware issues.
      

SunAlert:           No

TOP FIN/FCO REPORT: Yes 
  
PRODUCT_REFERENCE:  Firmware 5.13.x 
 
PRODUCT CATEGORY:   Server / SW Admin


PRODUCTS AFFECTED:  

Systems Affected:
-----------------  
Mkt_ID   Platform   Model   Description          Serial Number
------   --------   -----   -----------          -------------
  -        S8         -     Sun Fire 3800              -
  -        S12        -     Sun Fire 4800              -
  -        S12i       -     Sun Fire 4810              -
  -        S24        -     Sun Fire 6800              -


X-Options Affected:
-------------------
Mkt_ID   Platform   Model   Description   Serial Number
------   --------   -----   -----------   -------------
  -         -         -          -              -


PART NUMBERS AFFECTED: 

Part Number   Description        Model
-----------   -----------        -----
     -             -               -


REFERENCES:

PatchId: 112494-xx: Hardware/PROM: Sun Fire 6800/4810/4800/3800 Systems 
                    Firmware Update.

URL: 
http://cpre-amer.west/esg/msg/techinfo/platform/sun_fire/firmware-matrix/update-issues2.html
 
     
PROBLEM DESCRIPTION:

New firmware, revision 5.13.x, for Sun Fire 3800/4800/4810/6800 systems
provides full System Controller (SC) failover functionality, as well as
fixing various outstanding bugs.  It is important to note that there
are certain configuration procedures which must be followed when
installing or upgrading to this firmware version on a Sun Fire system.
Failure to follow these procedures may cause loss of availability for
Sun Fire domains.

These issues affect any Sun Fire 3800/4800/4810/6800 system where the
firmware is being upgraded to version 5.13.x or where replacement FRUs
with conflicting revisions of firmware are being installed.

To determine the firmware version for a System Controller, run the
following command on the SC:

	schostname:SC> showsc

To determine firmware versions for other system boards:

	schostname:SC> showboards -v -p version
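
As an illustration only, the version lines in captured 'showsc' output
can be checked with a short script.  The Python sketch below is not part
of any Sun tooling: the function names parse_showsc and is_5_13 are
hypothetical, and the sample text is copied from the showsc output shown
later in this FIN.

```python
import re


def parse_showsc(output):
    """Extract the ScApp and RTOS versions from captured `showsc` text."""
    versions = {}
    for key in ("ScApp", "RTOS"):
        match = re.search(rf"^\s*{key} version:\s*(\S+)", output, re.MULTILINE)
        if match:
            versions[key] = match.group(1)
    return versions


def is_5_13(scapp_version):
    """Return True when the ScApp version belongs to the 5.13.x family."""
    return scapp_version.split(".")[:2] == ["5", "13"]


# Sample text copied from the showsc output reproduced in this FIN.
SAMPLE = """\
SC: SSC1

SC date: Fri May 31 03:39:56 PDT 2002
SC uptime: 25 seconds

ScApp version: 5.11.9
RTOS version: 18
"""

if __name__ == "__main__":
    info = parse_showsc(SAMPLE)
    print(info, "5.13.x" if is_5_13(info["ScApp"]) else "not 5.13.x")
```

The script only reads text that has already been captured from the SC
console; it does not connect to the SC itself.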

The Field needs to be aware of the following configuration issues with 
5.13.x firmware:

  . Failure to upgrade SSC1 first may result in problems such as crashed
    domains, lost configuration information, and inaccessible domains.

  . An SC with firmware at 5.11.x or 5.12.x will not boot if hot-plugged 
    into slot SSC0.

  . Resolving SC clock failover issues 

  . Resolving SC communications issues after SC failover 

In addition, this FIN gives advice on the following:

  . Hot-plugging SCs with 5.13.x into a platform with older
    revisions of firmware.

  . Replacing System Boards and I/O Boards with different
    revisions of firmware.
    
The above issues can be avoided or resolved by following procedures
given in the Corrective Action section below.  Information for these
and other Sun Fire firmware issues can be viewed at the Sun Fire
3800/4800/4810/6800 Firmware Update Matrix.

  http://cpre-amer.west/esg/msg/techinfo/platform/sun_fire/firmware-matrix/


IMPLEMENTATION: 

         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        | X |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---


CORRECTIVE ACTION:

The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives who may encounter the above
mentioned situation.

Upgrading to 5.13.0 firmware from 5.11.x and 5.12.x
==================================================

Ensure you read both the Install.info file and the release_notes file
in PatchId 112494.

   Essential - Always upgrade SSC1 first

Failure to follow this instruction will result in problems such as crashed 
domains, lost configuration information, and inaccessible domains. 
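
The ordering rule above can be encoded as a simple pre-flight check in
an upgrade runbook.  The following Python sketch is an assumption-laden
illustration, not a Sun-supplied tool: the names upgrade_order and
check_plan are hypothetical, and the script does not run flashupdate.

```python
def upgrade_order(sc_slots):
    """Return the SC slots in the order they should be flashed (SSC1 first)."""
    priority = {"SSC1": 0, "SSC0": 1}
    slots = [slot.upper() for slot in sc_slots]
    unknown = [slot for slot in slots if slot not in priority]
    if unknown:
        raise ValueError(f"unexpected SC slot(s): {unknown}")
    return sorted(slots, key=priority.__getitem__)


def check_plan(plan):
    """Raise ValueError if a planned flash sequence touches SSC0 before SSC1."""
    slots = [slot.upper() for slot in plan]
    if "SSC0" in slots and "SSC1" in slots:
        if slots.index("SSC0") < slots.index("SSC1"):
            raise ValueError("SSC0 is scheduled before SSC1; upgrade SSC1 first")
    return slots


if __name__ == "__main__":
    print(upgrade_order(["ssc0", "ssc1"]))
```

A plan that lists SSC0 ahead of SSC1 is rejected before any firmware is
touched, which is the point at which the mistake is still harmless.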

What to do if SSC0 is upgraded first
=====================================

Here's an example of what happens if SSC0 is updated first. 

   4800-sc0:SC> flashupdate -f ftp://172.29.3.44/pub/112494-01 all

   As part of this update, the system controller will automatically reboot.
   RTOS will be upgraded automatically during the next boot.
   ScApp will be upgraded automatically during the next boot.

   After this update you must reboot each active domain that was upgraded.

   Do you want to continue? [no] yes
      .
      .  <TEXT DELETED>
      .

   The date is Thursday, May 23, 2002, 11:29:30 AM GMT+01:00.

   May 23 11:29:31 4800-sc0 Platform.SC: Boot: ScApp 5.13.0, RTOS 23
   May 23 11:29:36 4800-sc0 Platform.SC: Clock Source: 75MHz
   May 23 11:29:38 4800-sc0 Platform.SC: SC Failover Monitor: enabled
   May 23 11:30:08 4800-sc0 Platform.SC: Spare System Controller
   May 23 11:30:08 4800-sc0 Platform.SC: SC Failover: enabled but not active.
    
   System Controller '4800-sc0':

       Type  0  for Platform Shell

       Input: 0

       Platform Shell - Spare System Controller

   4800-sc0:sc> 
   
   If you then connect to SC1, you will find that it is also a spare:
   
       # telnet 4800-sc1

    System Controller '4800-sc1':

       Type  0  for Platform Shell

       Input: 0

   Platform Shell - Slave System Controller

   4800-sc1:SC> 
   
------------------------------------------------------------------------------

Since SSC0 was upgraded first, the result is that there are now two spare
system controllers.
    
WARNING: DO NOT try to recover by pressing reset buttons, or re-flashing. 
         You will almost certainly crash any running domains on your 
         platform.
   
If SSC0 is upgraded first, there is a recovery procedure.
    
Engage CPRE if you find yourself in this situation.

Hot-Plugging SCs with old revs of firmware into a 5.13.x platform:
==================================================================

   An SC running 5.13 firmware cannot operate alongside an SC running
   5.11 or 5.12 firmware.
    
   If SC1 is to be replaced in a platform running 5.13.x, and the
   replacement has 5.11 or 5.12 firmware loaded, recovery is simple
   and outlined below.

   If SC0 is to be replaced in a platform running 5.13.x, and the
   replacement has 5.11 or 5.12 firmware loaded, the replacement will
   not boot, as outlined below.  To recover, remove it and install an
   SC at 5.13.x.

   If SC0 is to be replaced in a 5.13 platform, ensure the replacement
   has 5.13.x firmware loaded on it.  Double check with the control
   room that this is the case.
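
   The three cases above reduce to a small lookup on the slot and the
   firmware family of the replacement SC.  The Python sketch below is
   illustrative only; the function names are hypothetical and the
   outcome strings paraphrase this FIN rather than any SC output.

```python
def family(version):
    """Reduce a version such as '5.11.9' to its family, '5.11'."""
    return ".".join(version.strip().split(".")[:2])


def hotplug_outcome(slot, replacement_version):
    """Expected result of hot-plugging an SC into a 5.13.x platform."""
    slot = slot.upper()
    if slot not in ("SSC0", "SSC1"):
        raise ValueError(f"unexpected SC slot: {slot}")
    fam = family(replacement_version)
    if fam == "5.13":
        return "ok: replacement matches the 5.13.x platform"
    if fam in ("5.11", "5.12"):
        if slot == "SSC1":
            return "recoverable: flashupdate the replacement SC"
        return "will not boot: remove it and install an SC at 5.13.x"
    # Versions outside 5.11/5.12/5.13 are not discussed in this FIN.
    return "unknown: version not covered by this FIN"


if __name__ == "__main__":
    for slot in ("SSC1", "SSC0"):
        print(slot, "->", hotplug_outcome(slot, "5.11.9"))
```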

   Example - Hot-Plugging SC with old rev of firmware in slot SSC1
   ================================================================ 

   Output from SSC0

      sc0-4800a:SC> poweroff ssc1

      SSC1: powered off 

      sc0-4800a:SC>  

        May 31 10:34:45 sc0-4800a Platform.SC: Clock failover disabled. 

        May 31 10:37:07 sc0-4800a Platform.SC: SSC1 removed 
        May 31 10:37:37 sc0-4800a Platform.SC: SSC1 inserted 

      sc0-4800a:SC>  

      sc0-4800a:SC>  

        May 31 10:39:57 sc0-4800a Platform.SC: SC Failover: the other SC is
        running an old version of firmware which is not compatible with
        failover.  You need to upgrade this firmware as soon as possible. 

      sc0-4800a:SC>  
      sc0-4800a:SC>  

   Output from SSC1

      Hardware Reset... 

      @(#) SYSTEM CONTROLLER(SC) POST 18 2001/06/14 11:20 
      PSR = 0x044010e5 
      PCR = 0x04004000 

            SelfTest running at DiagLevel:0x20 

      SC Boot PROM                          Test  
      BootPROM CheckSum                     Test  
      . 
      . 
      . 

      Console Bus Hub                       Test  
      CBH Register Access                   Test
 
    POST Complete. 
    ERI Device Present 
    Getting MAC address for SSC1 
    MAC address is 8:0:20:d8:ab:64 
    Using DHCP to configure network interface 
    Attached TCP/IP interface to eri unit 0 
    Attaching interface lo0...done 
    interrupt: 100 Mbps full duplex link up 
    Initiating DHCP negotiations for eri0 
    dhcpcBind() failed: errno = 0xd0003 

    Adding 2851 symbols for standalone. 

        Copyright 2001 Sun Microsystems, Inc.  All rights reserved. 

    RTOS version: 18 
    ScApp version: 5.11.9 
    SC POST diag level: min 

    The date is Friday, May 31, 2002, 3:39:42 AM PDT. 

    SbbcAsic.showResetReason: SBBC reset status=0160 POR 
    PowerOn or Invalid magic: Initializing the SC SRAM 
    May 31 03:39:46 noname Chassis-Port.SC: Backing up Static ID Info to NVCI 
    May 31 03:39:46 noname Chassis-Port.SC: Clock source: 75MHz 
    May 31 03:39:48 noname Chassis-Port.SC: Starting Slave Thread 

    System Controller 'noname.example.com': 

        Type  0  for Platform Shell 

        Input: 0

    Platform Shell 

    noname:SC> showsc 

    SC: SSC1  

    SC date: Fri May 31 03:39:56 PDT 2002 
    SC uptime: 25 seconds  

    ScApp version: 5.11.9 
    RTOS version: 18 

    noname:SC>  
    
    To recover, flashupdate SC1 to 5.13.x.
     
   Example - Hot-Plugging SC with old rev of firmware in slot SSC0 
   ================================================================
   Output from SSC1

    sc1-4800a:SC> poweroff ssc0 

    SSC0: powered off 

    sc1-4800a:SC>  

    May 31 10:48:28 sc1-4800a Platform.SC: SSC0 removed 
    May 31 10:49:02 sc1-4800a Platform.SC: SSC0 inserted 

    sc1-4800a:SC>  
    sc1-4800a:SC>  

    May 31 10:50:25 sc1-4800a Platform.SC: SC Failover: the other SC is 
    running an old version of firmware.  It cannot be booted on this 
    platform.  Contact your support organization. 

    sc1-4800a:SC>  
    sc1-4800a:SC>  
    sc1-4800a:SC>  

   ----------------------------------------------------------------------------

   Output from SSC0

     Hardware Reset... 
      
     @(#) SYSTEM CONTROLLER(SC) POST 18 2001/06/14 11:20 
     PSR = 0x044010e5 
     PCR = 0x04004000 

            SelfTest running at DiagLevel:0x20 

     SC Boot PROM             Test  
     BootPROM CheckSum               Test  
     . 
     . 
     . 
     Console Bus Hub          Test  
            CBH Register Access                 Test 
     POST Complete. 
     ERI Device Present 
     Getting MAC address for SSC0 
     MAC address is 8:0:20:d8:ab:63 
     Using DHCP to configure network interface 
     Attached TCP/IP interface to eri unit 0 
     Attaching interface lo0...done 
     Timeout waiting for network driver (flags=0x8062) 
 
     Adding 2851 symbols for standalone. 
      
     SC0 is un-usable at this point.  There is no recovery possible, apart 
     from removing SC0 and replacing it with an SC at revision 5.13.x.
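
     The two "SC Failover" console messages shown in the examples above
     call for different responses, so it can help to tell them apart
     when reviewing saved console logs.  The Python sketch below is a
     hypothetical helper, not a Sun tool; it matches only the message
     text quoted in this FIN and joins wrapped lines before searching.

```python
import re

# Phrases taken from the two console messages quoted in this FIN.
UPGRADE_NEEDED = "not compatible with failover"
CANNOT_BOOT = "It cannot be booted on this platform"


def classify_failover_messages(log_text):
    """Return the actions suggested by SC Failover messages in a log."""
    # Console messages wrap across lines, so collapse all whitespace first.
    flat = re.sub(r"\s+", " ", log_text)
    findings = []
    if UPGRADE_NEEDED in flat:
        findings.append("other SC has old firmware: flashupdate it")
    if CANNOT_BOOT in flat:
        findings.append("other SC cannot boot: replace it with an SC at 5.13.x")
    return findings


if __name__ == "__main__":
    sample = (
        "May 31 10:50:25 sc1-4800a Platform.SC: SC Failover: the other SC is\n"
        "running an old version of firmware.  It cannot be booted on this\n"
        "platform.  Contact your support organization.\n"
    )
    print(classify_failover_messages(sample))
```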

Hot-Plugging SCs with 5.13.x into a platform with older revs of firmware
========================================================================

1) Plugging an SC with 5.13.x firmware into a 5.12.x platform, slot SSC0 
   =========================================================================

   Remember, the platform will have had to be powered off to effect this
   FRU replacement.  The state the system controllers end up in depends on
   which one boots first, which is largely down to SCPOST levels and the
   SC network settings.  For example, an SC from logistics should be at
   default settings, which means SCPOST level min and the network
   configured for DHCP.

   If SSC1 boots first, it will put out a heartbeat (since it is at 5.12.x),
   and this will cause SSC0 to assume the role of spare.

     System Controller 'noname.example.com': 

         Type  0  for Platform Shell 
 
         Input: 0

     Platform Shell - Spare System Controller 

     noname:sc> 

   This is not a problem.  If SSC0 boots first, the SC may become confused.
   Ignore this.

   In either case, flashupdate SSC0 with 5.12.x firmware, and power-cycle
   the platform.

2) Plugging an SC with 5.13.x firmware into a 5.12.6 platform, slot SSC1
   ===================================================================== 

   Again, the platform will have had to be powered off to effect this FRU 
   replacement. 

   If SSC0 boots first, it will be the main and SSC1 the spare.  Flashupdate 
   SSC1 with 5.12.6 firmware, and power-cycle the platform. 

   If SSC1 boots first, you will get the following message on SSC1:

   Platform.SC: SC Failover: the other SC is running an old version of 
   firmware. It cannot be booted on this platform.  Contact your support 
   organization.

   SSC0 will be hung, at the point the RTOS finishes loading.  Ignore SSC0, 
   flashupdate SSC1 with 5.12.x firmware, and power-cycle the platform. 
   Now it will be back at SSC0 as main and SSC1 as spare. 
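
   Cases 1) and 2) above can be summarized as a table keyed on the slot
   holding the new 5.13.x SC and on which SC boots first.  The Python
   sketch below is a hypothetical summary only.  It assumes, as a
   reading of the text above, that the corrective action is the same in
   every case: flashupdate the newly inserted SC to the platform
   firmware and power-cycle.

```python
# (slot holding the new 5.13.x SC, SC that boots first) -> expected symptom
EXPECTED = {
    ("SSC0", "SSC1"): "SSC0 comes up as spare",
    ("SSC0", "SSC0"): "the SC may become confused; ignore this",
    ("SSC1", "SSC0"): "SSC0 is main and SSC1 is spare",
    ("SSC1", "SSC1"): "SSC1 reports old firmware on the other SC; "
                      "SSC0 hangs when the RTOS finishes loading",
}


def downgrade_plan(new_sc_slot, first_to_boot, platform_fw="5.12.x"):
    """Describe what to expect and what to do for one of the four cases."""
    key = (new_sc_slot.upper(), first_to_boot.upper())
    if key not in EXPECTED:
        raise ValueError(f"unexpected slot combination: {key}")
    return {
        "expect": EXPECTED[key],
        "action": f"flashupdate {key[0]} with {platform_fw} firmware, "
                  "then power-cycle the platform",
    }


if __name__ == "__main__":
    plan = downgrade_plan("ssc1", "ssc1")
    print(plan["expect"])
    print(plan["action"])
```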


   Replacing SBs and IBs with different revs of firmware
   ====================================================

   If you are going to replace a System Board or I/O assembly, be
   aware that the replacement board firmware must be compatible with
   the system controller firmware.  To check the firmware compatibility
   for each board, use the 'showboards' command with the "-p version" or
   "-v" option.

      SBs and IBs with 5.12.x are compatible with 5.13.x.  SBs and IBs
      with 5.11.9 are NOT compatible with 5.13.x.

   If the firmware of the replacement board is not compatible with the
   firmware for the system controller, please upgrade or downgrade the
   firmware on the replacement board accordingly, using 'flashupdate
   -c'.  It is recommended that replacement boards run the same revision
   of firmware as the other boards in the domain.
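
   The compatibility statement above can be expressed as a small check
   on a board's firmware version string.  The Python sketch below is
   illustrative and the function name is hypothetical.  It returns None
   for versions this FIN does not mention, and it treats 5.13.x boards
   as compatible on the assumption that matching firmware is always
   acceptable.

```python
def board_compatible_with_5_13(board_fw):
    """Return True, False, or None (not covered) for a board firmware version."""
    parts = board_fw.strip().split(".")
    if parts[:2] == ["5", "12"]:
        return True   # stated in this FIN: 5.12.x boards work with 5.13.x
    if parts[:2] == ["5", "13"]:
        return True   # assumption: same firmware family as the SC
    if parts == ["5", "11", "9"]:
        return False  # stated in this FIN: 5.11.9 boards do not work
    return None       # any other version is not covered by this FIN


if __name__ == "__main__":
    for version in ("5.12.6", "5.11.9", "5.11.6"):
        print(version, board_compatible_with_5_13(version))
```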
 
   SC Clock Failover Issues
   ========================

   The SC clock failover mechanism is separate from the SC failover
   mechanism, and the two do not happen at the same time.  When the
   system is up and running with no problems, all the boards use a
   clock signal from the main system controller.  When SC failover
   occurs, the main SC and the spare SC swap roles, but the boards
   within the system continue to use the same clock they were using
   prior to the failover.

   Workaround
   =========== 

   Power off the other system controller.  The "poweroff sscX" command
   powers off the "other" system controller, not the one where the
   command is being typed, and automatically attempts to switch all the
   boards over to the clock supplied by "this" SC (i.e. the SC that is
   not being powered off).
 
   SC Communication Issues After SC Failover
   =========================================

   When the system is running normally and failover is enabled, the
   spare SC and the main SC communicate status and configuration
   changes with each other.  If a failover occurs and the main SC
   transfers its responsibilities to the spare SC, failover between the
   two SCs becomes disabled.  With failover disabled, no data is shared
   between the two SCs, and the most up-to-date configuration and
   status information is not passed between the two SCs.  Failover must
   be manually re-enabled.

   If the chassis of the system is then power-cycled, the roles of the
   main SC and the spare SC may not necessarily be the same as they
   were prior to the power cycle.  It is possible for the system to boot
   using the previously spare SC (with a possibly outdated state
   configuration) as the new main SC.
 

   Workaround
   ============
  
   If failover becomes disabled, manually re-enable failover as soon as
   possible so the configurations can be re-synchronized.

   If this is not possible, do a dumpconfig as outlined in the Sun Fire
   3800 - 6800 Platform Administration Guide.  Then if the power is
   cycled and SSC0 assumes the role of main, you can restore the setup
   to SC0 using restoreconfig.  Note that you will have to copy
   <sc1_hostname>.tod and <sc1_hostname>.nvci to <sc0_hostname>.tod and
   <sc0_hostname>.nvci for this workaround.
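
   The file copy in the workaround above can be scripted on the server
   that holds the dumpconfig files.  The Python sketch below is a
   minimal illustration under stated assumptions: the dump directory is
   locally accessible, both files live in the same directory, and the
   function name clone_dump_for_sc0 is hypothetical.  It does not run
   dumpconfig or restoreconfig.

```python
import shutil
from pathlib import Path


def clone_dump_for_sc0(dump_dir, sc1_hostname, sc0_hostname, overwrite=False):
    """Copy SC1's .tod and .nvci dump files to SC0's file names."""
    dump_dir = Path(dump_dir)
    created = []
    for suffix in (".tod", ".nvci"):
        src = dump_dir / f"{sc1_hostname}{suffix}"
        dst = dump_dir / f"{sc0_hostname}{suffix}"
        if not src.is_file():
            raise FileNotFoundError(src)
        if dst.exists() and not overwrite:
            # Refuse to clobber an existing SC0 dump unless asked to.
            raise FileExistsError(dst)
        shutil.copy2(src, dst)
        created.append(dst)
    return created
```

   For example, clone_dump_for_sc0("/export/scdump", "sc1-4800a",
   "sc0-4800a") would create sc0-4800a.tod and sc0-4800a.nvci next to
   the SC1 files; the directory name here is invented for the example.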


COMMENTS:  

None.

============================================================================

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@sdpsweb.EBay
--------------------------------------------------------------------------


Copyright (c) 1997-2003 Sun Microsystems, Inc.