Document fins/I0664-1


FIN #: I0664-1

SYNOPSIS: Rebooting the SSP of an E10K may cause problems

DATE: Apr/09/01

KEYWORDS: Rebooting the SSP of an E10K may cause problems


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: Rebooting the System Service Processor (SSP) of an E10000
          system while SSP processes are active can cause all domains 
          to stop. 
              
TOP FIN/FCO REPORT:  Yes
 
PRODUCT_REFERENCE:   E10000 SSP
 
PRODUCT CATEGORY:    Server / SW Admin

PRODUCTS AFFECTED:  

Mkt_ID   Platform   Model   Description                Serial Number
------   --------   -----   -----------                -------------
Systems Affected
----------------
  -      E10000      ALL    Ultra Enterprise 10000           -

X-Options Affected
------------------
  -         -         -          -                           -


PART NUMBERS AFFECTED: 

Part Number   Description   Model
-----------   -----------   -----
     -             -          -


REFERENCES:

BugId:   4394892 - Potential problem with actionsysclock causing arbstops 
                   after cbe (control board executive) connects.
         4365492 - Heartbeat Failures caused 5 domains to panic.

PatchId: 110732 - SSP 3.1.1: Heartbeat Failures caused 5 domains to panic. 
         110733 - SSP 3.2: Heartbeat Failures caused 5 domains to panic. 
         110734 - SSP 3.3: Heartbeat Failures caused 5 domains to panic.
         110735 - SSP 3.4: Heartbeat Failures caused 5 domains to panic.

ESC:     527270 
         527381 
         528173 
         528506
 
MANUAL:  806-1500-10: SSP 3.2 User's Guide
         806-1502-05: Sun Enterprise 10000 SSP 3.2 Installation Guide and
                      Release Notes
         806-2886-10: SSP 3.3 Installation Guide and Release Notes.
         806-4872-10: SSP 3.4 Installation Guide and Release Notes.
         806-2887-10: SSP 3.3 User's Guide
         806-4870-10: SSP 3.4 User's Guide
         806-2888-10: SSP 3.3 Reference Manual
         806-4871-10: SSP 3.4 Reference Manual

Sun Alert: SA-24898

      
PROBLEM DESCRIPTION: 

Rebooting the System Service Processor (SSP) of an Ultra Enterprise
10000 system while SSP processes and daemons are active can cause all
system domains to crash.  In particular, if the SSP is rebooted while
the cbe_reset process is running, all active domains may crash and a
heartbeat failure will be detected.  The cbe_reset process initializes
the Control Board Executive (CBE) image on the primary control board.

Error messages for these domain crashes are reported in the
domain-specific message files on the SSP, in the following directory:

   /var/opt/SUNWssp/adm/'domain-name' 

These messages may include a hostreset message for every processor in
the E10000; in that case a hostresetdump file with the current time
stamp is also created in this directory.
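As a rough aid for locating the evidence described above, the following POSIX shell sketch walks every subdirectory under /var/opt/SUNWssp/adm and counts hostreset lines and hostresetdump files per domain. It assumes (this FIN does not state it) that each domain's message file is named "messages" and that dump file names begin with "hostresetdump"; the function name report_hostresets and the --report option are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize hostreset evidence per domain (POSIX sh).
# Assumptions: the domain message file is named "messages" and dump files
# are named "hostresetdump*"; neither name is confirmed by this FIN.
SSP_ADM=${SSP_ADM:-/var/opt/SUNWssp/adm}

report_hostresets() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        domain=$(basename "$dir")
        lines=0
        if [ -f "${dir}messages" ]; then
            # grep -c prints 0 (and exits 1) when nothing matches
            lines=$(grep -c hostreset "${dir}messages" || true)
        fi
        dumps=$(ls "$dir" | grep -c '^hostresetdump' || true)
        printf '%s hostreset_lines=%s dump_files=%s\n' "$domain" "$lines" "$dumps"
    done
}

# Run only when invoked as: sh report_hostresets.sh --report
if [ "${1:-}" = "--report" ]; then
    report_hostresets "$SSP_ADM"
fi
```

A domain showing a non-zero count in both columns is a candidate for the crash described in this notice; the time stamps of the listed files still need to be compared with the time of the SSP reboot.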
         
The problem can occur with the following SSP software releases: 

      SSP 3.1 
      SSP 3.1.1 
      SSP 3.2 
      SSP 3.3 
      SSP 3.4 

The problem is fixed by the following patches, one for each SSP
release listed (there is no patch for SSP 3.1):

      SSP 3.1.1  110732 
      SSP 3.2    110733 
      SSP 3.3    110734 
      SSP 3.4    110735 
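The release-to-patch mapping above can be checked on an SSP with a small shell sketch like the following. It assumes the SSP release is already known (this FIN does not say how to query it) and that "showrev -p" prints one line of the form "Patch: NNNNNN-RR ..." per installed patch; the helper names and the script file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for checking the FIN I0664-1 patch (POSIX sh).

# Print the patch ID for SSP release $1.
# Returns 2 for SSP 3.1 (no patch: upgrade first), 1 for unlisted releases.
patch_for_release() {
    case $1 in
        3.1.1) echo 110732 ;;
        3.2)   echo 110733 ;;
        3.3)   echo 110734 ;;
        3.4)   echo 110735 ;;
        3.1)   return 2 ;;
        *)     return 1 ;;
    esac
}

# Read "showrev -p" output on stdin; succeed if any revision of patch $1
# appears on a line of the assumed form "Patch: NNNNNN-RR ...".
patch_installed() {
    grep "^Patch: $1-" >/dev/null
}

# Usage: sh check_patch.sh <ssp-release>, e.g. sh check_patch.sh 3.4
if [ "$#" -eq 1 ]; then
    if patch=$(patch_for_release "$1"); then
        if showrev -p | patch_installed "$patch"; then
            echo "SSP $1: patch $patch is installed"
        else
            echo "SSP $1: patch $patch is NOT installed"
        fi
    else
        echo "SSP $1: no FIN I0664-1 patch listed for this release" >&2
    fi
fi
```

Matching on "NNNNNN-" accepts any revision of the patch; if a minimum revision is required, compare the suffix as well.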
      

IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        | X |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION: 

An Authorized Enterprise Field Service Representative may avoid the
problem described above by following either Step A or Step B below.

A. (Preferred) Install the appropriate patch for the SSP version in use:

      SSP 3.1.1  110732 
      SSP 3.2    110733 
      SSP 3.3    110734 
      SSP 3.4    110735 

   These patches upgrade the flash PROM firmware on the control board
   from revision 3.46 to 3.47.  The patch README gives detailed
   instructions for upgrading the firmware.  SSP 3.1 has no patch:
   upgrade to one of the SSP releases listed above, then apply the
   matching patch.

   NOTE: When patching SSP 3.4, read the patch release notes carefully
         with respect to disabling SSP Failover.

B. As a workaround until the appropriate patch is installed, avoid the
   problem as follows:

  1. Do not reboot or halt the SSP until it has fully initialized, that
     is, until one of the following messages appears in
     /var/opt/SUNWssp/adm/messages:

        Startup of SSP as MAIN complete   (for SSP 3.4)
        Startup of SSP complete           (for SSP 3.3 or earlier)

  2. Do not reboot or halt the SSP while cb_reset is running. To check,
     run:

        # ps -ef | grep cb_reset

  3. Follow this procedure to reboot an SSP:

     * Stop the SSP processes: /etc/init.d/ssp stop
     * Stop in.rarpd: /etc/init.d/nfs.server stop
     * Kill all in.tftpd processes, if active
     * Reboot the SSP.
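The checks in steps 1 and 2 can be scripted before an SSP reboot. The sketch below is hypothetical: it assumes a POSIX shell, matches the process name as either cb_reset or cbe_reset because this notice uses both spellings, and only prints the step 3 commands rather than running them. Because /var/opt/SUNWssp/adm/messages accumulates across boots, the startup check only proves that a completion message is present somewhere in the file; confirm from its time stamp that it belongs to the current boot.

```shell
#!/bin/sh
# Hypothetical pre-reboot checks for FIN I0664-1, workaround B (POSIX sh).
SSP_MESSAGES=${SSP_MESSAGES:-/var/opt/SUNWssp/adm/messages}

# Step 1: succeed if either startup-complete message is present in file $1.
# Note: this does not check that the message is from the current boot.
ssp_startup_complete() {
    grep 'Startup of SSP as MAIN complete' "$1" >/dev/null 2>&1 ||
        grep 'Startup of SSP complete' "$1" >/dev/null 2>&1
}

# Step 2: read "ps -ef" output on stdin; succeed if a cb_reset or
# cbe_reset process is listed ("cbe*_reset" matches both spellings).
cb_reset_running() {
    grep 'cbe*_reset' >/dev/null
}

# Step 3: print, but do not run, the reboot sequence from this FIN.
print_reboot_steps() {
    echo "/etc/init.d/ssp stop"
    echo "/etc/init.d/nfs.server stop"
    echo "pkill in.tftpd"   # one way to kill all in.tftpd processes
    echo "init 6"           # one way to reboot the SSP
}

precheck() {
    if ! ssp_startup_complete "$SSP_MESSAGES"; then
        echo "SSP startup not complete; do not reboot or halt yet" >&2
        return 1
    fi
    if ps -ef | cb_reset_running; then
        echo "cb_reset/cbe_reset is running; do not reboot or halt yet" >&2
        return 1
    fi
    echo "Checks passed. Reboot sequence:"
    print_reboot_steps
}

# Run the checks only when invoked as: sh ssp_precheck.sh --check
if [ "${1:-}" = "--check" ]; then
    precheck
fi
```

Printing the step 3 commands instead of executing them keeps the final decision to reboot with the person at the SSP console.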
 
 
COMMENTS:  

----------------------------------------------------------------------------

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
                                                        


Copyright (c) 1997-2003 Sun Microsystems, Inc.