Document fins/I0696-1


FIN #: I0696-1

SYNOPSIS: Sun Fire 280R and Sun Blade 1000 systems may fail with "Red State
          Exception" or "Trap32/63" after power cycle

DATE: Jul/12/01

KEYWORDS: Sun Fire 280R and Sun Blade 1000 systems may fail with "Red State
          Exception" or "Trap32/63" after power cycle


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS:  Sun Fire 280R and Sun Blade 1000 systems may fail with 
           "Red State Exception" or "Trap32/63" after power cycle.


Sun Alert:          No

TOP FIN/FCO REPORT: Yes 
 
PRODUCT_REFERENCE:  Sun Fire 280R and Sun Blade 1000   
 
PRODUCT CATEGORY:   Server / SW Admin 


PRODUCTS AFFECTED:  

Systems Affected
----------------
Mkt_ID   Platform      Model   Description          Serial Number
------   --------      -----   -----------          -------------
  -        A28           -     Sun Blade 1000             -
  -        A35           -     Sun Fire 280R              -    
  
X-Options Affected
------------------
  -         -            -          -                     -        


PART NUMBERS AFFECTED: 

Part Number   Description                               Model
-----------   -----------                               -----
501-5938-06   ASSY Excal/Littleneck 0 Meg TS              -
  

REFERENCES:

PatchId: 111292 or higher - Sun Blade 1000 and Sun Fire 280R Flash 
                               PROM Update.

      
PROBLEM DESCRIPTION:  
  
All Sun Blade 1000 and Sun Fire 280R systems shipped prior to July 2001
could encounter a Trap32, Trap63, or Red State Exception failure at
power-on, in approximately 1% of power-on attempts.  Although the Sun
Fire 280R has a higher incidence of this behavior at power-on, the Sun
Blade 1000 has been shown to exhibit the same behavior.
 
During a power-cycle, the Sun Fire 280R or Sun Blade 1000 may
experience a Trap32 or Trap63 failure during POST (diag-switch? true)
or a Red State Exception during OBP (diag-switch? false), causing the
system to fail during the boot sequence.
 
If a system is powered on and completes POST and OBP without failure,
this condition has no effect on the running system.  The failure, if
encountered, is easily recovered from by power-cycling the unit; a
subsequent power-cycle will nearly always pass POST and OBP.
 
This problem is very intermittent (approximately 1% of power-on
attempts) and occurs only during power-on.  Examples of typical error
messages follow:

RED State Exception:
--------------------
  CPU: 0000.0000.0000.0001
  TL=0000.0000.0000.0005 TT=0000.0000.0000.0020
    TPC=0000.0000.0000.4d04 TnPC=0000.0000.0000.4d08
TSTATE=0000.0000.1504.1400
  TL=0000.0000.0000.0004 TT=0000.0000.0000.0068
    TPC=0000.0000.0000.46a4 TnPC=0000.0000.0000.46a8
TSTATE=0000.0000.1500.1500
  TL=0000.0000.0000.0003 TT=0000.0000.0000.0034
    TPC=0000.0000.0000.4208 TnPC=0000.0000.0000.420c
TSTATE=0000.0000.1500.1500
  TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
    TPC=0000.0000.0000.0218 TnPC=0000.0000.0000.0220
TSTATE=0000.0000.1500.1500
  TL=0000.0000.0000.0001 TT=0000.0000.0000.0010
    TPC=0000.0000.0000.2748 TnPC=0000.0000.0000.26d0
TSTATE=0000.0044.1500.0500

  Slave Timeout! [disabled]Corrected ECC Error


  Diag-switch? true or keyswitch in "Diag" mode.

Trap-32:
-------- 
  {0}* Memory address selection Initial area
  {0}ERROR: TEST = * Memory address selection Initial area TESTID = 6
  {0}H/W under test = MAIN MEMORY
  {0}     Trap level 1  Trap type 32
  {0}     Data access error
  {0}     Fault address 00000000.0001f580
  {0}     Fault  status 00100004.0000002d
  {0}     (PRIV) Privileged code access error(s)
  {0}     (UE) Uncorrectable system data ECC error
  {0}     More than one bit error from memory
  {0}     Bank 0,2 at J0100, J0202, J0304, J0406
  {0}* Memory address selection Initial area FAILED

Trap 63:
-------- 
  {1}* Memory address selection Initial area
  {1}ERROR: TEST = * Memory address selection Initial area TESTID = 67
  {1}H/W under test = MAIN MEMORY
  {1}     Trap level 1  Trap type 63
  {1}     ECC error
  {1}     Fault address 00000000.00020030
  {1}     Fault  status 00010000.00010000
  {1}     (EMC) Correctable Mtag ECC error
  {1}     Cannot decode ECC syndrome. Bad syndrome
  {0}POST failed
  {0}POST_END
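
The sample messages above can be recognized in a captured console log
with a few text patterns.  The following is a minimal sketch only; the
function name and labels are hypothetical, and the patterns are taken
from the samples in this FIN, so actual console output may need looser
or different matching.

```python
import re

# Patterns derived from the sample messages in this FIN (an assumption:
# real console output may vary in spacing or case, so matching is loose).
PATTERNS = [
    ("RED_STATE", re.compile(r"red\s+state\s+exception", re.IGNORECASE)),
    ("TRAP32", re.compile(r"trap\s+type\s+32\b", re.IGNORECASE)),
    ("TRAP63", re.compile(r"trap\s+type\s+63\b", re.IGNORECASE)),
]


def classify_console_log(text):
    """Return the set of failure labels whose pattern appears in text."""
    return {label for label, pattern in PATTERNS if pattern.search(text)}
```

A log containing none of these signatures yields an empty set, in which
case this FIN most likely does not apply to the failure being examined.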

The 'Trap32/63' and 'Red State Exception' errors do not require removal
or replacement of CPU modules or motherboards.  The error state does
not affect system operation, and it can be resolved by power-cycling
the system as a workaround.

The RED STATE exception, as a result of the Trap 32/63 condition, is
due to an intermittent failure in the PLL portion of the CPMS memory
switch ASIC during initialization.

This is a very intermittent problem (~1% of power cycles) that is
reported at power-on.  If the RED STATE error message is reported on
_consecutive_ power-on attempts, replacement of the motherboard is
merited.  If the system successfully passes the power-on sequence, then
there is no risk to system integrity from this intermittent CPMS memory
initialization error.
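
The reasoning behind the consecutive-failure criterion can be
illustrated with simple arithmetic.  The sketch below assumes that each
power-on attempt fails independently at the ~1% rate quoted above; the
independence assumption is not stated in this FIN, and the function
name is hypothetical.

```python
def chance_of_consecutive_failures(per_attempt_rate, attempts):
    """Probability that `attempts` power-ons in a row all fail.

    Assumes each attempt fails independently at `per_attempt_rate`
    (an assumption made for illustration, not a statement from the FIN).
    """
    if not 0.0 <= per_attempt_rate <= 1.0:
        raise ValueError("per_attempt_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return per_attempt_rate ** attempts


# Two failures in a row at a 1% per-attempt rate: about 1 in 10,000.
two_in_a_row = chance_of_consecutive_failures(0.01, 2)
```

Under that assumption, back-to-back failures caused by the transient
condition alone would be rare, which is consistent with treating
consecutive failures as grounds for motherboard replacement.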

A fix can be obtained by upgrading the firmware of the system Flash
PROM to version 4.2.2 or later.  Patch-ID# 111292 or later provides
this firmware version.  This new firmware version contains OBP code
changes for memory initialization.  There are no plans to issue an FCO
to rework the motherboards already shipped, as they will only be
replaced upon failure.
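
Whether a system already carries the fixed firmware can be checked by
comparing its reported Flash PROM version against 4.2.2.  The sketch
below assumes the version has already been obtained as a dotted numeric
string (for example from the OBP banner); the function names are
hypothetical and no particular command output format is implied.

```python
# Minimum Flash PROM firmware version containing the fix, per this FIN.
FIXED_VERSION = (4, 2, 2)


def parse_version(version):
    """Convert a dotted version string such as "4.2.2" to a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def has_fixed_firmware(version):
    """Return True if `version` is 4.2.2 or later."""
    return parse_version(version) >= FIXED_VERSION
```

Comparing integer tuples rather than raw strings avoids ordering
mistakes such as treating "4.10.0" as earlier than "4.2.2".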

Background Information
----------------------
TRAP32 error messages are what POST generates for all data access 
errors consisting of multi-bit uncorrectable (UE) errors during the 
system assembly process and memory testing.  
 
Trap63 error messages are what POST generates for all data access 
errors consisting of single-bit correctable (CE) errors during the 
system assembly process and memory testing. 
 
Red State Exception messages are what OBP generates when a CPU is hung.  
The Trap32/Trap63 failures contribute to this hang condition.

The phase lock loop (PLL) of the Cheetah Processor Memory Switch (CPMS)
ASIC was not locking within 40 milliseconds at power-on.  This can
cause the CPMS components to issue data errors onto the Safari bus,
thereby initiating the Trap32/Trap63 failure.


IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        | X |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION:    

An Authorized Enterprise Field Service Representative may avoid the
above mentioned problems by following the recommendations as shown
below.

Power-cycling the system will clear this transient error, which causes
no harm to the system or its functionality once booted.

Patch-ID# 111292 or later incorporates a firmware fix to POST 
version 4.2.2 to correct this condition.  The Patch README provides 
detailed instructions for updating Flash PROM.

After the installation of this patch (#111292-02 or later), if these 
failures continue upon consecutive power-cycles, then replacement of 
the motherboard with part #501-5938-06 is warranted.
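
The recommendations above can be summarized as a small decision rule.
This is a sketch only: the function name and return strings are
hypothetical, and treating two or more failures in a row as
"consecutive" is an assumption, since this FIN does not give a number.

```python
def recommend_action(patch_installed, consecutive_failures):
    """Suggest a next step following the CORRECTIVE ACTION section.

    patch_installed: True if Patch-ID# 111292 (or later) is applied.
    consecutive_failures: count of back-to-back failed power-on attempts.
    The threshold of 2 for "consecutive" is an illustrative assumption.
    """
    if consecutive_failures < 0:
        raise ValueError("consecutive_failures must not be negative")
    if consecutive_failures == 0:
        return "no action required"
    if not patch_installed:
        return "power-cycle, then install Patch-ID# 111292 or later"
    if consecutive_failures >= 2:
        return "replace motherboard with part #501-5938-06"
    return "power-cycle"
```

For example, a patched system that fails once and then boots cleanly
needs only the power-cycle, while a patched system that fails on
back-to-back attempts is a candidate for motherboard replacement.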


COMMENTS:  

----------------------------------------------------------------------------

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
     support teams will recommend implementation of the FIN (to their
     respective accounts), at the convenience of the customer.

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO
  index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------




Copyright (c) 1997-2003 Sun Microsystems, Inc.