Document fins/I0696-1
FIN #: I0696-1
SYNOPSIS: Sun Fire 280R and Sun Blade 1000 systems may fail with "Red State
Exception" or "Trap32/63" after power cycle
DATE: Jul/12/01
KEYWORDS: Sun Fire 280R and Sun Blade 1000 systems may fail with "Red State
Exception" or "Trap32/63" after power cycle
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: Sun Fire 280R and Sun Blade 1000 systems may fail with
"Red State Exception" or "Trap32/63" after power
cycle.
Sun Alert: No
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: Sun Fire 280R and Sun Blade 1000
PRODUCT CATEGORY: Server / SW Admin
PRODUCTS AFFECTED:
Systems Affected
----------------
Mkt_ID   Platform   Model   Description      Serial Number
------   --------   -----   --------------   -------------
  -        A28        -     Sun Blade 1000         -
  -        A35        -     Sun Fire 280R          -
X-Options Affected
------------------
- - - - -
PART NUMBERS AFFECTED:
Part Number   Description                      Model
-----------   ------------------------------   -----
501-5938-06   ASSY Excal/Littleneck 0 Meg TS     -
REFERENCES:
PatchId: 111292 or higher - Sun Blade 1000 and Sun Fire 280R Flash
PROM Update.
PROBLEM DESCRIPTION:
All Sun Blade 1000 and Sun Fire 280R systems shipped prior to July 2001
may encounter a Trap32, Trap63, or Red State Exception failure at
power-on, in approximately 1% of power-on attempts. Although the
Sun Fire 280R has a higher incidence of this behavior at power-on, the
Sun Blade 1000 has been shown to exhibit the same behavior.
During a power-cycle, the Sun Fire 280R or Sun Blade 1000 may
experience a Trap32 or Trap63 failure during POST (diag-switch? true)
or a Red State Exception during OBP (diag-switch? false), causing
the system to fail during the boot sequence.
If a system is powered on and completes POST and OBP without failure,
this condition has no effect on the running system. If the failure is
encountered, it is easily recovered by power-cycling the unit, which
will nearly always result in a successful pass of POST and OBP.
This problem is intermittent (approximately 1% of power-on attempts)
and occurs only during power-on. Examples of typical error messages:
RED State Exception:
--------------------
CPU: 0000.0000.0000.0001
TL=0000.0000.0000.0005 TT=0000.0000.0000.0020
TPC=0000.0000.0000.4d04 TnPC=0000.0000.0000.4d08
TSTATE=0000.0000.1504.1400
TL=0000.0000.0000.0004 TT=0000.0000.0000.0068
TPC=0000.0000.0000.46a4 TnPC=0000.0000.0000.46a8
TSTATE=0000.0000.1500.1500
TL=0000.0000.0000.0003 TT=0000.0000.0000.0034
TPC=0000.0000.0000.4208 TnPC=0000.0000.0000.420c
TSTATE=0000.0000.1500.1500
TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
TPC=0000.0000.0000.0218 TnPC=0000.0000.0000.0220
TSTATE=0000.0000.1500.1500
TL=0000.0000.0000.0001 TT=0000.0000.0000.0010
TPC=0000.0000.0000.2748 TnPC=0000.0000.0000.26d0
TSTATE=0000.0044.1500.0500
Slave Timeout! [disabled]Corrected ECC Error
With diag-switch? true or keyswitch in "Diag" mode:
Trap-32:
--------
{0}* Memory address selection Initial area
{0}ERROR: TEST = * Memory address selection Initial area TESTID = 6
{0}H/W under test = MAIN MEMORY
{0} Trap level 1 Trap type 32
{0} Data access error
{0} Fault address 00000000.0001f580
{0} Fault status 00100004.0000002d
{0} (PRIV) Privileged code access error(s)
{0} (UE) Uncorrectable system data ECC error
{0} More than one bit error from memory
{0} Bank 0,2 at J0100, J0202, J0304, J0406
{0}* Memory address selection Initial area FAILED
Trap 63:
--------
{1}* Memory address selection Initial area
{1}ERROR: TEST = * Memory address selection Initial area TESTID = 67
{1}H/W under test = MAIN MEMORY
{1} Trap level 1 Trap type 63
{1} ECC error
{1} Fault address 00000000.00020030
{1} Fault status 00010000.00010000
{1} (EMC) Correctable Mtag ECC error
{1} Cannot decode ECC syndrome. Bad syndrome
{0}POST failed
{0}POST_END
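The sample messages above can be told apart by a few fixed strings. The following is a minimal sketch, not a supported tool, of how captured console output might be sorted into the three failure signatures. The function name and the patterns are assumptions drawn only from the examples in this FIN; real console output may differ.

```python
import re

# Patterns are taken from the sample messages in this FIN only.
# They are illustrative assumptions, not an official message catalog.
PATTERNS = [
    ("RED State Exception", re.compile(r"RED State Exception", re.IGNORECASE)),
    ("Trap 32", re.compile(r"Trap type 32\b")),
    ("Trap 63", re.compile(r"Trap type 63\b")),
]


def classify_console_log(text):
    """Return the failure signatures found in the text, in PATTERNS order."""
    return [name for name, pattern in PATTERNS if pattern.search(text)]


sample = """{0}H/W under test = MAIN MEMORY
{0} Trap level 1 Trap type 32
{0} Data access error"""

print(classify_console_log(sample))  # prints ['Trap 32']
```

An empty list means none of the three signatures appeared in the captured text.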
The 'Trap32/63' and 'Red State Exception' errors do not require removal
or replacement of CPU modules or motherboards. The error state does
not affect system operation, and it can be resolved by power-cycling
the system as a workaround.
The RED STATE exception, as a result of the Trap 32/63 condition, is
due to an intermittent failure in the PLL portion of the CPMS memory
switch ASIC during initialization.
This is a very intermittent problem (~1% of power cycles) that is
reported at power-on. If the RED STATE error message is reported in
_consecutive_ power-on attempts, replacement of the motherboard is merited.
If the system successfully passes the power-on sequence, there
is no risk to system integrity from this intermittent CPMS memory
initialization error.
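The reasoning behind treating consecutive failures differently can be illustrated with a short calculation. This sketch assumes that each power-on attempt fails independently with the ~1% rate quoted in this FIN; independence is an assumption made here for illustration and is not stated by the FIN. Under that assumption, repeated failures from the transient condition alone are very unlikely, which is why they point toward a hardware fault instead.

```python
def consecutive_failure_probability(p_single, attempts):
    """Probability that `attempts` power-ons in a row all fail.

    Assumes each attempt fails independently with probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return p_single ** attempts


# ~1% per power-on is the figure quoted in this FIN.
for n in (1, 2, 3):
    print(n, consecutive_failure_probability(0.01, n))
```

With a 1% single-attempt rate, two failures in a row from the transient condition would be expected in roughly 1 of 10,000 pairs of attempts.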
A fix can be obtained by upgrading the firmware of the system Flash
PROM to version 4.2.2 or later. Patch-ID# 111292 or later provides
this firmware version. This new firmware version contains OBP code
changes for memory initialization. There are no plans to issue an FCO
to rework the motherboards already shipped, as they will only be
replaced upon failure.
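A field script could compare the installed firmware version against the 4.2.2 minimum named above. The following is a minimal sketch under stated assumptions: the version is assumed to appear in a string of the form "OBP <major>.<minor>.<patch> ...", the sample strings are invented for illustration, and the function names are hypothetical.

```python
import re

MIN_FIXED_VERSION = (4, 2, 2)  # minimum Flash PROM version named in this FIN


def parse_obp_version(banner):
    """Extract (major, minor, patch) from a string such as 'OBP 4.2.2 ...'.

    The 'OBP x.y.z' layout is an assumption made for this sketch.
    """
    match = re.search(r"OBP\s+(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no OBP version found in: %r" % banner)
    return tuple(int(part) for part in match.groups())


def needs_flash_prom_update(banner):
    """True if the reported version is older than the fixed version."""
    return parse_obp_version(banner) < MIN_FIXED_VERSION


# Invented sample strings, for illustration only.
for line in ("OBP 4.0.45 2001/02/08 14:33", "OBP 4.2.2 2001/06/12 10:20"):
    print(line, "->", needs_flash_prom_update(line))
```

Comparing integer tuples avoids the string-comparison pitfall in which "4.10.1" would sort before "4.2.2".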
Background Information
----------------------
Trap32 error messages are what POST generates for all data access
errors consisting of multi-bit uncorrectable (UE) errors during the
system assembly process and memory testing.
Trap63 error messages are what POST generates for all data access
errors consisting of single-bit correctable (CE) errors during the
system assembly process and memory testing.
Red State Exception messages are what OBP generates when a CPU is hung.
The Trap32/Trap63 failures contribute to this hang condition.
The phase-locked loop (PLL) in the Cheetah Processor Memory Switch
(CPMS) ASIC was not locking (within 40 milliseconds) at power-on. This
can cause the CPMS components to issue data errors onto the Safari bus,
thereby initiating the Trap32/Trap63 failure.
IMPLEMENTATION:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
CORRECTIVE ACTION:
An Authorized Enterprise Field Service Representative may avoid the
above-mentioned problems by following the recommendations below.
Power-cycling the system will clear this transient error, which causes
no harm to the system or its functionality once booted.
Patch-ID# 111292 or later incorporates a firmware fix to POST
version 4.2.2 to correct this condition. The Patch README provides
detailed instructions for updating the Flash PROM.
After the installation of this patch (#111292-02 or later), if these
failures continue upon consecutive power-cycles, then replacement of
the motherboard with part #501-5938-06 is warranted.
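The recommendations above can be summarized as a small decision procedure. This is a sketch only, not an official service flow: the function and enum names are hypothetical, and reading "consecutive" as two or more failures in a row is an assumption made for illustration.

```python
from enum import Enum

# Assumption: "consecutive" is read here as two or more failures in a row.
CONSECUTIVE_THRESHOLD = 2


class Action(Enum):
    NONE = "no action required"
    POWER_CYCLE = "power-cycle the system"
    APPLY_PATCH = "update the Flash PROM with patch 111292 or later"
    REPLACE_MOTHERBOARD = "replace the motherboard (part 501-5938-06)"


def recommend_action(consecutive_failures, patch_installed):
    """Map the observed state to the action described in this FIN."""
    if consecutive_failures < 0:
        raise ValueError("consecutive_failures cannot be negative")
    if not patch_installed:
        # The firmware fix comes first; a power-cycle recovers a failed boot.
        return Action.APPLY_PATCH
    if consecutive_failures == 0:
        return Action.NONE
    if consecutive_failures >= CONSECUTIVE_THRESHOLD:
        return Action.REPLACE_MOTHERBOARD
    return Action.POWER_CYCLE


print(recommend_action(1, True).value)
print(recommend_action(2, True).value)
```

Motherboard replacement is reached only when the patch is already installed, matching the order of steps given in the text.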
COMMENTS:
----------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO
index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.