FIN #: I0856-1

SYNOPSIS: UltraSPARC III and III+ based platforms could be susceptible to UCC
          errors that may cause system panics

DATE: Oct/09/02

KEYWORDS: UltraSPARC III and III+ based platforms could be susceptible to UCC
          errors that may cause system panics


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)

 


Sun Alert:          Yes

TOP FIN/FCO REPORT: Yes

PRODUCT_REFERENCE:  UltraSPARC III family of processors

PRODUCT CATEGORY:   Server / SW Admin


PRODUCTS AFFECTED:

Systems Affected
------- --------
Mkt_ID   Platform   Model   Description         Serial Number
------   --------   -----   -----------         -------------
  -      A28         ALL    Sun Blade 1000            -
  -      A29         ALL    Sun Blade 2000            -
  -      A35         ALL    Sun Fire 280R             -
  -      A37         ALL    Sun Fire V480             -
  -      A30         ALL    Sun Fire V880             -
  -      S8          ALL    Sun Fire 3800             -
  -      S12         ALL    Sun Fire 4800             -
  -      S12i        ALL    Sun Fire 4810             -
  -      S24         ALL    Sun Fire 6800             -
  -      F12K        ALL    Sun Fire 12K              -
  -      F15K        ALL    Sun Fire 15K              -
  -      N28         ALL    Netra 20                  -


X-options Affected    
--------- --------
Mkt_ID          Platform   Model   Description                  Serial Number
------          --------   -----   -----------                  -------------
X4007A             -         -     ASSY CPU-4PROC USIIIP 900MHz       -
X4525A             -         -     ASSY MAXCPU 900MHz CNFIG F15K      -
X4004A             -         -     ASSY CPU-2PROC USIII  750MHz       -
X4005A             -         -     ASSY CPU-4PROC USIII  900MHz       -
X4006A             -         -     ASSY CPU-2PROC USIIIP 900MHz       -
X4046A             -         -     ASSY CPU DUAL 750MHz AL A30        -
X4047A             -         -     ASSY CPU DUAL 750MHz AL A30        -
XCPUBD-4049        -         -     ASSY CPU-4GB/4PROC USIII 900+M     -
XCPUBD-F4089       -         -     ASSY CPU-8GB/4PROC USIII 900+M     -
XCPUBD-F4169       -         -     ASSY CPU-16GB/4PROC USIII 900+M    -
XCPUBD-F4329       -         -     ASSY CPU-32GB/4PROC USIII 900+M    -
XCPUBD-2029        -         -     ASSY CPU-2GB/2PROC USIII 900+M     -
XCPUBD-2049        -         -     ASSY CPU-4GB/2PROC USIII 900+M     -
XCPUBD-2089        -         -     ASSY CPU-8GB/2PROC USIII 900+M     -
SF-XCPUBD-227      -         -     ASSY CPU-2GB/2PROC USIII 750MHz    -
SF-XCPUBD-447      -         -     ASSY CPU-4GB/4PROC USIII 750MHz    -
SF-XCPUBD-487      -         -     ASSY CPU-8GB/4PROC 512MB USIII     -


PART NUMBERS AFFECTED:

Part Number             Description                              Model
-----------             -----------                              -----
540-5052-02 or below    ASSY CPU-4PROC USIIIP 900+ MHz             -
540-4729-04 or below    ASSY CPU-2PROC USIII 750MHz                -
540-4730-04 or below    ASSY CPU-4PROC USIII 750MHz                -
540-5051-02 or below    ASSY CPU-2PROC USIIIP 900+ MHz             -
501-5818-06 or below    ASSY CPU DUAL 750MHz AL A30                -
540-4934-03 or below    ASSY CPU-4GB/4PROC USIII 900+ MHz          -
540-4992-02 or below    ASSY CPU-8GB/4PROC USIII 900+ MHz          -
540-4990-03 or below    ASSY CPU-16GB/4PROC USIII 900+ MHz         -
540-4993-02 or below    ASSY CPU-32GB/4PROC USIII 900+ MHz         -
540-4984-02 or below    ASSY CPU-2GB/2PROC USIII 900+ MHz          -


REFERENCES:

BugId:    4466085 - Cheetah UCC error at TL>0 causes panic
          4718366 - fast_ecc_err() uses 32bit load for 64bit address values.
                    
FIN: I0887-1

SunAlert: 45527


PROBLEM DESCRIPTION:

UltraSPARC III and III+ based platforms might panic on Correctable Fast
ECC (UCC) errors instead of correcting them, resulting in loss of system
service.

This issue concerns the software routines that detect and correct Fast ECC
errors when they occur.  See FIN I0887-1 for a more detailed explanation of
this type of error on UltraSPARC III and III+ platforms.

This issue can occur in the following releases: 

UltraSPARC III: (see Notes section) 

     Solaris 8 
     Solaris 9 

UltraSPARC III+: 

     Solaris 8 without patch 108528 
     Solaris 9 

Notes: 

  1. Releases prior to Solaris 8 are not implemented on UltraSPARC III
     and III+ based systems and therefore are not affected.

  2. UltraSPARC-III+ based systems running Solaris 9 are not impacted by
     BugID 4718366. They are, however, impacted by BugID 4466085.
     UltraSPARC-III based systems running Solaris 9 are impacted by both
     BugID 4466085 and BugID 4718366.

  3. Some UltraSPARC III based servers may still be susceptible to UCC errors
     that cause system panics after patch 108528 is installed, because that
     patch introduced Bug 4718366.  This applies to:

     All Sun Fire V880 systems with 750 MHz or 900 MHz UltraSPARC III
     processors, and any of the following systems if the value of
     "ecache_scrub_flushaddr" is non-zero:

       * Sun Blade 1000 with greater than 4 GB of memory and 750 MHz or
         900 MHz US-III processors.

       * Sun Fire 280R with greater than 4 GB of memory and 750 MHz or
         900 MHz US-III processors.

       * Sun Fire Servers (3800, 4800, 4810, 6800) with greater than
         4 GB of memory and 750 MHz or 900 MHz US-III processors.

       * Netra 20 with greater than 4 GB of memory and 750 MHz or 900 MHz
         US-III processors.
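
The conditions in Note 3 can be sketched as a small shell function.  This
is only an illustration of one reading of the note, not a supported tool:
the function name, its arguments, and the "exposed"/"not-listed" results
are hypothetical, and it assumes the caller has already established that
the system has US-III (not US-III+) processors and runs Solaris 8 with
patch 108528 installed.

```shell
# Hypothetical helper, not part of Solaris.  Arguments:
#   $1  platform name as written in this notice (e.g. "Sun Fire V880")
#   $2  installed memory in GB (integer)
#   $3  "zero" or "non-zero" (state of ecache_scrub_flushaddr)
# Assumption: US-III processors, Solaris 8, patch 108528 installed.
# Prints "exposed" if Note 3 lists the configuration, else "not-listed".
bug_4718366_exposure() {
    platform=$1
    mem_gb=$2
    flushaddr=$3
    case $platform in
        "Sun Fire V880")
            # Note 3 lists all US-III V880 systems unconditionally.
            echo exposed ;;
        "Sun Blade 1000"|"Sun Fire 280R"|"Netra 20"|\
        "Sun Fire 3800"|"Sun Fire 4800"|"Sun Fire 4810"|"Sun Fire 6800")
            # Listed only with more than 4 GB and a non-zero flush address.
            if [ "$mem_gb" -gt 4 ] && [ "$flushaddr" = "non-zero" ]; then
                echo exposed
            else
                echo not-listed
            fi ;;
        *)
            echo not-listed ;;
    esac
}
```

For example, bug_4718366_exposure "Sun Fire 280R" 8 non-zero prints
"exposed", while the same call with 4 GB of memory prints "not-listed".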

To determine if ecache_scrub_flushaddr is non-zero, run the following
sequence of commands as root on the system: 

        adb -k <<EOF
        ecache_scrub_flushaddr/X
        EOF            

This value may be non-zero due to memory addressing, which can vary
over time, such as when swapping to disk.
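
Where the check has to be repeated on many systems, the adb output can be
reduced to a single word.  The following is a minimal sketch; the function
name is hypothetical, and the output layout it expects (a line of the form
"ecache_scrub_flushaddr:" followed by whitespace and a hexadecimal value)
is an assumption, since this notice does not show the adb output.  It also
assumes a POSIX awk (on Solaris, nawk or /usr/xpg4/bin/awk).

```shell
# Hypothetical helper: reads `adb -k` output on stdin and prints
# "zero", "non-zero", or "unknown" for ecache_scrub_flushaddr.
# Assumed value line:  ecache_scrub_flushaddr:         0
flushaddr_state() {
    awk '
        # Keep the last line that carries both the symbol and a value.
        $1 == "ecache_scrub_flushaddr:" && NF >= 2 { value = $2; found = 1 }
        END {
            sub(/^0[xX]/, "", value)        # tolerate an optional 0x prefix
            if (!found)                  print "unknown"
            else if (value ~ /^0+$/)     print "zero"
            else                         print "non-zero"
        }
    '
}
```

On a live system the function could be fed from the command sequence
above, run as root:

        adb -k <<EOF | flushaddr_state
        ecache_scrub_flushaddr/X
        EOF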

To differentiate between US-III and US-III+ processors, run the
prtconf(1M) command.  On a US-III system it shows:

    # prtconf | grep -i ultrasparc
    SUNW,UltraSPARC-III, instance #0
    SUNW,UltraSPARC-III, instance #1

On a US-III+ system it shows:

    # prtconf | grep -i ultrasparc
    SUNW,UltraSPARC-III+, instance #0
    SUNW,UltraSPARC-III+, instance #1
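
The same distinction can be scripted.  The sketch below is illustrative
only: the function name and its result strings are hypothetical, and it
relies solely on the two prtconf line formats shown above.  A POSIX awk is
assumed (on Solaris, nawk or /usr/xpg4/bin/awk).

```shell
# Hypothetical helper: reads `prtconf` output on stdin and prints
# "US-III", "US-III+", "mixed", or "none".
cpu_family() {
    awk '
        # Test the "+" form first so it is not counted as plain US-III.
        /SUNW,UltraSPARC-III\+/          { plus++; next }
        /SUNW,UltraSPARC-III([, ]|$)/    { base++ }
        END {
            if (plus && base) print "mixed"
            else if (plus)    print "US-III+"
            else if (base)    print "US-III"
            else              print "none"
        }
    '
}
```

Typical use would be: prtconf | cpu_family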

The US-III+ is also referred to as the "UltraSPARC-III Cu" (note the
"Cu").

The following table shows which platforms utilize which type of processor.

   Platform                UltraSPARC-III      UltraSPARC-III+
   -----------             --------------      ---------------
   Sun Blade 1000                X                    -
   Sun Blade 2000                -                    X
   Sun Fire 280R                 X                    X
   Sun Fire V480                 -                    X
   Sun Fire V880                 X                    X
   Sun Fire 3800                 X                    X
   Sun Fire 4800                 X                    X
   Sun Fire 4810                 X                    X
   Sun Fire 6800                 X                    X
   Sun Fire 12K                  -                    X
   Sun Fire 15K                  -                    X
   Netra 20                      X                    X 
     
On Solaris 9, and on Solaris 8 without patch 108528 installed, UCC
errors resulting in a system panic appear similar to the following
abbreviated sample error message in the /var/adm/messages log file:
                                                                  
    Hostname: Server
    Release SunOS 5.8 Generic_108528-12
    System crashed at: 2001 Jun 28 10:02:44 GMT
        
    WARNING: [AFT1] UCC Event on CPU1 in Privileged mode at TL>0,
    errID 0x0000009e.c2733708
    AFSR 0x00100400<PRIV,UCC>.00000015 AFAR 0x00000000.d5557700
    Fault_PC 0x102047e8 Esynd 0x0015
    [AFT1] errID 0x0000009e.c2733708 Data Bit 38 was in error and corrected
    [AFT2] errID 0x0000009e.c2733708 PA=0x00000000.d5557700
    E$tag 0x00000001.aa924924 E$state_4 Modified
    [AFT2] E$Data (0x00) 0x00000000.00000002 0x00000000.564f4c57 ECC 0x022
    [AFT2] E$Data (0x10) 0x00000000.ffbef8d8 0x00000000.00100003 ECC 0x01b
    [AFT2] E$Data (0x20) 0x00000300.01031f28 0x000002a1.00557aec ECC 0x06f
    [AFT2] E$Data (0x30) 0x000002a1.00556f81 0x00000000.102d05b8 ECC 0x083
    [AFT2] D$tag 0x000d5557 D$state Valid D$utag 0x55 D$snp 0x000d5556
    [AFT2] D$Data (0x00) 0x00000000.00000002 0x00000040.564f4c57
    [AFT2] D$Data (0x10) 0x00000000.ffbef8d8 0x00000000.00100003
    [AFT2] I$ data not available
    WARNING: [AFT1] WDC Event on CPU1 at TL>0, errID 0x0000009e.c2733708
    AFSR 0x00000040<WDC>.00000015 AFAR 0x00000000.d5557700
    Fault_PC 0x102047e8 Esynd 0x0015
    [AFT1] errID 0x0000009e.c2733708 Data Bit 38 was in error and corrected
    panic[cpu1]/thread=30002b56ae0: [AFT1] errID 0x0000009e.c2733708 UCC WDC 
    Error(s)
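
A log can be searched for this signature with a short filter.  The sketch
below is an assumption-laden illustration: the function name is
hypothetical, and it expects each panic message to sit unwrapped on a
single log line (the sample above is wrapped for display).  It prints the
errID of every [AFT1] panic line that names UCC.  A POSIX awk is assumed
(on Solaris, nawk or /usr/xpg4/bin/awk).

```shell
# Hypothetical helper: reads a messages log on stdin and prints the
# errID from each panic line that is tagged [AFT1] and mentions UCC.
# Assumption: the panic message is not wrapped across lines.
ucc_panic_errids() {
    awk '
        /panic\[cpu[0-9]+\]/ && /\[AFT1\]/ && /UCC/ {
            # The errID value is the field that follows the word "errID".
            for (i = 1; i < NF; i++)
                if ($i == "errID") print $(i + 1)
        }
    '
}
```

Typical use would be: ucc_panic_errids < /var/adm/messages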

On Solaris 8 Sun Fire V880 platforms with patch 108528 installed, UCC
errors resulting in a system panic may appear similar to the following
abbreviated sample error messages from the system console or
/var/adm/messages log file:

     WARNING: [AFT1] Timeout (TO) Event detected by CPU2 in Privileged  
     mode at TL=0, errID 0x00000081.437ed5b4
        AFSR 0x00201000<ME,TO>.00000000 AFAR 0x00000000.00fffe20
        Fault_PC 0x30002cfd040
     panic[cpu2]/thread=300025ece60: [AFT1] errID 0x00000081.437ed5b4 
     UCC TO Error(s)

On Solaris 8 UltraSPARC-III platforms other than the Sun Fire V880
with patch 108528 installed, UCC errors resulting in a system panic may
appear similar to the following abbreviated sample error messages from
the system console or /var/adm/messages log file:

       panic: ptl1 trap reason 0x8
       TL=0x1 TT=0x68 TICK=0x68155385e6b
               TPC=0xff2c24cc TnPC=0xff2c24d0 TSTATE=0x4482001a00
       TL=0x2 TT=0x70 TICK=0x68155385e6b
               TPC=0x10000d20 TnPC=0x10000d24 TSTATE=0x4482041400
       TL=0x3 TT=0x34 TICK=0x68155385e69
               TPC=0x10144a98 TnPC=0x10144a90 TSTATE=0x82081400
       panic[cpu0]/thread=3000d26cf20:
       User panic at trap level 3

Note that the line beginning with TL=0x2 TT=0x70 identifies this as a
"Fast ECC" trap-induced panic.
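
Picking out that line can be automated.  This is a minimal sketch with a
hypothetical function name; it reports the trap level of every entry whose
trap type is 0x70, in the "TL=... TT=..." layout of the sample above.
Treating TT=0x70 at any trap level as relevant is an assumption made here;
the sample shows it only at TL=0x2.  A POSIX awk is assumed (on Solaris,
nawk or /usr/xpg4/bin/awk).

```shell
# Hypothetical helper: reads panic text on stdin and prints the TL=...
# field of every trap-stack entry that is followed by TT=0x70.
fast_ecc_trap_levels() {
    awk '
        {
            for (i = 1; i < NF; i++)
                if ($i ~ /^TL=0x[0-9a-fA-F]+$/ && $(i + 1) == "TT=0x70")
                    print $i
        }
    '
}
```

No output means that no TT=0x70 entry was found in the text supplied.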

The root cause of this issue is related to the hardware architecture
of the UltraSPARC III family, particularly in how the "Fast ECC" trap
is handled by software at elevated Trap Levels (TL>0).  TL>0 means a
trap occurred within another trap handler.

Note: Repeated UCC errors on a specific system could indicate a more
      serious problem, such as a defective hardware component.  See
      FIN I0887-1 for troubleshooting tips.

Fast ECC is a specific type of ECC error which typically occurs during
a load of Data Cache or Instruction Cache (D$/I$) from E-Cache (E$).
"Fast ECC", which is the Hardware Architecture name for the trap that
is taken, could perhaps be called "Software Correctable ECC", as the
hardware does not have sufficient time to supply corrected data to the
D$/I$ and software must intervene to make sure that corrupted data is
not used.

For UltraSPARC-III and UltraSPARC-III+ platforms with Solaris 8, this
issue has been addressed by Patch 108528.

For UltraSPARC III and UltraSPARC III+ platforms with Solaris 9, this
issue has been addressed by Patch 112233.


IMPLEMENTATION: 

         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        | X |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---
        

CORRECTIVE ACTION:

The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned problem.

It is recommended that all UltraSPARC III and III+ based systems be updated
with the following patches:

        UltraSPARC III
        --------------
        Solaris 8   Patch 108528 or later
        Solaris 9   Patch 112233 or later

        UltraSPARC III+
        ---------------
        Solaris 8   Patch 108528 or later
        Solaris 9   Patch 112233 or later
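
To see which revision of a patch a system already carries, the showrev -p
listing can be filtered.  The sketch below is illustrative only: the
function name is hypothetical, and it assumes each installed patch appears
on a line beginning "Patch: <id>-<rev>".  It only reports the highest
installed revision; this notice does not state a minimum revision, so none
is checked.  A POSIX awk is assumed (on Solaris, nawk or
/usr/xpg4/bin/awk).

```shell
# Hypothetical helper: reads `showrev -p` output on stdin and prints the
# highest installed revision of the patch ID given as $1, or "none".
# Assumed line format:  Patch: 108528-12 Obsoletes: ... Packages: ...
installed_patch_rev() {
    awk -v id="$1" '
        $1 == "Patch:" {
            split($2, part, "-")            # part[1] = ID, part[2] = revision
            if (part[1] == id && part[2] + 0 >= best + 0) {
                best = part[2]
                found = 1
            }
        }
        END {
            if (found) print id "-" best
            else       print "none"
        }
    '
}
```

Typical use would be: showrev -p | installed_patch_rev 108528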

COMMENTS:

None

============================================================================

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
     support teams will recommend implementation of the FIN (to their
     respective accounts), at the convenience of the customer.

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
--------------------------------------------------------------------------


Copyright (c) 1997-2003 Sun Microsystems, Inc.