Document fins/I0856-1
FIN #: I0856-1
SYNOPSIS: UltraSPARC III and III+ based platforms could be susceptible to UCC
errors that may cause system panics
DATE: Oct/09/02
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: UltraSPARC III and III+ based platforms could be susceptible
to UCC errors that may cause system panics.
Sun Alert: Yes
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: UltraSPARC III family of processors
PRODUCT CATEGORY: Server / SW Admin
PRODUCTS AFFECTED:
Systems Affected
----------------
Mkt_ID   Platform   Model   Description       Serial Number
------   --------   -----   -----------       -------------
  -      A28        ALL     Sun Blade 1000         -
  -      A29        ALL     Sun Blade 2000         -
  -      A35        ALL     Sun Fire 280R          -
  -      A37        ALL     Sun Fire V480          -
  -      A30        ALL     Sun Fire V880          -
  -      S8         ALL     Sun Fire 3800          -
  -      S12        ALL     Sun Fire 4800          -
  -      S12i       ALL     Sun Fire 4810          -
  -      S24        ALL     Sun Fire 6800          -
  -      F12K       ALL     Sun Fire 12K           -
  -      F15K       ALL     Sun Fire 15K           -
  -      N28        ALL     Netra 20               -
X-options Affected
------------------
Mkt_ID          Platform   Model   Description                       Serial Number
------          --------   -----   -----------                       -------------
X4007A             -         -     ASSY CPU-4PROC USIIIP 900MHz           -
X4525A             -         -     ASSY MAXCPU 900MHz CNFIG F15K          -
X4004A             -         -     ASSY CPU-2PROC USIII 750MHz            -
X4005A             -         -     ASSY CPU-4PROC USIII 900MHz            -
X4006A             -         -     ASSY CPU-2PROC USIIIP 900MHz           -
X4046A             -         -     ASSY CPU DUAL 750MHz AL A30            -
X4047A             -         -     ASSY CPU DUAL 750MHz AL A30            -
XCPUBD-4049        -         -     ASSY CPU-4GB/4PROC USIII 900+M         -
XCPUBD-F4089       -         -     ASSY CPU-8GB/4PROC USIII 900+M         -
XCPUBD-F4169       -         -     ASSY CPU-16GB/4PROC USIII 900+M        -
XCPUBD-F4329       -         -     ASSY CPU-32GB/4PROC USIII 900+M        -
XCPUBD-2029        -         -     ASSY CPU-2GB/2PROC USIII 900+M         -
XCPUBD-2049        -         -     ASSY CPU-4GB/2PROC USIII 900+M         -
XCPUBD-2089        -         -     ASSY CPU-8GB/2PROC USIII 900+M         -
SF-XCPUBD-227      -         -     ASSY CPU-2GB/2PROC USIII 750MHz        -
SF-XCPUBD-447      -         -     ASSY CPU-4GB/4PROC USIII 750MHz        -
SF-XCPUBD-487      -         -     ASSY CPU-8GB/4PROC 512MB USIII         -
PART NUMBERS AFFECTED:
Part Number            Description                          Model
-----------            -----------                          -----
540-5052-02 or below   ASSY CPU-4PROC USIIIP 900+ MHz         -
540-4729-04 or below   ASSY CPU-2PROC USIII 750MHz            -
540-4730-04 or below   ASSY CPU-4PROC USIII 750MHz            -
540-5051-02 or below   ASSY CPU-2PROC USIIIP 900+ MHz         -
501-5818-06 or below   ASSY CPU DUAL 750MHz AL A30            -
540-4934-03 or below   ASSY CPU-4GB/4PROC USIII 900+ MHz      -
540-4992-02 or below   ASSY CPU-8GB/4PROC USIII 900+ MHz      -
540-4990-03 or below   ASSY CPU-16GB/4PROC USIII 900+ MHz     -
540-4993-02 or below   ASSY CPU-32GB/4PROC USIII 900+ MHz     -
540-4984-02 or below   ASSY CPU-2GB/2PROC USIII 900+ MHz      -
REFERENCES:
BugId:    4466085 - Cheetah UCC error at TL>0 causes panic
          4718366 - fast_ecc_err() uses 32bit load for 64bit address
                    values.
FIN:      I0887-1
SunAlert: 45527
PROBLEM DESCRIPTION:
UltraSPARC III and III+ based platforms might panic on Correctable Fast
ECC (UCC) errors, instead of correcting them, resulting in loss of service of
the system.
This issue concerns a problem with software routines used in the detection
and correction of Fast ECC errors when they occur. See FIN I0887-1 for a
more detailed explanation for this type of error on UltraSPARC III and
III+ platforms.
This issue can occur in the following releases:
    UltraSPARC III: (see Notes section)
        Solaris 8
        Solaris 9
    UltraSPARC III+:
        Solaris 8 without patch 108528
        Solaris 9
Notes:
1. Releases prior to Solaris 8 are not implemented on UltraSPARC III
and III+ based systems and therefore are not affected.
2. UltraSPARC-III+ based systems running Solaris 9 are not impacted by
BugID 4718366. They are however impacted by BugID 4466085. UltraSPARC-III
based systems running Solaris 9 are impacted by both BugID 4466085 and
BugID 4718366.
3. Some UltraSPARC III based servers may still be susceptible to UCC errors
that cause system panics after the installation of patch 108528, because
that patch introduced Bug 4718366. This applies to:
All Sun Fire V880 systems with 750 MHz or 900 MHz UltraSPARC III
processors, and any of the following, if the value of
"ecache_scrub_flushaddr" is non-zero:
* Sun Blade 1000 with more than 4 GB of memory and 750 MHz or
900 MHz US-III processors.
* Sun Fire 280R with more than 4 GB of memory and 750 MHz or
900 MHz US-III processors.
* Sun Fire Servers (3800, 4800, 4810, 6800) with more than
4 GB of memory and 750 MHz or 900 MHz US-III processors.
* Netra 20 with more than 4 GB of memory and 750 MHz or 900 MHz
US-III processors.
To determine if ecache_scrub_flushaddr is non-zero, run the following
sequence of commands as root on the system:
adb -k <<EOF
ecache_scrub_flushaddr/X
EOF
This value may be non-zero due to memory addressing, which can vary
over time, such as when swapping to disk.
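The check above can be scripted. The following is a minimal sketch and is not
part of the original FIN. The function name flushaddr_nonzero is hypothetical,
and the sketch assumes that adb prints the value in hexadecimal as the last
field of a line of the form "ecache_scrub_flushaddr: <value>". It reads adb
output on standard input so the parsing can be tried without a live kernel.

```shell
#!/bin/sh
# Hypothetical helper (not from the FIN): reads the output of
#     adb -k <<EOF
#     ecache_scrub_flushaddr/X
#     EOF
# on stdin. Exit status: 0 = value is non-zero, 1 = value is zero,
# 2 = no value line was found.
# Assumption: the value is the last field of a line whose first field is
# "ecache_scrub_flushaddr:" and it is printed in hexadecimal.
flushaddr_nonzero() {
    raw=$(awk '$1 == "ecache_scrub_flushaddr:" && NF >= 2 { v = $NF }
               END { print v }')
    [ -n "$raw" ] || return 2
    # Drop an optional 0x prefix and all leading zeros; if nothing is
    # left, the value was zero.
    stripped=$(printf '%s\n' "$raw" | sed -e 's/^0[xX]//' -e 's/^0*//')
    [ -n "$stripped" ]
}

# Example with made-up sample text (not captured from a real system):
sample='ecache_scrub_flushaddr:
ecache_scrub_flushaddr:         7f000000'
if printf '%s\n' "$sample" | flushaddr_nonzero; then
    echo "ecache_scrub_flushaddr is non-zero"
else
    echo "ecache_scrub_flushaddr is zero or was not found"
fi
```

On a live system the adb command shown above would be piped into the function
in place of the sample text. Because the value can vary over time, a single
zero reading does not rule out a later non-zero value.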
To differentiate between US-III and US-III+, run the prtconf(1M) command.
On a US-III system it shows:
# prtconf | grep -i ultrasparc
SUNW,UltraSPARC-III, instance #0
SUNW,UltraSPARC-III, instance #1
On a US-III+ system it shows:
# prtconf | grep -i ultrasparc
SUNW,UltraSPARC-III+, instance #0
SUNW,UltraSPARC-III+, instance #1
The US-III+ is also referred to as the "UltraSPARC-III Cu" (note the
"Cu").
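The distinction can also be made mechanically. The following is a minimal
sketch, not part of the original FIN. The function name cpu_family and its
output labels are hypothetical, and it assumes the CPU nodes appear in prtconf
output exactly as "SUNW,UltraSPARC-III," or "SUNW,UltraSPARC-III+,".

```shell
#!/bin/sh
# Hypothetical helper (not from the FIN): reads prtconf output on stdin and
# prints "US-III", "US-III+", "mixed", or "unknown".
# Assumption: CPU nodes are named as in the prtconf samples in this FIN.
cpu_family() {
    awk '
        /SUNW,UltraSPARC-III[+],/ { plus = 1 }   # III+ (also called III Cu)
        /SUNW,UltraSPARC-III,/    { base = 1 }   # plain III
        END {
            if (plus && base) print "mixed"      # defensive case only
            else if (plus)    print "US-III+"
            else if (base)    print "US-III"
            else              print "unknown"
        }'
}

# Example using the sample lines from this FIN; prints US-III+
printf '%s\n' 'SUNW,UltraSPARC-III+, instance #0' \
              'SUNW,UltraSPARC-III+, instance #1' | cpu_family
```

On a live system the equivalent call would be "prtconf | cpu_family".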
The following table shows which platforms utilize which type of processor.
Platform          UltraSPARC-III   UltraSPARC-III+
-----------       --------------   ---------------
Sun Blade 1000          X                 -
Sun Blade 2000          -                 X
Sun Fire 280R           X                 X
Sun Fire V480           -                 X
Sun Fire V880           X                 X
Sun Fire 3800           X                 X
Sun Fire 4800           X                 X
Sun Fire 4810           X                 X
Sun Fire 6800           X                 X
Sun Fire 12K            -                 X
Sun Fire 15K            -                 X
Netra 20                X                 X
On Solaris 9, and on Solaris 8 without patch 108528 installed, UCC
errors that result in a system panic appear similar to the following
abbreviated sample error message in the /var/adm/messages log file:
Hostname: Server
Release SunOS 5.8 Generic_108528-12
System crashed at: 2001 Jun 28 10:02:44 GMT
WARNING: [AFT1] UCC Event on CPU1 in Privileged mode at TL>0,
errID 0x0000009e.c2733708
AFSR 0x00100400<PRIV,UCC>.00000015 AFAR 0x00000000.d5557700
Fault_PC 0x102047e8 Esynd 0x0015
[AFT1] errID 0x0000009e.c2733708 Data Bit 38 was in error and corrected
[AFT2] errID 0x0000009e.c2733708 PA=0x00000000.d5557700
E$tag 0x00000001.aa924924 E$state_4 Modified
[AFT2] E$Data (0x00) 0x00000000.00000002 0x00000000.564f4c57 ECC 0x022
[AFT2] E$Data (0x10) 0x00000000.ffbef8d8 0x00000000.00100003 ECC 0x01b
[AFT2] E$Data (0x20) 0x00000300.01031f28 0x000002a1.00557aec ECC 0x06f
[AFT2] E$Data (0x30) 0x000002a1.00556f81 0x00000000.102d05b8 ECC 0x083
[AFT2] D$tag 0x000d5557 D$state Valid D$utag 0x55 D$snp 0x000d5556
[AFT2] D$Data (0x00) 0x00000000.00000002 0x00000040.564f4c57
[AFT2] D$Data (0x10) 0x00000000.ffbef8d8 0x00000000.00100003
[AFT2] I$ data not available
WARNING: [AFT1] WDC Event on CPU1 at TL>0, errID 0x0000009e.c2733708
AFSR 0x00000040<WDC>.00000015 AFAR 0x00000000.d5557700
Fault_PC 0x102047e8 Esynd 0x0015
[AFT1] errID 0x0000009e.c2733708 Data Bit 38 was in error and corrected
panic[cpu1]/thread=30002b56ae0: [AFT1] errID 0x0000009e.c2733708 UCC WDC
Error(s)
On Solaris 8 Sun Fire V880 platforms with patch 108528 installed, UCC
errors that result in a system panic may appear similar to the following
abbreviated sample error messages, from the system console or the
/var/adm/messages log file:
WARNING: [AFT1] Timeout (TO) Event detected by CPU2 in Privileged
mode at TL=0, errID 0x00000081.437ed5b4
AFSR 0x00201000<ME,TO>.00000000 AFAR 0x00000000.00fffe20
Fault_PC 0x30002cfd040
panic[cpu2]/thread=300025ece60: [AFT1] errID 0x00000081.437ed5b4
UCC TO Error(s)
On Solaris 8 UltraSPARC-III platforms other than the Sun Fire V880
with patch 108528 installed, UCC errors that result in a system
panic may appear similar to the following abbreviated sample error
messages, from the system console or the /var/adm/messages log file:
panic: ptl1 trap reason 0x8
TL=0x1 TT=0x68 TICK=0x68155385e6b
TPC=0xff2c24cc TnPC=0xff2c24d0 TSTATE=0x4482001a00
TL=0x2 TT=0x70 TICK=0x68155385e6b
TPC=0x10000d20 TnPC=0x10000d24 TSTATE=0x4482041400
TL=0x3 TT=0x34 TICK=0x68155385e69
TPC=0x10144a98 TnPC=0x10144a90 TSTATE=0x82081400
panic[cpu0]/thread=3000d26cf20:
User panic at trap level 3
Note that the line beginning with TL=0x2 TT=0x70 identifies this as a
panic induced by the "Fast ECC" trap.
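The three sample messages above can be told apart by a few fixed strings. The
following is a minimal sketch, not part of the original FIN. The function name
ucc_panic_signatures and the labels it prints are hypothetical, and the
patterns are taken only from the abbreviated samples in this FIN, so real logs
may wrap or format these lines differently.

```shell
#!/bin/sh
# Hypothetical helper (not from the FIN): reads console or
# /var/adm/messages text on stdin and prints one label per UCC-related
# panic signature found, or "none".
# Assumption: the log lines contain the same strings as the samples above.
ucc_panic_signatures() {
    awk '
        /UCC Event on CPU.*TL>0/ { tl_ucc = 1 }  # UCC event taken at TL>0
        /UCC TO Error/           { ucc_to = 1 }  # V880 sample with 108528
        /TL=0x2 TT=0x70/         { ptl1 = 1 }    # Fast ECC trap in ptl1 panic
        END {
            if (tl_ucc) print "ucc-at-tl-gt-0"
            if (ucc_to) print "ucc-to"
            if (ptl1)   print "fast-ecc-ptl1"
            if (!tl_ucc && !ucc_to && !ptl1) print "none"
        }'
}

# Example using two lines from the third sample; prints fast-ecc-ptl1
printf '%s\n' 'panic: ptl1 trap reason 0x8' \
              'TL=0x2 TT=0x70 TICK=0x68155385e6b' | ucc_panic_signatures
```

On a live system the function could be fed with
"ucc_panic_signatures < /var/adm/messages". A match only indicates that the
log resembles one of the samples; it is not a diagnosis.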
The root cause for this issue is related to the hardware architecture
of the UltraSPARC III family, particularly in how the "Fast ECC" trap
is handled by software at elevated Trap Levels (TL>0). TL>0 means a
trap occurred while another trap handler was running.
Note: Repeated UCC errors on a specific system could indicate a more
serious problem, such as a defective hardware component. See
FIN I0887-1 for troubleshooting tips.
Fast ECC is a specific type of ECC error which typically occurs during
a load of Data Cache or Instruction Cache (D$/I$) from E-Cache (E$).
"Fast ECC", which is the Hardware Architecture name for the trap that
is taken, could perhaps be called "Software Correctable ECC", as the
hardware does not have sufficient time to supply corrected data to the
D$/I$ and software must intervene to make sure that corrupted data is
not used.
For UltraSPARC-III+ platforms, this issue has been addressed by Solaris
8 Patch 108528. For UltraSPARC-III platforms, this issue
has been addressed by Solaris 8 Patch 108528.
For UltraSPARC III and UltraSPARC III+ platforms with Solaris 9, this
issue has been addressed by Patch 112233.
IMPLEMENTATION:
     ---
    |   |   MANDATORY (Fully Proactive)
     ---
     ---
    | X |   CONTROLLED PROACTIVE (per Sun Geo Plan)
     ---
     ---
    |   |   REACTIVE (As Required)
     ---
CORRECTIVE ACTION:
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned problem.
It is recommended that all UltraSPARC III and III+ based systems be updated
with the following patches:
UltraSPARC III
--------------
    Solaris 8    108528 or later
    Solaris 9    112233 or later
UltraSPARC III+
---------------
    Solaris 8    108528 or later
    Solaris 9    112233 or later
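To see which revision of a patch is currently installed, the output of
"showrev -p" can be filtered. The following is a minimal sketch, not part of
the original FIN. The function name installed_patch_rev is hypothetical, and
it assumes each installed patch is listed on a line beginning with
"Patch: <id>-<rev>". The helper only reports the installed revision; comparing
it against a required revision is left to the reader.

```shell
#!/bin/sh
# Hypothetical helper (not from the FIN): reads "showrev -p" output on stdin
# and prints the highest installed revision of the patch ID given as $1
# (for example 108528-13). Prints nothing if the patch is not listed.
# Assumption: installed patches appear as "Patch: <id>-<rev> ...".
installed_patch_rev() {
    sed -n "s/^Patch: \($1-[0-9][0-9]*\).*/\1/p" |
        sort -t- -k2,2n |      # order by numeric revision
        tail -1                # keep the highest revision
}

# Example with made-up sample text (not captured from a real system):
printf '%s\n' \
    'Patch: 108528-09 Obsoletes: Requires: Incompatibles: Packages: SUNWcsr' \
    'Patch: 108528-13 Obsoletes: Requires: Incompatibles: Packages: SUNWcsr' \
    'Patch: 112233-01 Obsoletes: Requires: Incompatibles: Packages: SUNWcsr' |
    installed_patch_rev 108528
```

On a live system the equivalent call would be
"showrev -p | installed_patch_rev 108528" on Solaris 8, or with 112233 on
Solaris 9.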
COMMENTS:
None
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
--------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.