Document fins/I0640-1
FIN #: I0640-1
SYNOPSIS: DTAG Error Guideline on E10K
DATE: Apr/19/01
KEYWORDS: DTAG Error Guideline on E10K
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: E10000 systems experiencing excessive DTAG errors should be
reported to HES Engineering for problem resolution.
TOP FIN/FCO REPORT: No
PRODUCT_REFERENCE: E10000 DTAG Parity Error
PRODUCT CATEGORY: Server / Service
PRODUCTS AFFECTED:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
Systems Affected
----------------
- E10000 ALL Ultra Enterprise 10000 -
X-Options Affected
------------------
- - - - -
PART NUMBERS AFFECTED:
Part Number Description Model
----------- ----------- -----
501-4786-03 or lower ECB ASSY E10000 SYSTEM -
501-4347-51 or lower ECB ASSY E10000 SYSTEM -
501-4903-02 or lower ECB ASSY E10000 SYSTEM -
501-5240-03 or lower ECB ASSY E10000 SYSTEM SF+ -
501-5693-01 ECB ASSY E10000 SYSTEM SF+ -
REFERENCES:
URL: http://bestpractices.central
http://hes.west/esg/hsg/starfire/xftt/hw_cic2.html
FIN: I0570-3
I0587-1
I0616-1
I0623-1
PROBLEM DESCRIPTION:
There has been concern from HES Engineering that some customer sites
may be experiencing an excessive number of DTAG-related system board
failures on E10000 systems and that the proper Corrective Action is
not being implemented.
Each individual DTAG error is typically identified and fixed, but the
field has had no clear guidance on what to do when a customer has had
multiple errors of this kind. Experiencing these errors once or twice is
normal, but three or more is not. CTE wants to ensure that the field
knows how to proceed when multiple errors occur. In most cases the
absence of the proper Corrective Action goes unnoticed until further
failures occur and the customer situation escalates. Providing the
field with a path to raise the issue allows this situation to be
avoided.
DTAG problems may affect the following E10000 system boards:
Starfire system board, for procs < 400MHz.
------------------------------------------
501-4347-51 or lower
501-4786-03 or lower
501-4903-02 or lower
Starfire+ system board, all processor speeds.
---------------------------------------------
501-5240-03 or lower
501-5693-01
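As an illustration of the board lists above, the following Python
sketch checks whether a system board part number falls in an affected
range. The helper name is_affected and the table layout are
hypothetical. "or lower" is interpreted as a dash level less than or
equal to the listed one, and 501-5693-01 is treated as an exact match
because the list gives no "or lower" qualifier; both readings are
assumptions.

```python
# Illustrative only: this table mirrors the board list in this FIN.
# base part number -> (highest affected dash level, exact match only?)
AFFECTED_BOARDS = {
    "501-4347": (51, False),
    "501-4786": (3, False),
    "501-4903": (2, False),
    "501-5240": (3, False),
    "501-5693": (1, True),   # listed without "or lower" (assumed exact)
}

def is_affected(part_number: str) -> bool:
    """Return True if a part number such as '501-4786-02' is listed above."""
    # Split '501-4786-02' into base '501-4786' and dash level '02'.
    base, _, dash = part_number.strip().rpartition("-")
    if base not in AFFECTED_BOARDS or not dash.isdigit():
        return False
    limit, exact = AFFECTED_BOARDS[base]
    level = int(dash)
    return level == limit if exact else level <= limit

print(is_affected("501-4786-02"))
print(is_affected("501-4786-04"))
```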
DTAG stands for duplicate tag. These tags are duplicates of the CPU's
tags, which are used for system addressing. A DTAG Parity Error occurs
when the data read back from this set of duplicate tags is corrupted.
This error is one of several possible errors that cause a platform
interruption.
CORRECTABLE ERROR - RECORDSTOP
=================================================================
Created Tue Feb 29 15:55:57 2000
By hpost v. 3.3 Feb 16 2000 22:54:51 executing as pid=21798
On ssp name = thing2-ssp.SD_Lab.West.Sun.COM
HOSTNAME = thing2-fib2
platform_name = thing2
Boardmask = 30080 -D option
Edd-Record-Stop-Dump
There were 0 errors encountered while creating this dump.
redxlwfail
LAARB 7 ErrorCSR1[65:0] = 0 00000000 30000002
ErrCSR1[1]: Recordstop Detected
ErrCSR1[28]: GAARB 2 Requests Recordstop (LAARB)
ErrCSR1[29]: GAARB 3 Requests Recordstop (LAARB)
LAARB 7 ErrorCSR3[63:0]: Hist: 0 N 0000 Flgs = 000 00040000
ErrCSR3[18]: Recordstop Requested by CIC2 (LAARB)
CIC 7.2 ErrFlags[61:0] = 00000001 00000002 (after mask)
ErrFlag[1]: Correctable ECC Error (CE) Processor 1 Dtags
ErrFlag[32]: Repeated Error
Proc 1 Dtag ECCSyn[13: 8] = 23: CE: bit 00 Dtag SRAM 7.2.0
FAIL Proc 7.1 in all configs using CIC2: : Arbstop/Recordstop detected by cic
(*** NOTE: Implicated FRU is sysboard 7)
GAARB 2 ErrorCSR1[65:0] = 0 00000000 00000002
ErrCSR1[1]: Recordstop Detected
GAARB 2 ArbStopLog[15:0] = 0000 RecordStopLog[15:0] = 0083
GAARB 3 ErrorCSR1[65:0] = 0 00000000 00000002
ErrCSR1[1]: Recordstop Detected
GAARB 3 ArbStopLog[15:0] = 0000 RecordStopLog[15:0] = 0083
redxl
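When several dumps have to be reviewed, the Dtag lines and the
implicated system board can be pulled out mechanically. The Python
sketch below is one possible way to do this. The function name
summarize_dump is hypothetical, and the patterns are modeled only on
the sample lines in this FIN, so other hpost versions may need
different patterns.

```python
import re

# Patterns modeled on the sample dump lines in this FIN (assumption:
# other hpost versions may format these lines differently).
DTAG_RE = re.compile(
    r"ErrFlag\[(\d+)\]:\s+(Correctable|Uncorrectable) ECC Error "
    r"\((CE|UE)\) Processor (\d+) Dtags"
)
FRU_RE = re.compile(r"Implicated FRU is sysboard (\d+)")

def summarize_dump(text: str) -> dict:
    """Collect Dtag ECC events and implicated system boards from a dump."""
    events = [
        {"flag_bit": int(bit), "kind": kind, "proc": int(proc)}
        for bit, _, kind, proc in DTAG_RE.findall(text)
    ]
    boards = sorted({int(b) for b in FRU_RE.findall(text)})
    return {"dtag_events": events, "sysboards": boards}

# Lines copied from the recordstop dump above.
sample = """\
CIC 7.2 ErrFlags[61:0] = 00000001 00000002 (after mask)
ErrFlag[1]: Correctable ECC Error (CE) Processor 1 Dtags
ErrFlag[32]: Repeated Error
(*** NOTE: Implicated FRU is sysboard 7)
"""
print(summarize_dump(sample))
```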
UNCORRECTABLE ERROR - ARBSTOP
=================================================================
By hpost v. 3.3_cic2_fly Feb 16 2000 22:54:51 executing as pid=25690
On ssp name = thing2-ssp.SD_Lab.West.Sun.COM
HOSTNAME = thing2-fib2
platform_name = thing2
Boardmask = 30080 -D option
Edd-Arbstop-Dump
There were 0 errors encountered while creating this dump.
redxlwfail
LAARB 7 ErrorCSR1[65:0] = 0 00000000 33000003
ErrCSR1[1:0]: Arbstop + Recordstop detected
ErrCSR1[24]: GAARB 2 Requests Arbstop (LAARB)
ErrCSR1[25]: GAARB 3 Requests Arbstop (LAARB)
ErrCSR1[28]: GAARB 2 Requests Recordstop (LAARB)
ErrCSR1[29]: GAARB 3 Requests Recordstop (LAARB)
LAARB 7 ErrorCSR3[63:0]: Hist: 0 N 0000 Flgs = 000 00040040
ErrCSR3[6]: Arbstop Requested by CIC2 (LAARB)
ErrCSR3[18]: Recordstop Requested by CIC2 (LAARB)
CIC 7.2 ErrFlags[61:0] = 00000001 00000011 (after mask)
ErrFlag[4]: Uncorrectable ECC Error (UE) Processor 0 Dtags
ErrFlag[32]: Repeated Error
FAIL Proc 7.0 in all configs using CIC2: : Arbstop detected by cic
(*** NOTE: Implicated FRU is sysboard 7)
CIC 7.2 ErrFlags[61:0] = 00000001 00000011 (after mask)
ErrFlag[0]: Correctable ECC Error (CE) Processor 0 Dtags
ErrFlag[32]: Repeated Error
Proc 0 Dtag ECCSyn[ 5: 0] = 07: CE: bit 03 Dtag SRAM 7.2.0
GAARB 2 ErrorCSR1[65:0] = 0 00000000 00000003
ErrCSR1[1:0]: Arbstop + Recordstop detected
GAARB 2 ArbStopLog[15:0] = 0080 RecordStopLog[15:0] = 0083
GAARB 3 ErrorCSR1[65:0] = 0 00000000 00000003
ErrCSR1[1:0]: Arbstop + Recordstop detected
GAARB 3 ArbStopLog[15:0] = 0080 RecordStopLog[15:0] = 0083
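The register values in the dumps above can be cross-checked against
the bit annotations that follow them. The Python sketch below assumes
that each value is printed as space-separated hexadecimal words, most
significant first, with every word after the first zero-padded to
eight digits. The function name set_bits is hypothetical.

```python
def set_bits(hex_words: str) -> list:
    """Return the positions of set bits in a register value printed as
    space-separated hex words, most significant word first (assumed)."""
    value = int("".join(hex_words.split()), 16)
    return [i for i in range(value.bit_length()) if (value >> i) & 1]

# LAARB 7 ErrorCSR1 from the recordstop dump: annotated bits 1, 28, 29.
print(set_bits("0 00000000 30000002"))   # [1, 28, 29]
# CIC 7.2 ErrFlags from the arbstop dump: annotated bits 0, 4, 32.
print(set_bits("00000001 00000011"))     # [0, 4, 32]
```

The decoded positions agree with the ErrCSR1[n] and ErrFlag[n] lines
printed by hpost for these two samples.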
For more information on DTAG errors, see the following Starfire
Engineering web site:
http://hes.west/esg/hsg/starfire/xftt/hw_cic2.html
In order to assist local service and account teams with analyzing and
resolving potential DTAG issues, CTE requests that it be contacted if
there appears to be an excessive number of DTAG-related system board
failures on a given system. An excessive number of DTAG failures is
three or more over the lifetime of the system, with at least one
occurring during the last six months.
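The guideline above can be expressed as a simple check. The Python
sketch below is illustrative only: the function name is_excessive is
hypothetical, the failure dates are made-up sample data, six months is
approximated as 183 days, and "3" is read as "3 or more".

```python
from datetime import date, timedelta

def is_excessive(failure_dates, today, min_total=3, recent_days=183):
    """Apply the guideline: at least `min_total` DTAG-related board
    failures over the system's lifetime, with at least one of them in
    the last six months (approximated here as 183 days)."""
    cutoff = today - timedelta(days=recent_days)
    recent = any(cutoff <= d <= today for d in failure_dates)
    return len(failure_dates) >= min_total and recent

# Hypothetical failure history for one system.
history = [date(1999, 11, 2), date(2000, 6, 14), date(2001, 3, 30)]
print(is_excessive(history, today=date(2001, 4, 19)))
```

A system that meets this check would be a candidate for raising with
CTE as described under CORRECTIVE ACTION.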
IMPLEMENTATION:
---
| | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| X | REACTIVE (As Required)
---
CORRECTIVE ACTION:
Enterprise Customers and authorized Field Service Representatives may
avoid the above-mentioned problems by following the recommendations
shown below:
HES-CTE maintains an email alias, dtag-info@west.sun.com. Please
contact this alias if you believe you are seeing excessive DTAG
failures. From there, CTE will assist you in the analysis process and
in determining the proper path for timely resolution of the problem.
COMMENTS:
-----------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
Copyright (c) 1997-2003 Sun Microsystems, Inc.