Document fins/I0640-1


FIN #: I0640-1

SYNOPSIS: DTAG Error Guideline on E10K

DATE: Apr/19/01

KEYWORDS: DTAG Error Guideline on E10K


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)


SYNOPSIS: E10000 systems experiencing excessive DTAG errors should be
          reported to HES Engineering for problem resolution.

TOP FIN/FCO REPORT: No 
 
PRODUCT_REFERENCE:  E10000 DTAG Parity Error  

PRODUCT CATEGORY:   Server / Service  


PRODUCTS AFFECTED:  

Mkt_ID   Platform   Model   Description           Serial Number
------   --------   -----   -----------           -------------

Systems Affected
----------------
  -      E10000      ALL    Ultra Enterprise 10000      -

X-Options Affected
------------------
  -        -        -                 -                 -             


PART NUMBERS AFFECTED: 

Part Number              Description                    Model
-----------              -----------                    -----
501-4786-03 or lower     ECB ASSY E10000 SYSTEM           -
501-4347-51 or lower     ECB ASSY E10000 SYSTEM           -
501-4903-02 or lower     ECB ASSY E10000 SYSTEM           -
501-5240-03 or lower     ECB ASSY E10000 SYSTEM SF+       - 
501-5693-01              ECB ASSY E10000 SYSTEM SF+       -


REFERENCES:

URL:  http://bestpractices.central
      http://hes.west/esg/hsg/starfire/xftt/hw_cic2.html

FIN:  I0570-3
      I0587-1
      I0616-1
      I0623-1

      
PROBLEM DESCRIPTION:

There has been concern from HES Engineering that some customer sites
may be experiencing an excessive number of DTAG-related system board
failures on E10000 systems and that the proper Corrective Action is 
not being implemented.

Each individual DTAG error is identified and fixed, but there has been
no guidance on what to do when a customer has had several errors of
this kind.  Experiencing these errors once or twice is normal, but
three or more is not.  CTE wants to ensure that the field knows how to
respond to repeated errors.  In most cases the absence of the proper
Corrective Action goes unnoticed until further failures occur and the
customer situation escalates.  Giving the field a path to raise the
issue helps avoid that outcome.
  
DTAG problems may affect the following E10000 system boards:

     Starfire system board, for procs < 400MHz.
     ------------------------------------------
        501-4347-51 or lower 
        501-4786-03 or lower  
        501-4903-02 or lower

     Starfire+ system board, all processor speeds. 
     ---------------------------------------------
        501-5240-03 or lower  
        501-5693-01   
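
The board list above can be expressed as a small lookup.  The following
Python sketch is illustrative only: the names AFFECTED_BOARDS and
board_may_be_affected are hypothetical, and it assumes that dash levels
compare numerically and that 501-5693 is affected only at dash level 01
because it is listed without "or lower".

```python
# Dash levels copied from the board list above.  "exact" marks the one
# entry that is listed without "or lower" (an assumption of this sketch).
AFFECTED_BOARDS = {
    "501-4347": {"max_dash": 51, "exact": False, "family": "Starfire"},
    "501-4786": {"max_dash": 3, "exact": False, "family": "Starfire"},
    "501-4903": {"max_dash": 2, "exact": False, "family": "Starfire"},
    "501-5240": {"max_dash": 3, "exact": False, "family": "Starfire+"},
    "501-5693": {"max_dash": 1, "exact": True, "family": "Starfire+"},
}


def board_may_be_affected(part_number, cpu_mhz):
    """Return True if the board falls inside the list above."""
    # Split "501-4347-51" into base "501-4347" and dash level "51".
    base, _, dash = part_number.strip().rpartition("-")
    entry = AFFECTED_BOARDS.get(base)
    if entry is None or not dash.isdigit():
        return False
    level = int(dash)
    if entry["exact"]:
        dash_ok = level == entry["max_dash"]
    else:
        dash_ok = level <= entry["max_dash"]
    # Starfire boards are listed only for processors below 400MHz;
    # Starfire+ boards are listed for all processor speeds.
    if entry["family"] == "Starfire":
        return dash_ok and cpu_mhz < 400
    return dash_ok


print(board_may_be_affected("501-4347-51", 336))
print(board_may_be_affected("501-5240-02", 466))
```

A match only means the board is in the affected population; it does not
by itself indicate a DTAG failure.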
         
DTAG stands for duplicate tag.  These tags are duplicates of the CPU's
tags, which are used for system addressing.  A DTAG Parity Error occurs
when the data read back from this set of duplicate tags is corrupted.
This error is one of several possible errors which cause a platform
interruption.  The hpost dumps below show one correctable and one
uncorrectable DTAG error.

  CORRECTABLE ERROR - RECORDSTOP
  =================================================================
  Created Tue Feb 29 15:55:57 2000
  By hpost v. 3.3 Feb 16 2000 22:54:51  executing as pid=21798
  On ssp name =  thing2-ssp.SD_Lab.West.Sun.COM
  HOSTNAME =  thing2-fib2
  platform_name =  thing2
  Boardmask = 30080    -D option
  Edd-Record-Stop-Dump
  There were 0 errors encountered while creating this dump.
  redxlwfail
  LAARB 7     ErrorCSR1[65:0] = 0 00000000 30000002
          ErrCSR1[1]: Recordstop Detected
          ErrCSR1[28]: GAARB 2 Requests Recordstop (LAARB)
          ErrCSR1[29]: GAARB 3 Requests Recordstop (LAARB)
  LAARB 7     ErrorCSR3[63:0]: Hist: 0 N 0000    Flgs = 000 00040000
          ErrCSR3[18]: Recordstop Requested by CIC2 (LAARB)
  CIC   7.2   ErrFlags[61:0] = 00000001 00000002   (after mask)
          ErrFlag[1]: Correctable ECC Error   (CE)  Processor 1 Dtags
          ErrFlag[32]: Repeated Error
      Proc 1 Dtag ECCSyn[13: 8] = 23:  CE: bit 00  Dtag SRAM 7.2.0
  FAIL Proc 7.1 in all  configs using CIC2: : Arbstop/Recordstop detected by cic
          (*** NOTE: Implicated FRU is sysboard 7)
  GAARB 2     ErrorCSR1[65:0] = 0 00000000 00000002
          ErrCSR1[1]: Recordstop Detected
  GAARB 2     ArbStopLog[15:0] = 0000   RecordStopLog[15:0] = 0083
  GAARB 3     ErrorCSR1[65:0] = 0 00000000 00000002
          ErrCSR1[1]: Recordstop Detected
  GAARB 3     ArbStopLog[15:0] = 0000   RecordStopLog[15:0] = 0083
  redxl
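
When many dumps have to be reviewed, the two facts that matter for
tracking DTAG failures (the implicated system board and whether the
error was correctable) can be pulled out with a short script.  This is
a minimal Python sketch, not a supported tool: the function name
summarize_dump is hypothetical, and the patterns assume the exact
wording seen in the dump above, which may differ in other hpost
versions.

```python
import re

# Patterns assume the wording shown in the dump above.
FRU_RE = re.compile(r"Implicated FRU is sysboard (\d+)")
ECC_RE = re.compile(
    r"ErrFlag\[\d+\]: (?:Correctable|Uncorrectable) ECC Error"
    r"\s+\((CE|UE)\)\s+Processor (\d+) Dtags"
)


def summarize_dump(text):
    """Collect implicated sysboards and (CE/UE, processor) DTAG events."""
    boards = sorted({int(m.group(1)) for m in FRU_RE.finditer(text)})
    events = [(m.group(1), int(m.group(2))) for m in ECC_RE.finditer(text)]
    return {"sysboards": boards, "dtag_events": events}


# Excerpt copied from the recordstop dump above.
SAMPLE = """\
  CIC   7.2   ErrFlags[61:0] = 00000001 00000002   (after mask)
          ErrFlag[1]: Correctable ECC Error   (CE)  Processor 1 Dtags
          ErrFlag[32]: Repeated Error
          (*** NOTE: Implicated FRU is sysboard 7)
"""

print(summarize_dump(SAMPLE))
```

Keeping one such summary per dump makes it easier to count DTAG events
per system board over time.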


  UNCORRECTABLE ERROR - ARBSTOP
  =================================================================
 
  By hpost v. 3.3_cic2_fly Feb 16 2000 22:54:51  executing as pid=25690
  On ssp name =  thing2-ssp.SD_Lab.West.Sun.COM
  HOSTNAME =  thing2-fib2
  platform_name =  thing2
  Boardmask = 30080    -D option
  Edd-Arbstop-Dump
  There were 0 errors encountered while creating this dump.
  redxlwfail
  LAARB 7     ErrorCSR1[65:0] = 0 00000000 33000003
          ErrCSR1[1:0]: Arbstop + Recordstop detected
          ErrCSR1[24]: GAARB 2 Requests Arbstop (LAARB)
          ErrCSR1[25]: GAARB 3 Requests Arbstop (LAARB)
          ErrCSR1[28]: GAARB 2 Requests Recordstop (LAARB)
          ErrCSR1[29]: GAARB 3 Requests Recordstop (LAARB)
  LAARB 7     ErrorCSR3[63:0]: Hist: 0 N 0000    Flgs = 000 00040040
          ErrCSR3[6]: Arbstop Requested by CIC2 (LAARB)
          ErrCSR3[18]: Recordstop Requested by CIC2 (LAARB)
  CIC   7.2   ErrFlags[61:0] = 00000001 00000011   (after mask)
          ErrFlag[4]: Uncorrectable ECC Error (UE)  Processor 0 Dtags
          ErrFlag[32]: Repeated Error
  FAIL Proc 7.0 in all  configs using CIC2: : Arbstop detected by cic
          (*** NOTE: Implicated FRU is sysboard 7)
  CIC   7.2   ErrFlags[61:0] = 00000001 00000011   (after mask)
          ErrFlag[0]: Correctable ECC Error   (CE)  Processor 0 Dtags
          ErrFlag[32]: Repeated Error
      Proc 0 Dtag ECCSyn[ 5: 0] = 07:  CE: bit 03  Dtag SRAM 7.2.0
  GAARB 2     ErrorCSR1[65:0] = 0 00000000 00000003
          ErrCSR1[1:0]: Arbstop + Recordstop detected
  GAARB 2     ArbStopLog[15:0] = 0080   RecordStopLog[15:0] = 0083
  GAARB 3     ErrorCSR1[65:0] = 0 00000000 00000003
          ErrCSR1[1:0]: Arbstop + Recordstop detected
  GAARB 3     ArbStopLog[15:0] = 0080   RecordStopLog[15:0] = 0083
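
The bit annotations in both dumps follow directly from the hexadecimal
register values.  The Python sketch below shows the arithmetic; the
function names are hypothetical, and it assumes that bit 0 is the least
significant bit and that the space-separated words are zero-padded hex
with the most significant word first, which is consistent with the
annotations in the dumps.

```python
def parse_hex_words(field):
    """Join space-separated hex words into a single integer.

    Example: "0 00000000 30000002" is read as 0x00000000030000002.
    """
    return int("".join(field.split()), 16)


def set_bits(value):
    """Return the positions of the 1 bits, bit 0 = least significant."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# Register values copied from the dumps above.
for name, field in [
    ("LAARB ErrorCSR1 (recordstop)", "0 00000000 30000002"),
    ("LAARB ErrorCSR1 (arbstop)", "0 00000000 33000003"),
    ("LAARB ErrorCSR3 Flgs (arbstop)", "000 00040040"),
    ("CIC ErrFlags (arbstop)", "00000001 00000011"),
]:
    print(name, set_bits(parse_hex_words(field)))
```

For example, 30000002 has bits 1, 28 and 29 set, matching the
ErrCSR1[1], ErrCSR1[28] and ErrCSR1[29] lines in the recordstop dump.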


For more information on DTAG errors, see the following Starfire 
Engineering web site:

     http://hes.west/esg/hsg/starfire/xftt/hw_cic2.html

To help local service and account teams analyze and resolve potential
DTAG issues, CTE requests that it be contacted whenever a given system
appears to have an excessive number of DTAG-related system board
failures.  "Excessive" means three or more DTAG failures over the
lifetime of the system, with at least one occurring during the last
six months.
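
The threshold can be checked mechanically against a system's failure
history.  The following Python sketch is illustrative only: the name
is_excessive is hypothetical, "six months" is approximated as 183 days,
and the threshold is read as three or more failures, in line with the
Problem Description.

```python
from datetime import date, timedelta


def is_excessive(failure_dates, today, min_failures=3, recent_days=183):
    """Apply the guideline: at least three DTAG failures over the
    lifetime of the system, with at least one in the last six months
    (approximated here as 183 days, an assumption of this sketch)."""
    recent_cutoff = today - timedelta(days=recent_days)
    has_recent = any(recent_cutoff <= d <= today for d in failure_dates)
    return len(failure_dates) >= min_failures and has_recent


# Hypothetical failure history for one system.
history = [date(1999, 5, 1), date(2000, 2, 29), date(2001, 1, 10)]
print(is_excessive(history, today=date(2001, 4, 19)))
```

A system that meets this test is a candidate for escalation to CTE.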


IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        |   |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION:    

Enterprise Customers and authorized Field Service Representatives may
avoid the above-mentioned problems by following the recommendations
shown below:

HES-CTE maintains an email alias, dtag-info@west.sun.com.  Please
contact this alias if you believe you are seeing excessive DTAG
failures.  CTE will then assist you with the analysis and help
determine the proper path to timely resolution of the problem.


COMMENTS:  

-----------------------------------------------------------------------------

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.

----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM

--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM







Copyright (c) 1997-2003 Sun Microsystems, Inc.