SRDB ID   Synopsis   Date
48490   Sun Fire[TM] 12K/15K: Rstop: CDC1 correctable error   1 Nov 2002

Status Issued

Description
- Problem Statement:

    Rstop: CDC1 correctable error

- Symptoms:

    'wfail' output reports something similar to the following:

       01  redxl> dumpf load dsmd.rstop.020626.2116.11
       02  Created Wed Jun 26 21:16:13 2002
       03  By hpost v. 1.2 Generic 112488-05 May  8 2002 17:05:18  executing as pid=2224
       04  On ssc name =  orlocn01-sc0.
       05  Domain =  0=A = orlsxdp01    Platform = orlocn01
       06  Boards in dump: master SC    CPs/CSBs[1:0]: 3
       07            EXB[17:0]: 0007F
       08          Slot0[17:0]: 0007F
       09          Slot1[17:0]: 00003
       10  -D option, -d
       11  "DSMD RecordStop Dump"
       12  0 errors occurred while creating this dump.
       13  redxl> wfail
       14  SDI EX00/S0: SDI is RStopped, requested by DARB.
       15  SDI EX01/S0: SDI is RStopped, requested by DARB.
       16  SDI EX02/S0: SDI is RStopped, requested by DARB.
       17  SDI EX03/S0: SDI is RStopped, requested by DARB.
       18  SDI EX04/S0  Master_Stop_Status0[31:0] = C0040008
       19          MStop0[3]: SDI is Recordstopped
       20  SDI EX04/S0  Recordstop0[31:0]  = 00818080
       21          Rstop0[16]: R    DARB texp request Recordstop (M)
       22          Rstop0[23]: R 1E AXQ requests all Recordstop (M)
       23  AXQ EX04 ( 4) Error_Flag_07[31:0] = 00088008  Mask = 63FF7D24
       24          Err7[19]: R 1E CDC1 correctable error
       25  FAIL CDC Dimm EX4:  Dstop/Rstop detected by AXQ.
       26  Primary service FRU is EXB EX4.
       27  SDI EX05/S0: SDI is RStopped, requested by DARB.
       28  SDI EX06/S0: SDI is RStopped, requested by DARB.
       29  DARB C0: enabled ports (expanders)          [17:0]: 3FC7F
       30  DARB C0: exps request Rstop                 [17:0]: 00010
       31  DARB C0: other darb req Rstop for exps      [17:0]: 00010
       32  DARB C1: enabled ports (expanders)          [17:0]: 3FC7F
       33  DARB C1: exps request Rstop                 [17:0]: 00010
       34  DARB C1: other darb req Rstop for exps      [17:0]: 00010

   
      
SOLUTION SUMMARY:
- Troubleshooting:

    The dump header tells us that this error was encountered while a domain
    was active (lines 10,11). This is also evident from the dump file name
    (line 01): dsmd.rstop files are created by dsmd when it records a
    Recordstop. Walking the error chain:

     - SDI4 reports a first error of AXQ4 calling for Rstop (line 22). 
     - AXQ4 reports a correctable error in CDC1 (line 24).
     - The CDC DIMM is FAILed from the configuration (line 25).
     - EX4 is called out as the FRU (line 26).
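
    The register values quoted in the 'wfail' output can be cross-checked
    by hand. The following is a minimal sketch in Python; the helper name
    set_bits is hypothetical and not part of redx. It only lists which bit
    positions are set in a value and makes no attempt to interpret bits
    that 'wfail' itself does not name.

```python
def set_bits(value, width=32):
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


# Values copied from the wfail output above.
error_flag_07 = 0x00088008   # AXQ EX04 Error_Flag_07[31:0]       (line 23)
recordstop0 = 0x00818080     # SDI EX04/S0 Recordstop0[31:0]      (line 20)
darb_rstop_req = 0x00010     # DARB exps request Rstop [17:0]     (line 30)
exb_in_dump = 0x0007F        # EXB[17:0] from the dump header     (line 07)

print("Error_Flag_07 bits:", set_bits(error_flag_07))
print("Recordstop0 bits:  ", set_bits(recordstop0))
print("Expanders requesting Rstop:", set_bits(darb_rstop_req, width=18))
print("Expanders in dump:         ", set_bits(exb_in_dump, width=18))
```

    With these inputs, bit 19 is set in Error_Flag_07 (line 24), bits 16
    and 23 are set in Recordstop0 (lines 21 and 22), and the only expander
    requesting Rstop is number 4, which agrees with the EX4 call-out.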

    The CDC DIMM is divided into 3 SRAMs, read in parallel, forming a 3-way
    set associative cache. CDC entries contain information about lines of
    memory recently referenced by SSM logic.

    Any error (correctable or uncorrectable) in the CDC is recorded and
    logged, but never causes a Dstop. Entries with correctable errors are
    written back with the corrected data. Uncorrectable errors are treated
    as cache misses.
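
    The behaviour described above can be illustrated with a toy model.
    This is a sketch under stated assumptions, not a description of the
    actual hardware: the class and field names are hypothetical, the ECC
    check is reduced to a per-entry state flag, and the number of sets is
    arbitrary. It shows only the three rules from the text: every error is
    logged, correctable entries are written back corrected, and
    uncorrectable entries are treated as misses.

```python
from dataclasses import dataclass

OK, CORRECTABLE, UNCORRECTABLE = "ok", "correctable", "uncorrectable"


@dataclass
class Entry:
    tag: int
    data: str
    ecc_state: str = OK   # assumption: stand-in for the ECC check result


class ToyCdc:
    WAYS = 3   # one way per SRAM, all read in parallel for a given index

    def __init__(self, num_sets):
        self.sets = [[None] * self.WAYS for _ in range(num_sets)]
        self.error_log = []   # every ECC event is recorded here

    def lookup(self, index, tag):
        """Return the cached data for tag, or None on a miss."""
        hit = None
        for way, entry in enumerate(self.sets[index]):
            if entry is None:
                continue
            if entry.ecc_state == UNCORRECTABLE:
                # Logged, then ignored: behaves like a cache miss.
                self.error_log.append((index, way, UNCORRECTABLE))
                continue
            if entry.ecc_state == CORRECTABLE:
                # Logged, then written back with the corrected data.
                self.error_log.append((index, way, CORRECTABLE))
                entry.ecc_state = OK
            if entry.tag == tag:
                hit = entry.data
        return hit


cdc = ToyCdc(num_sets=4)
cdc.sets[1][0] = Entry(tag=0x2A, data="line A", ecc_state=CORRECTABLE)
cdc.sets[1][2] = Entry(tag=0x2B, data="line B", ecc_state=UNCORRECTABLE)
print(cdc.lookup(1, 0x2A))   # hit, after correction
print(cdc.lookup(1, 0x2B))   # miss, entry is uncorrectable
print(cdc.error_log)
```

    In neither case does the lookup stop the model, which mirrors the
    statement that CDC errors are logged but never cause a Dstop.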
    

- Resolution:

    Since the error is correctable, frequency must be considered. If the
    error is an isolated incident, no action is required. Single-bit errors
    will happen, and that is what ECC protection is designed for.

    If, however, a series of correctable errors occurs on the same CDC,
    the expander should be replaced (the CDC DIMM is not a FRU). For
    quantifying "too many failures", standard rules for soft errors apply.
        

- Summary of part numbers and patch IDs

    http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html

- References and bug IDs

    SunSolve Article 48122 
    Document 816-5053-10       

- Additional background information:

    None        
        
- Meta-Data/Problem categorization:

Product/Platform: SF12K/SF15k
Category:

- Keywords

15K, 12K, SF15K, SF12K, Sun Fire 15K, Enterprise, Server, Sun Fire 12K,
starcat, rstop, CDC1 correctable error           

INTERNAL SUMMARY:

SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.