SRDB ID   Synopsis   Date
48491   Sun Fire[TM] 12K/15K: Dstop: CP0 demand bus parity error   1 Nov 2002

Status Issued

Description
- Problem Statement:

    Dstop: CP[01] demand bus parity error

- Symptoms:

   'wfail' output reports something similar to the following:

       01  redxl> dumpf load dsmd.dstop.020506.2128.46
       02  Created Mon May  6 21:28:47 2002
       03  By hpost v. 1.2 Generic 112488-03 Feb 15 2002 13:40:50  executing as pid=6862
       04  On ssc name =  rasputin-sc0.SD_RASCAL.West.Sun.COM
       05  Domain =  0=A    Platform = rasputin
       06  Boards in dump: master SC    CPs/CSBs[1:0]: 3
       07            EXB[17:0]: 12100
       08          Slot0[17:0]: 12100
       09          Slot1[17:0]: 12100
       10  -D option, -d
       11  "DSMD DomainStop Dump"
       12  0 errors occurred while creating this dump.
       13  redxl> wfail 
       14  SDI EX08/S0  Master_Stop_Status0[31:0] = E004000F
       15          MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
       16  SDI EX08/S0  Dstop0[31:0] = 00418040
       17          Dstop0[16]: D    DARB texp requests all Dstop (M)
       18          Dstop0[22]: D 1E SDI internal CP port requested Dstop
       19  SDI EX08/S0  CP_Error0[31:0]    = 2004A004  Mask = 580067FF
       20          CPErr0[18]: D 1E CP0 demand bus parity error (M)
       21              cp0_{dembusp,texp,unload,demand[1:0]} = 01
       22          CPErr0[29]: D 1E CP arbiter lockstep consistency check error (M)
       23              cp0_{dembusp,texp,unload,demand[1:0]} = 01
       24              cp1_{dembusp,texp,unload,demand[1:0]} = 00
       25  FAIL EXB EX8:  Dstop/Rstop detected by SDI EX8/S0.
       26  Primary service FRU is EXB EX8.
       27  FAIL EXB EX8 with CP C0:  Dstop/Rstop detected by SDI.
       28  Primary service FRU is EXB EX8.
       29  Secondary service FRU is CSB C0 or the logic centerplane.
       30  SDI EX13/S0: All SDI is DStopped and RStopped,         requested by DARB.
       31  SDI EX16/S0: All SDI is DStopped and RStopped,         requested by DARB.
       32  DARB C0: enabled ports (expanders)          [17:0]: 16100
       33  DARB C0: other darb req Dstop+Rstop for exps[17:0]: 00100
       34  DARB C1: enabled ports (expanders)          [17:0]: 16100
       35  DARB C1: other darb req Dstop+Rstop for exps[17:0]: 00100

         

SOLUTION SUMMARY:
- Troubleshooting:

    The dump header shows that this Dstop was generated by dsmd (lines 10,11)
    while a domain was active. This is also evident from the dumpf file name:
    dsmd.dstop files are created by dsmd as part of an ASR (Automatic System
    Recovery). Walking the error chain:

     - Master SDI on EX8 calls for Dstop at the request of its own internal
       CP port (line 18)
     - Master SDI on EX8 reports errors in the CPErr0 register (lines 20,22)
     - EX8 is FAILed from the configuration and named as the primary FRU (lines 25,26)
     - EX8's low centerplane half (CP C0) is FAILed from the configuration (line 27)
     - EX8 is named as the primary FRU, and CSB C0 (CS0) or the centerplane as the
       secondary FRU (lines 28,29)
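
    When many dumps have to be reviewed, the FRU calls in the chain above can be
    pulled out of saved 'wfail' text mechanically. The Python sketch below is a
    hypothetical helper (fru_calls and its regular expression are illustrative,
    not part of hpost or redxl); it matches only the "Primary/Secondary service
    FRU is ..." line format shown in this article.

```python
import re

# Hypothetical helper: extract FRU calls from saved 'wfail' text.
# The pattern matches only the line format shown in this article.
FRU_PATTERN = re.compile(r"(Primary|Secondary) service FRU is (.+?)\.\s*$")


def fru_calls(wfail_text: str) -> list[tuple[str, str]]:
    """Return (rank, FRU) pairs in the order wfail reports them."""
    calls = []
    for line in wfail_text.splitlines():
        match = FRU_PATTERN.search(line)
        if match:
            calls.append((match.group(1), match.group(2)))
    return calls


sample = """\
FAIL EXB EX8:  Dstop/Rstop detected by SDI EX8/S0.
Primary service FRU is EXB EX8.
FAIL EXB EX8 with CP C0:  Dstop/Rstop detected by SDI.
Primary service FRU is EXB EX8.
Secondary service FRU is CSB C0 or the logic centerplane.
"""
print(fru_calls(sample))
```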

    Each DARB sources a parity-protected demand signal to an expander's Master
    SDI. The demand tells the SDI to expect data to arrive four cycles later
    (4 and 5 cycles later if the centerplane is degraded). In the 'wfail' output,
    the captured demand signals are shown on lines 23 and 24. The low two bits
    of each captured value encode the demand:

       00 = target is slot 0                    [cp1 above]
       01 = target is slot 1                    [cp0 above]
       10 = not used
       11 = idle state (no demand event in progress) 
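
    The field label cp0_{dembusp,texp,unload,demand[1:0]} names the captured
    signals. A minimal decoding sketch follows, assuming the two-digit capture
    value is hexadecimal and that its bits follow the label order, most
    significant first (bit 4 = dembusp, bit 3 = texp, bit 2 = unload, bits 1:0 =
    demand). That ordering agrees with the later statement that bit 2 is the
    unload signal, but it is an assumption here, and decode_capture is a
    hypothetical name.

```python
# Hypothetical decoder for the capture field printed by 'wfail' as
# cpN_{dembusp,texp,unload,demand[1:0]} = XX.
# Assumption: XX is hex and bits follow the label order, MSB first.

DEMAND_TARGETS = {
    0b00: "slot 0",
    0b01: "slot 1",
    0b10: "not used",
    0b11: "idle (no demand event in progress)",
}


def decode_capture(text: str) -> dict:
    """Split a capture value such as '01' into its named signals."""
    value = int(text, 16)
    if not 0 <= value <= 0x1F:
        raise ValueError(f"capture value {text!r} does not fit in 5 bits")
    demand = value & 0b11
    return {
        "dembusp": (value >> 4) & 1,
        "texp": (value >> 3) & 1,
        "unload": (value >> 2) & 1,
        "demand": demand,
        "target": DEMAND_TARGETS[demand],
    }


# Captures from lines 23 and 24 of the wfail output.
print("cp0:", decode_capture("01")["target"])  # prints "cp0: slot 1"
print("cp1:", decode_capture("00")["target"])  # prints "cp1: slot 0"
```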

    In this example, DARB0 indicated slot 1 as the target (line 23) while DARB1
    indicated slot 0 as the target (line 24). The demand signal from DARB0 had
    a parity error (line 20), which accounts for the flip of bit 0. This is
    also why 'wfail' fails centerplane half 0 from the configuration.
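
    This article does not state which parity sense dembusp uses or exactly
    which bits it covers. As a sketch only, assume even parity over all five
    captured bits (the total number of 1s, including dembusp, must be even).
    Under that assumption the capture 00 from DARB1 passes and the capture 01
    from DARB0 fails, which agrees with the error on line 20. A single parity
    bit detects an odd number of flipped bits but cannot say which bit flipped.

```python
# Sketch of a parity check on a 5-bit capture value.
# ASSUMPTION: even parity over dembusp, texp, unload and demand[1:0];
# the real DARB/SDI parity scheme is not given in this article.


def parity_ok(value: int) -> bool:
    """Return True if the 5-bit capture has an even number of 1 bits."""
    return bin(value & 0x1F).count("1") % 2 == 0


print(parity_ok(0x00))  # DARB1 capture, line 24
print(parity_ok(0x01))  # DARB0 capture, line 23
```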

    Also, because the DARBs disagree, the SDI sees a loss of lockstep in the
    centerplane and records the CP arbiter lockstep error (line 22). This error
    is a secondary effect of the parity error.
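
    Conceptually, the lockstep check compares the two DARB copies of the same
    signals: XORing the cp0 and cp1 captures shows which bits disagree, while
    the parity error shows which copy is bad. The sketch below is hypothetical
    (differing_signals is an illustrative name) and assumes the same bit order
    as the wfail label, most significant bit first.

```python
# Hypothetical helper: name the signals on which the cp0 and cp1 captures
# disagree. Assumption: bit order follows the wfail label, MSB first.

SIGNAL_NAMES = {
    4: "dembusp",
    3: "texp",
    2: "unload",
    1: "demand[1]",
    0: "demand[0]",
}


def differing_signals(cp0: str, cp1: str) -> list[str]:
    """Return the names of signals whose captured values differ."""
    diff = int(cp0, 16) ^ int(cp1, 16)
    return [SIGNAL_NAMES[bit] for bit in range(4, -1, -1) if diff & (1 << bit)]


print(differing_signals("01", "00"))  # prints ['demand[0]']
```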

        

- Resolution:

    Repair/replace EX8. 

    If errors persist, investigate CS0, since it drives the low half of the
    centerplane. If CS0 has no fault history, repair/replace the centerplane.

- Summary of part numbers and patch IDs

    http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html
               
- References and bug IDs

    SunSolve Article 48122 
    SunSolve Article 48223
    DARB ASIC Specification
        

- Additional background information:

    The capture information in the SDI can be used to determine which bit of
    the demand signal is in error. Another example of a demand bus parity error:

       36  SDI EX08/S0  CP_Error0[31:0]    = 2004A004  Mask = 580067FF
       37          CPErr0[18]: D 1E CP0 demand bus parity error (M)
       38              cp0_{dembusp,texp,unload,demand[1:0]} = 04
       39          CPErr0[29]: D 1E CP arbiter lockstep consistency check error (M)
       40              cp0_{dembusp,texp,unload,demand[1:0]} = 04
       41              cp1_{dembusp,texp,unload,demand[1:0]} = 00

    Here, bit 2 differs, indicating a parity error on the unload signal from DARB0.
    The unload signal is a unidirectional signal from the DARB to the Master SDI.
    During operation, the SDI tracks how full the DARB input buffer is. When the
    DARB asserts unload, it tells the SDI that it has unloaded a prior request,
    freeing a buffer slot.
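
    The buffer tracking described above behaves like a credit counter. The
    sketch below is illustrative only: DarbBufferTracker, its methods, and the
    buffer depth are hypothetical, since this article does not give the DARB
    input buffer size. In such a model each unload returns exactly one slot, so
    a spurious or missed unload would leave the SDI's count out of step with
    the real buffer state.

```python
# Minimal credit-counter model of SDI tracking of DARB input buffer fullness.
# HYPOTHETICAL: class name, methods and depth are illustrative assumptions,
# not taken from the DARB or SDI specifications.


class DarbBufferTracker:
    def __init__(self, depth: int = 4) -> None:
        self.depth = depth    # assumed number of DARB input buffer slots
        self.occupied = 0     # requests sent but not yet unloaded

    def can_send(self) -> bool:
        return self.occupied < self.depth

    def send_request(self) -> None:
        if not self.can_send():
            raise RuntimeError("DARB input buffer full; request must wait")
        self.occupied += 1

    def on_unload(self) -> None:
        # The DARB asserted unload: one prior request left its buffer.
        if self.occupied == 0:
            raise RuntimeError("unload with no outstanding request")
        self.occupied -= 1


tracker = DarbBufferTracker(depth=2)
tracker.send_request()
tracker.send_request()
print(tracker.can_send())  # prints False: both assumed slots are in use
tracker.on_unload()
print(tracker.can_send())  # prints True: the unload freed a slot
```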
    
    This does not change the diagnosis listed earlier.
        
        
- Meta-Data/Problem categorization:

Product/Platform: SF12K/SF15K
Category:

- Keywords

15K, 12K, SF15K, SF12K, Sun Fire 15K, Enterprise, Server, Sun Fire 12K,
starcat, dstop, demand bus parity error

         

INTERNAL SUMMARY:

SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.