SRDB ID   Synopsis   Date
48204   Sun Fire[TM] 12K/15K: Dstop: Data path command parity error detected by SDI(M)   31 Oct 2002

Status Issued

Description
- Problem Statement: 

Dstop: Data path command parity error detected by SDI(M).

- Symptoms:

The redx wfail command reports the following failure signature:

redxl> dumpf load dsmd.dstop.020507.2037.16
Created Tue May  7 20:37:18 2002
By hpost v. 1.2 Generic 112488-04 Mar 18 2002 14:43:00  executing as pid=4959
On ssc name =  rasputin-sc0.SD_RASCAL.West.Sun.COM
Domain =  0=A    Platform = rasputin
Boards in dump: master SC    CPs/CSBs[1:0]: 3
          EXB[17:0]: 12100
        Slot0[17:0]: 12100
        Slot1[17:0]: 12100
-D option, -d
"DSMD DomainStop Dump"
0 errors occurred while creating this dump.

redxl> wfail
SDI EX08/S0  Master_Stop_Status0[31:0] = B004000F
        MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
SDI EX08/S0  Dstop0[31:0] = 00098008
        Dstop0[16]: D    DARB texp requests all Dstop (M)
        Dstop0[19]: D 1E SDI internal core requested Dstop
SDI EX08/S0  Core_Error0[31:0]  = 00208020  Mask = 0051FFFF
        CoreErr0[21]: D 1E AXQ Data path command parity error (M)
            {dat_cmdp,dat_cmd[23:0]} = 0000001. {retired,half_used} = 3
            NOTE: Compare dat+par to AXQ out history to isolate 1-bit errors.
FAIL EXB EX8:  Dstop/Rstop detected by AXQ.
Primary service FRU is EXB EX8.
SDI EX13/S0: All SDI is DStopped and RStopped,         requested by DARB.
SDI EX16/S0: All SDI is DStopped and RStopped,         requested by DARB.
DARB C0: enabled ports (expanders)          [17:0]: 16100
DARB C0: other darb req Dstop+Rstop for exps[17:0]: 00100
DARB C1: enabled ports (expanders)          [17:0]: 16100
DARB C1: other darb req Dstop+Rstop for exps[17:0]: 00100

redxl> shsdi -e 8

Note: Data is displayed from the currently loaded dump file.

SDI EX08/S0    Component ID = 64317049
         Master_Stop_Status0[31:0] = B004000F
        MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
         Master_Stop_Status1[31:0] = E8E8000E
        0x08   CP1StopExp[4:0]        MSS1[20:16]    
           3   CP1StopSlot[0:1]       MSS1[22:21]    Dstop is 1st stop
           1   CP1StopInfoValid       MSS1[23]       
        0x08   CP0StopExp[4:0]        MSS1[28:24]    
           3   CP0StopSlot[0:1]       MSS1[30:29]    Dstop is 1st stop
           1   CP0StopInfoValid       MSS1[31]       
         Dstop0[31:0] = 00098008
        Dstop0[16]: D    DARB texp requests all Dstop (M)
        Dstop0[19]: D 1E SDI internal core requested Dstop
         Dstop1[31:0] = 00000000
         Recordstop0[31:0]  = 00018001
        Rstop0[16]: R 1E DARB texp request Recordstop (M)
         Recordstop1[31:0]  = 00000000
         Core_Error0[31:0]  = 00208020  Mask = 0051FFFF
        CoreErr0[21]: D 1E AXQ Data path command parity error (M)
            {dat_cmdp,dat_cmd[23:0]} = 0000001. {retired,half_used} = 3
            NOTE: Compare dat+par to AXQ out history to isolate 1-bit errors.
            Core_ErrData[4:2][31:0]  = 00000000 00080700 00000060
            Core_ErrData[1:0][31:0]  = 00000007 00001001
         Core_Error1[31:0]  = 00000000  Mask = FFFFFFFF
         Sysreg_Error[31:0] = 00000000  Mask = 780377FF
         STB_Error[31:0]    = 00000000  Mask = 7F00FFFF
         CP_Error0[31:0]    = 00000000  Mask = 580067FF
         CP_Error1[31:0]    = 00000000  Mask = 7FFCFFFF
         Slot0_Error0[31:0] = 00000000  Mask = 7000FFFF
         Slot0_Error1[31:0] = 00000000  Mask = 31444EBF
         Slot0_Error2[31:0] = 00000000  Mask = 7FFCFFFF
         Slot1_Error0[31:0] = 00000000  Mask = 3000FFFF
         Slot1_Error1[31:0] = 00000000  Mask = 31404EBF
         Slot1_Error2[31:0] = 00000000  Mask = 7FFCFFFF
         
redxl> shaxq 8 h

Note: Data is displayed from the currently loaded dump file.

AXQ EX08   Ecc-compressed output history[6:0] to AMX, RMX, and SDI.
<---- AMX ---->     RMX      SDI DpCmd  Sysreg
1.1 1.0 0.1 0.0    0   1      OE Ecc    OE Ecc     entry
15  15  15  15    05  05       1 68      0 49      0  old
15  15  15  15    05  05       1 68      0 49      1
15  15  15  15    05  05       1 68      0 49      2
15  15  15  15    05  05       1 68      0 49      3
15  15  15  15    05  05       1 68      0 49      4
15  15  15  15    05  05       1 68      0 49      5
15  15  15  15    05  05       1 68      0 49      6
15  15  15  15    05  05       1 68      0 49      7
15  15  15  15    05  05       1 68      0 49      8
15  15  15  15    05  05       1 68      0 49      9
15  15  15  15    05  05       1 68      0 49     10
15  15  15  15<   05  05<      1 68      0 49     11
15  15  15  15    05  05       1 68      0 49     12
15  15  15  15    05  05       1 68      0 49     13
15  15  15  15    05  05       1 68      0 49     14
15  15  15  15    05  05       1 68      0 49     15
15  15  15  15    05  05       1 68      0 49     16
15  15  15  15    05  05       1 68      0 49     17
15  15  15  15    05  05       1 68      0 49     18
15  15  15  15    05  05       1 68      0 49     19
15  15  15  15    05  05       1 68      0 49     20
15  15  15  15    05  05       1 68      0 49     21
15  15  15  15    05  05       1 68      0 49     22
15  15  15  15    05  05       1 68      0 49     23
15  15  15  15    05  05       1 68      0 49     24
15  15  15  15    05  05       1 68      0 49     25
15  15  15  15    05  05       1 68<     0 49     26
15  15  15  15    05  05       1 68      0 49     27
15  15  15  15    05  05       1 68      0 49     28
15  15  15  15    05  05       1 68      0 49     29
15  15  15  15    05  05       1 68      0 49     30
15  15  15  15    05  05       1 68      0 49     31  new

NOTE: If a parity error was detected by a receiving AMX, RMX, or SDI, the ecc history
entry indicated by '<' in this display can be compared to the receiver's data capture
to isolate 1-bit errors. Use the command "parse axqoh" to do this analysis.

This assumes only a single error exists in the system; multiple
errors can delay recordstop, causing the history of interest to be in an indeterminate
older entry in the output history.

redxl> parse axqoh d x0000001 x68 
SDI Dpath cmd capture[24:0] = 0000001. Computed ecc = 47.  AXQ hist ecc = 68.
Could be a 1-bit error in bit 0 (as used to compute AXQ oh ecc).


            

SOLUTION SUMMARY:
- Troubleshooting:

The dump header shows that this Dstop dump file was generated by dsmd while the domain was running. The file name confirms this: dsmd.dstop files are created by dsmd as part of an ASR.
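
The dump header and the DARB lines in the wfail output give board and expander sets as hex values labeled [17:0]. The following is a minimal Python sketch for reading them, assuming each value is a bit mask in which bit n stands for expander n. That reading is an interpretation, though it is consistent with the SDI EX08, EX13, and EX16 lines in the same output. The function name is hypothetical and is not part of redx.

```python
def expanders_in_mask(mask_hex, width=18):
    """Return the expander numbers whose bits are set in a hex mask.

    Assumption: bit n of the [17:0] field corresponds to expander n.
    """
    mask = int(mask_hex, 16)
    return [n for n in range(width) if mask & (1 << n)]


# Values copied from the dump shown in the Symptoms section.
print(expanders_in_mask("12100"))  # EXB[17:0] in the dump header
print(expanders_in_mask("16100"))  # DARB enabled ports (expanders)
print(expanders_in_mask("00100"))  # DARB "other darb req Dstop+Rstop for exps"
```

Under that assumption the header mask 12100 decodes to expanders 8, 13, and 16, and the DARB stop-request mask 00100 decodes to expander 8 alone.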


Note the two errors flagged as first errors (1E), one in each of two error registers:

Dstop0   - SDI internal core requested Dstop 
CoreErr0 - AXQ Data path command parity error (M)
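
To cross-check which bits redx is annotating, the raw register values can be expanded into set-bit positions. The following is a minimal Python sketch; the function name is hypothetical, and the only thing assumed about the registers is the 32-bit width shown in the dump.

```python
def set_bits(value_hex, width=32):
    """Return the positions of all bits set in a hex register value."""
    value = int(value_hex, 16)
    return [n for n in range(width) if value & (1 << n)]


# Register values copied from the wfail output above.
print(set_bits("00098008"))  # Dstop0
print(set_bits("00208020"))  # Core_Error0
```

Bits 16 and 19 of Dstop0 and bit 21 of Core_Error0 are the ones redx annotates in its output. The meaning of the remaining low-order bits is not stated in this document, so the sketch leaves them uninterpreted.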

Note the line FAIL EXB EX8. Given the fault implied by this error, this is the component POST would deconfigure during a POST run in order to recover the domain with the largest possible fault-free configuration.

Note the recommended FRU to replace in order to remove the fault:

Primary service FRU is EXB EX8. 

The AXQ sends data commands and domain/record stop information to
the SDI(M) over the 24-bit unidirectional data path command
interface, data_cmd_l. One command can be sent per cycle. Even
parity is provided concurrently with each transfer, which allows a
quiescent state of all highs on the active-low bus.
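
The parity rule described above can be checked against the captured value. The following is a minimal Python sketch, assuming the 7-digit hex capture {dat_cmdp,dat_cmd[23:0]} holds logical values with dat_cmdp in bit 24, and that even parity means the 25 bits together contain an even number of ones. Both points are assumptions about how redx presents the capture, and the function names are hypothetical.

```python
def split_capture(capture_hex):
    """Split a {dat_cmdp,dat_cmd[23:0]} capture into (parity bit, command).

    Assumption: dat_cmdp is bit 24 and dat_cmd occupies bits 23..0.
    """
    word = int(capture_hex, 16) & 0x1FFFFFF  # keep the 25 meaningful bits
    parity_bit = (word >> 24) & 1
    command = word & 0xFFFFFF
    return parity_bit, command


def even_parity_ok(capture_hex):
    """Return True when command plus parity bit hold an even number of ones."""
    parity_bit, command = split_capture(capture_hex)
    ones = bin(command).count("1") + parity_bit
    return ones % 2 == 0


print(even_parity_ok("0000000"))  # idle word, all logical zeros
print(even_parity_ok("0000001"))  # value captured in Core_Error0
```

Under these assumptions the idle word passes the check and the captured word 0000001 fails it, which fits a single flipped bit.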

Using redx on the AXQ output history to isolate 1-bit errors:

redxl> parse axqoh d x0000001 x68

The result indicates a possible 1-bit error in bit 0.


- Resolution:

This Data Path Command signal runs on the Expander Board from the AXQ to the SDI(M). The service FRU is EXB EX8.

- Summary of part numbers and patch IDs

http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html 

- References and bug IDs

Specification for the SDI ASIC.


- Additional background information:

- Meta-Data/Problem categorization:

Product/Platform: SF12K/SF15K
Category:

- Keywords

15K, 12K, SF15K, SF12K, starcat, dstop, AXQ, SDI(M), axqoh, Data path Command parity error            

INTERNAL SUMMARY:

SUBMITTER: Tong-Pheng Koh
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.