SRDB ID   Synopsis   Date
48203   Sun Fire[TM] 12K/15K: Dstop: CP0_GDTransID data error detected by SDI(M)   31 Oct 2002

Status Issued

Description
- Problem Statement: 

	Dstop: CP0_GDTransID data error detected by SDI(M).

- Symptoms:

The output of the redx wfail command reports the following failure signature:

redxl> dumpf load dsmd.dstop.020506.1859.02
Created Mon May  6 18:59:04 2002
By hpost v. 1.2 Generic 112488-03 Feb 15 2002 13:40:50  executing as pid=7984
On ssc name =  rasputin-sc0.SD_RASCAL.West.Sun.COM
Domain =  0=A    Platform = rasputin
Boards in dump: master SC    CPs/CSBs[1:0]: 3
          EXB[17:0]: 12100
        Slot0[17:0]: 12100
        Slot1[17:0]: 12100
-D option, -d
"DSMD DomainStop Dump"
0 errors occurred while creating this dump.

redxl> wfail
SDI EX08/S0  Master_Stop_Status0[31:0] = 0004000F
        MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
SDI EX08/S0  Dstop0[31:0] = 00418040
        Dstop0[16]: D    DARB texp requests all Dstop (M)
        Dstop0[22]: D 1E SDI internal CP port requested Dstop
SDI EX08/S0  CP_Error0[31:0]    = 02008200  Mask = 580067FF
        CPErr0[25]: D 1E CP0 half GDTransid parity error (M)
            {cp0_gdidp,cp0_gdid[5:0]} = 01
FAIL EXB EX8 with CP C0:  Dstop/Rstop detected by SDI.
Primary service FRU is EXB EX8.
Secondary service FRU is CSB C0 or the logic centerplane.
SDI EX13/S0: All SDI is DStopped and RStopped,         requested by DARB.
SDI EX16/S0: All SDI is DStopped and RStopped,         requested by DARB.
DARB C0: enabled ports (expanders)          [17:0]: 16100
DARB C0: other darb req Dstop+Rstop for exps[17:0]: 00100
DARB C1: enabled ports (expanders)          [17:0]: 16100
DARB C1: other darb req Dstop+Rstop for exps[17:0]: 00100

redxl> shdarb -e 0 8

Note: Data is displayed from the currently loaded dump file.

DARB C0 (0)  Component ID = 44303049
      Port  8 InterAsicStatus[31:0] = 80200009
        IAStat[21,31]: Other DARB requests Dstop+Rstop for this exp
        IAStat[ 3]: EXB requests Domainstop, EXB internal reason
      Port  8 PortStatus[13:0] = 3000
        PStat[12,13]: Port Dstop+Rstop: Another port or asic detected error
        
redxl> shdarb -e 1 8

Note: Data is displayed from the currently loaded dump file.

DARB C1 (1)  Component ID = 44303049
      Port  8 InterAsicStatus[31:0] = 80200009
        IAStat[21,31]: Other DARB requests Dstop+Rstop for this exp
        IAStat[ 3]: EXB requests Domainstop, EXB internal reason
      Port  8 PortStatus[13:0] = 3000
        PStat[12,13]: Port Dstop+Rstop: Another port or asic detected error
        
redxl> wfail -B
exp_abus        EX8/AB0                 # redx wfail of dump 020507.0959.04

            

SOLUTION SUMMARY:
- Troubleshooting:


From the dump header you can see that this Dstop dump file was generated by dsmd while
the domain was running. The dump file name confirms this: dsmd.dstop files are created
by dsmd as part of ASR (Automatic System Recovery).
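The "Boards in dump" fields in the dump header, like the DARB port lines in the wfail output, are bitmaps over board numbers. The following Python sketch decodes them. It assumes that each value is printed in hexadecimal and that bit n stands for board n (EXn for expanders, Cn for CPs/CSBs). The helper name decode_board_mask is hypothetical. Under that assumption, EXB[17:0] = 12100 decodes to EX8, EX13 and EX16, which are the same three expanders that appear in the wfail output. The DARB "other darb req Dstop+Rstop" mask 00100 decodes to EX8 alone.

```python
def decode_board_mask(hex_mask: str, width: int, prefix: str) -> list[str]:
    """Return a board name for every set bit in a redx board bitmap.

    Assumption (not stated in the redx output): the mask is hexadecimal
    and bit n corresponds to board n.
    """
    value = int(hex_mask, 16)
    if value >> width:
        raise ValueError(f"mask {hex_mask} does not fit in {width} bits")
    return [f"{prefix}{n}" for n in range(width) if (value >> n) & 1]


# Values copied from the dump header and the wfail output.
print(decode_board_mask("12100", 18, "EX"))  # EXB[17:0] -> ['EX8', 'EX13', 'EX16']
print(decode_board_mask("3", 2, "C"))        # CPs/CSBs[1:0] -> ['C0', 'C1']
print(decode_board_mask("16100", 18, "EX"))  # DARB enabled ports -> ['EX8', 'EX13', 'EX14', 'EX16']
print(decode_board_mask("00100", 18, "EX"))  # other DARB req Dstop+Rstop -> ['EX8']
```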


Note the two first-error (1E) entries, one in each of the two different error registers:

Dstop0 - SDI internal CP port requested Dstop
CPErr0 - CP0 half GDTransid parity error (M)
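To cross-check which bits wfail and shdarb are decoding, it can help to list every set bit in a raw register value. The following is a minimal Python sketch. The helper name set_bits is hypothetical, and the values are copied from the wfail and shdarb output above. For example, it shows that Dstop0 = 00418040 has bits 6, 15, 16 and 22 set, and that CP_Error0 = 02008200 has bits 9, 15 and 25 set. wfail decodes bits 16 and 22 of Dstop0 and bit 25 of CP_Error0.

```python
def set_bits(value: int) -> list[int]:
    """Return the positions of all 1 bits in value, lowest bit first."""
    return [n for n in range(value.bit_length()) if (value >> n) & 1]


# Raw register values copied from the wfail and shdarb output.
registers = {
    "SDI EX08/S0 Master_Stop_Status0": 0x0004000F,
    "SDI EX08/S0 Dstop0": 0x00418040,
    "SDI EX08/S0 CP_Error0": 0x02008200,
    "DARB C0 Port 8 InterAsicStatus": 0x80200009,
    "DARB C0 Port 8 PortStatus": 0x3000,
}

for name, value in registers.items():
    print(f"{name} = {value:08X}: bits {set_bits(value)}")
```

This document does not explain the bits that wfail leaves undecoded, such as bits 6 and 15 of Dstop0. Check the SDI ASIC specification before drawing any conclusions from them.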

Note the line "FAIL EXB EX8 with CP C0". This is what POST would choose to deconfigure
during its run, so that the domain can be recovered with the largest possible fault-free
configuration given the fault implied by this error.

Note the FRU(s) recommended for replacement in order to remove the fault:

Primary service FRU is EXB EX8.
Secondary service FRU is CSB C0 or the logic centerplane.

cp0_gdtransid_l[5:0] and cp1_gdtransid_l[5:0] are bidirectional
identifiers passed through the DARB from SDI to SDI. They convey
SDI STB information, or device and tag information, for the
associated data transfer that follows 2 cycles later. In the
outbound direction (DARB to SDI), gdtransid is always preceded by
a demand 2 cycles earlier. In the inbound direction (SDI to DARB),
it is always preceded by a TEXP at least 2 cycles earlier. The bus
is bit-sliced, 6 bits per DARB, for a total width of 12 bits:

     11      10:9    8:6        5       4       3:0
P1 abort_l dstat_l device_l ld_stb_l dtarg_l data_tag_l P0
<-------------------------> <---------------------------->
      CP1 arbiter slice           CP0 arbiter slice
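The bit layout above can be written as a table of (field, high bit, low bit). The following is a minimal Python sketch. It assumes that the bit positions in the diagram refer to the 12-bit concatenation {cp1_gdtransid_l[5:0], cp0_gdtransid_l[5:0]}. The function names are hypothetical. The sketch reports the raw active-low bus values and does not invert them. The slice values in the example are made up for illustration.

```python
# Field layout of the 12-bit gdtransid_l bus, taken from the diagram.
# Bits 11:6 form the CP1 arbiter slice, bits 5:0 the CP0 arbiter slice.
FIELDS = [
    ("abort_l", 11, 11),
    ("dstat_l", 10, 9),
    ("device_l", 8, 6),
    ("ld_stb_l", 5, 5),
    ("dtarg_l", 4, 4),
    ("data_tag_l", 3, 0),
]


def join_slices(cp1_slice: int, cp0_slice: int) -> int:
    """Combine the two 6-bit arbiter slices into one 12-bit gdtransid_l value."""
    if not (0 <= cp1_slice < 64 and 0 <= cp0_slice < 64):
        raise ValueError("each arbiter slice is 6 bits wide")
    return (cp1_slice << 6) | cp0_slice


def decode_gdtransid(value: int) -> dict[str, int]:
    """Split a 12-bit gdtransid_l value into its named (active-low) fields."""
    if not 0 <= value < (1 << 12):
        raise ValueError("gdtransid_l is 12 bits wide")
    fields: dict[str, int] = {}
    for name, hi, lo in FIELDS:
        width = hi - lo + 1
        fields[name] = (value >> lo) & ((1 << width) - 1)
    return fields


# Made-up slice values, for illustration only.
word = join_slices(cp1_slice=0b101110, cp0_slice=0b110101)
print(decode_gdtransid(word))
# {'abort_l': 1, 'dstat_l': 1, 'device_l': 6, 'ld_stb_l': 1, 'dtarg_l': 1, 'data_tag_l': 5}
```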

cp0_gdtransid_par_l (P0) and cp1_gdtransid_par_l (P1) are the
parity bits for the bidirectional gdtransid slices, one per
arbiter slice. Each is transferred concurrently (in the same
cycle) with gdtransid_l. On the bus, gdtransid_par_l = 1 when all
gdtransid_l bits of the slice are 1.
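The documented rule (gdtransid_par_l = 1 when all six gdtransid_l bits are 1) is consistent with odd parity across the seven wires of a slice, because six 1s plus a parity bit of 1 gives an odd count. The following is a minimal Python sketch under that odd-parity assumption. The function names are hypothetical. The sketch models bus-level (active-low) values only. It makes no assumption about the polarity in which the SDI error register captures {cp0_gdidp,cp0_gdid[5:0]}, so it is not meant to be applied directly to that captured value.

```python
def expected_parity_l(slice_l: int) -> int:
    """Parity bit a sender would drive for one 6-bit gdtransid_l slice.

    Assumption: odd parity over the 7 wires (6 data + parity), which matches
    the documented rule that gdtransid_par_l = 1 when all gdtransid_l bits are 1.
    """
    if not 0 <= slice_l < 64:
        raise ValueError("a gdtransid_l slice is 6 bits wide")
    ones = bin(slice_l).count("1")
    # Drive a 1 when the data already has an even number of 1s,
    # so that the 7-wire total is always odd.
    return 1 if ones % 2 == 0 else 0


def parity_ok(slice_l: int, par_l: int) -> bool:
    """True if the 6 data bits plus the parity bit contain an odd number of 1s."""
    return (bin(slice_l).count("1") + par_l) % 2 == 1


# Documented case: all gdtransid_l bits = 1 -> gdtransid_par_l = 1.
print(expected_parity_l(0b111111))  # 1

# A single data bit flipped on the wire makes the receiver's check fail.
sent = 0b101100
par = expected_parity_l(sent)
received = sent ^ 0b000100
print(parity_ok(sent, par), parity_ok(received, par))  # True False
```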

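{cp0_gdidp,cp0_gdid[5:0]} = 01, that is, the 7-bit value 0000001
(cp0_gdidp = 0, cp0_gdid[5:0] = 000001). Together with CPErr0[25],
this indicates a data error on CP0_GDTransID, reported as a CP0
half GDTransid parity error.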

Since this signal crosses component boundaries (between Expander 8
and the centerplane, DARB C0), the possible service FRUs are EXB 8
and the centerplane. wfail calls out EXB EX8 as the primary service
FRU and CSB C0 or the logic centerplane as the secondary service FRU.

- Resolution:

Swap out EXB EX8 first. If the error persists, suspect the
centerplane (CSB C0 or the logic centerplane).

- Summary of part numbers and patch IDs

http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html 

- References and bug IDs

Specification for an ASIC - SDI.


- Additional background information:

- Meta-Data/Problem categorization:

Product/Platform: SF12K/SF15K
Category:

- Keywords

F15K, SF12K, starcat, dstop, SDI(M), DARB, CP0_GDTransID, CP1_GDTransID            

INTERNAL SUMMARY:

SUBMITTER: Tong-Pheng Koh
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.