SRDB ID | Synopsis | Date |
48203 | Sun Fire[TM] 12K/15K: Dstop: CP0_GDTransID data error detected by SDI(M) | 31 Oct 2002 |
Status | Issued |
Description |
- Problem Statement:
  Dstop: CP0_GDTransID data error detected by SDI(M).

- Symptoms:
  The redx wfail command output reports the following failure signature:

    redxl> dumpf load dsmd.dstop.020506.1859.02
    Created Mon May 6 18:59:04 2002
    By hpost v. 1.2 Generic 112488-03 Feb 15 2002 13:40:50 executing as pid=7984
    On ssc name = rasputin-sc0.SD_RASCAL.West.Sun.COM
    Domain = 0=A Platform = rasputin
    Boards in dump: master SC
      CPs/CSBs[1:0]: 3
      EXB[17:0]: 12100
      Slot0[17:0]: 12100
      Slot1[17:0]: 12100
    -D option, -d "DSMD DomainStop Dump"
    0 errors occurred while creating this dump.

    redxl> wfail
    SDI EX08/S0 Master_Stop_Status0[31:0] = 0004000F
      MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
    SDI EX08/S0 Dstop0[31:0] = 00418040
      Dstop0[16]: D DARB texp requests all Dstop (M)
      Dstop0[22]: D 1E SDI internal CP port requested Dstop
    SDI EX08/S0 CP_Error0[31:0] = 02008200 Mask = 580067FF
      CPErr0[25]: D 1E CP0 half GDTransid parity error (M)
      {cp0_gdidp,cp0_gdid[5:0]} = 01
    FAIL EXB EX8 with CP C0: Dstop/Rstop detected by SDI.
      Primary service FRU is EXB EX8.
      Secondary service FRU is CSB C0 or the logic centerplane.
    SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
    SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
    DARB C0: enabled ports (expanders) [17:0]: 16100
    DARB C0: other darb req Dstop+Rstop for exps[17:0]: 00100
    DARB C1: enabled ports (expanders) [17:0]: 16100
    DARB C1: other darb req Dstop+Rstop for exps[17:0]: 00100

    redxl> shdarb -e 0 8
    Note: Data is displayed from the currently loaded dump file.
    DARB C0 (0) Component ID = 44303049
    Port 8 InterAsicStatus[31:0] = 80200009
      IAStat[21,31]: Other DARB requests Dstop+Rstop for this exp
      IAStat[ 3]: EXB requests Domainstop, EXB internal reason
    Port 8 PortStatus[13:0] = 3000
      PStat[12,13]: Port Dstop+Rstop: Another port or asic detected error

    redxl> shdarb -e 1 8
    Note: Data is displayed from the currently loaded dump file.
    DARB C1 (1) Component ID = 44303049
    Port 8 InterAsicStatus[31:0] = 80200009
      IAStat[21,31]: Other DARB requests Dstop+Rstop for this exp
      IAStat[ 3]: EXB requests Domainstop, EXB internal reason
    Port 8 PortStatus[13:0] = 3000
      PStat[12,13]: Port Dstop+Rstop: Another port or asic detected error

    redxl> wfail -B exp_abus EX8/AB0 # redx wfail of dump 020507.0959.04
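The register values in the wfail and shdarb output above are hexadecimal bit masks, and each decoded line names a bit position (for example Dstop0[22] or CPErr0[25]). As a cross-check, the following minimal Python sketch lists the bit positions that are set in each value. The helper name `set_bits` is hypothetical and is not part of redx; the sketch only does the arithmetic and assumes nothing about the meaning of bits that wfail does not decode.

```python
def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Hex values copied from the redx output in this article.
registers = {
    "Dstop0": 0x00418040,
    "CP_Error0": 0x02008200,
    "InterAsicStatus": 0x80200009,
    "PortStatus": 0x3000,
    "EXB[17:0]": 0x12100,
    "other darb req exps[17:0]": 0x00100,
}

for name, value in registers.items():
    print(f"{name} = {value:08X} -> bits {set_bits(value)}")
```

Dstop0 has bits 16 and 22 set and CP_Error0 has bit 25 set, which are the bits wfail decodes. The EXB[17:0] mask 12100 has bits 8, 13 and 16 set, matching the EX08, EX13 and EX16 SDIs named in the output, and the "other darb req" mask 00100 has only bit 8 set, which corresponds to expander 8.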
SOLUTION SUMMARY:
- Troubleshooting:
  The dump header shows that this Dstop dump file was generated by dsmd while the domain was running. The dump file name shows the same thing: dsmd.dstop files are created by dsmd as part of an ASR.

  Note the two first-error (1E) indications, one in each of two error registers:

    Dstop0 - SDI internal CP port requested Dstop
    CPErr0 - CP0 half GDTransid parity error (M)

  Note "FAIL EXB EX8 with CP C0". If this error occurred during a POST run, this is what POST would deconfigure in order to recover the largest fault-free domain, given the fault the error implies.

  Note the recommendation for the FRU(s) to replace in order to remove the fault:

    Primary service FRU is EXB EX8.
    Secondary service FRU is CSB C0 or the logic centerplane.

  cp0_gdtransid_l[5:0] and cp1_gdtransid_l[5:0] are bidirectional identifiers passed through the DARB from SDI to SDI. They convey SDI STB information, or device and tag information, for the associated data transfer that follows 2 cycles later. For DARB to SDI (outbound), the identifier is always preceded by a demand 2 cycles earlier. For SDI to DARB (inbound), it is always preceded by a TEXP at least 2 cycles earlier.

  The identifier is bit sliced, 6 bits per DARB, for a total width of 12 bits:

           11       10:9     8:6       5         4        3:0
      P1   abort_l  dstat_l  device_l  ld_stb_l  dtarg_l  data_tag_l   P0
           <------------------------>  <---------------------------->
               CP1 arbiter slice            CP0 arbiter slice

  cp0_gdtransid_par_l (P0) and cp1_gdtransid_par_l (P1) are the parity bits on the bidirectional gdtransid slices, one for each arbiter slice. Parity is transferred in the same cycle as gdtransid_l. On the bus, gdtransid_par_l = 1 if all bits of gdtransid_l = 1.

  In this dump, {cp0_gdidp,cp0_gdid[5:0]} = 01 (hex) = 00000001 (binary), so there is a data error on CP0_GDTransID, which is reported as "CP0 half GDTransid parity error". Because this signal crosses component boundaries (Expander 8 and the centerplane, DARB 0), the possible service FRUs are EXB 8 and the centerplane.
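The field layout and the captured value described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names are hypothetical, the bit ranges are taken from the layout table in this article, and it assumes that in the `{cp0_gdidp,cp0_gdid[5:0]}` notation the parity bit is the most significant bit (bit 6) above the 6-bit identifier. It splits bit fields only and does not interpret signal polarity (the `_l` names) or recompute parity.

```python
# (name, high bit, low bit) for the 12-bit gdtransid, per the layout table.
GDTRANSID_FIELDS = [
    ("abort_l", 11, 11),     # CP1 arbiter slice
    ("dstat_l", 10, 9),      # CP1 arbiter slice
    ("device_l", 8, 6),      # CP1 arbiter slice
    ("ld_stb_l", 5, 5),      # CP0 arbiter slice
    ("dtarg_l", 4, 4),       # CP0 arbiter slice
    ("data_tag_l", 3, 0),    # CP0 arbiter slice
]


def extract(value, hi, lo):
    """Return bits hi..lo of value as an unsigned integer."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def split_gdtransid(value):
    """Split a 12-bit gdtransid value into its named fields."""
    return {name: extract(value, hi, lo) for name, hi, lo in GDTRANSID_FIELDS}


def split_cp0_capture(value):
    """Split {cp0_gdidp,cp0_gdid[5:0]}; assumes parity is bit 6."""
    return {"cp0_gdidp": extract(value, 6, 6), "cp0_gdid": extract(value, 5, 0)}


capture = split_cp0_capture(0x01)  # value reported by wfail in this dump
print(capture)
# Only the CP0 slice fields (bits 5:0) are meaningful for this capture.
print(split_gdtransid(capture["cp0_gdid"]))
```

For the captured value 01, this gives a parity bit of 0 and a 6-bit identifier of 000001 in the CP0 arbiter slice.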
  Wfail calls out the primary service FRU as EXB 8 and the secondary service FRU as CSB C0 or the logic centerplane.

- Resolution:
  Swap out EX8 first. If the error persists, suspect the centerplane.

- Summary of part numbers and patch IDs:
  http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html

- References and bug IDs:
  Specification for an ASIC - SDI.

- Additional background information:

- Meta-Data/Problem categorization:
  Product/Platform: SF12K/SF15K
  Category:

- Keywords:
  F15K, SF12K, starcat, dstop, SDI(M), DARB, CP0_GDTransID, CP1_GDTransID
INTERNAL SUMMARY:
SUBMITTER: Tong-Pheng Koh
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS: