SRDB ID | Synopsis | Date |
48491 | Sun Fire[TM] 12K/15K: Dstop: CP0 demand bus parity error | 1 Nov 2002 |
Status | Issued |
Description |
- Problem Statement: Dstop: CP[01] demand bus parity error

- Symptoms: 'wfail' output reports something similar to the following:

01 redxl> dumpf load dsmd.dstop.020506.2128.46
02 Created Mon May 6 21:28:47 2002
03 By hpost v. 1.2 Generic 112488-03 Feb 15 2002 13:40:50 executing as pid=6862
04 On ssc name = rasputin-sc0.SD_RASCAL.West.Sun.COM
05 Domain = 0=A Platform = rasputin
06 Boards in dump: master SC CPs/CSBs[1:0]: 3
07 EXB[17:0]: 12100
08 Slot0[17:0]: 12100
09 Slot1[17:0]: 12100
10 -D option, -d
11 "DSMD DomainStop Dump"
12 0 errors occurred while creating this dump.
13 redxl> wfail
14 SDI EX08/S0 Master_Stop_Status0[31:0] = E004000F
15 MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
16 SDI EX08/S0 Dstop0[31:0] = 00418040
17 Dstop0[16]: D DARB texp requests all Dstop (M)
18 Dstop0[22]: D 1E SDI internal CP port requested Dstop
19 SDI EX08/S0 CP_Error0[31:0] = 2004A004 Mask = 580067FF
20 CPErr0[18]: D 1E CP0 demand bus parity error (M)
21 cp0_{dembusp,texp,unload,demand[1:0]} = 01
22 CPErr0[29]: D 1E CP arbiter lockstep consistency check error (M)
23 cp0_{dembusp,texp,unload,demand[1:0]} = 01
24 cp1_{dembusp,texp,unload,demand[1:0]} = 00
25 FAIL EXB EX8: Dstop/Rstop detected by SDI EX8/S0.
26 Primary service FRU is EXB EX8.
27 FAIL EXB EX8 with CP C0: Dstop/Rstop detected by SDI.
28 Primary service FRU is EXB EX8.
29 Secondary service FRU is CSB C0 or the logic centerplane.
30 SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
31 SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
32 DARB C0: enabled ports (expanders) [17:0]: 16100
33 DARB C0: other darb req Dstop+Rstop for exps[17:0]: 00100
34 DARB C1: enabled ports (expanders) [17:0]: 16100
35 DARB C1: other darb req Dstop+Rstop for exps[17:0]: 00100
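The [17:0] fields in the dump header and in the DARB lines appear to be hexadecimal bitmasks with one bit per expander. The following is a minimal sketch under that assumption; the function name boards_from_mask is hypothetical and is not part of redx or hpost.

```python
def boards_from_mask(hex_mask, width=18):
    """Return the bit positions set in a hexadecimal board mask.

    Assumption: bit n of a [17:0] field corresponds to expander n.
    """
    mask = int(hex_mask, 16)
    return [n for n in range(width) if mask & (1 << n)]


# Values copied from the dump above.
print(boards_from_mask("12100"))  # EXB[17:0]
print(boards_from_mask("16100"))  # DARB enabled ports (expanders)
print(boards_from_mask("00100"))  # other darb req Dstop+Rstop for exps
```

Under this reading, 12100 corresponds to expanders 8, 13 and 16, which is consistent with the SDI EX08, EX13 and EX16 entries in the 'wfail' output, and 00100 corresponds to expander 8 alone.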
SOLUTION SUMMARY:
- Troubleshooting: The dump header shows that this Dstop was generated by dsmd (lines 10, 11) while a domain was active. The dumpf file name confirms this: dsmd.dstop files are created by dsmd as part of an ASR.

Walking the error chain:

  - Master SDI on EX8 calls for Dstop as directed by itself (line 18)
  - Master SDI on EX8 reports errors in the CPErr0 register (lines 20, 22)
  - EX8 is FAILed from the configuration and named as a primary FRU (lines 25, 26)
  - EX8's low centerplane half is FAILed from the configuration (line 27)
  - EX8 is named as the primary FRU, and CS0 or the centerplane as the secondary FRU (lines 28, 29)

Each DARB sources a parity-protected demand signal to an expander's Master SDI. The demand tells the SDI to expect data to arrive four cycles later (4 and 5 cycles later if the centerplane is degraded). The 'wfail' output shows the captured demand signals (lines 23, 24). The low two bits of each value make up the demand:

  00 = target is slot 0 [cp1 above]
  01 = target is slot 1 [cp0 above]
  10 = not used
  11 = idle state (no demand event in progress)

In this example, DARB0 indicated slot 1 as the target (line 23) while DARB1 indicated slot 0 as the target (line 24). The demand signal from DARB0 had a parity error (line 20), which accounts for the flip in bit 0. This is also why 'wfail' chooses to fail centerplane half 0 from the configuration. Because the DARBs disagree, the SDI sees a loss of lockstep in the centerplane and records the CP arbiter lockstep error (line 22). That error is a consequence of the parity error, not a separate fault.

- Resolution: Repair/replace EX8. If errors persist, investigate issues with CS0, as it drives the low half of the centerplane. If CS0 has no fault history, repair/replace the centerplane.
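The comparison of the two captured values can be sketched in a few lines. This is an illustration only, not part of redx: the function names are hypothetical, the values are assumed to be hexadecimal, and the positions of dembusp (bit 4) and texp (bit 3) are inferred from the field order in cp0_{dembusp,texp,unload,demand[1:0]}. Only the demand bits [1:0] and the unload bit (bit 2) are stated in this article.

```python
# Demand encoding as listed in the troubleshooting text.
DEMAND_MEANING = {
    0b00: "target is slot 0",
    0b01: "target is slot 1",
    0b10: "not used",
    0b11: "idle state (no demand event in progress)",
}


def decode_capture(value):
    """Split a captured value into its fields (assumed MSB-first layout)."""
    return {
        "dembusp": (value >> 4) & 1,  # assumed position
        "texp": (value >> 3) & 1,     # assumed position
        "unload": (value >> 2) & 1,
        "demand": value & 0b11,
    }


def differing_fields(cp0, cp1):
    """Name the fields on which the two DARB captures disagree."""
    a, b = decode_capture(cp0), decode_capture(cp1)
    return [name for name in a if a[name] != b[name]]


cp0 = int("01", 16)  # line 23
cp1 = int("00", 16)  # line 24
print("DARB0:", DEMAND_MEANING[decode_capture(cp0)["demand"]])
print("DARB1:", DEMAND_MEANING[decode_capture(cp1)["demand"]])
print("Fields in disagreement:", differing_fields(cp0, cp1))
```

For the values on lines 23 and 24, the only field in disagreement is the demand, which matches the diagnosis above.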
- Summary of part number and patch IDs:
  http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html

- References and bug IDs:
  SunSolve Article 48122
  SunSolve Article 48223
  DARB ASIC Specification

- Additional background information: By using the capture information in the SDI, the specific bit in error in the demand signal can be determined. Another example of a demand bus parity error:

36 SDI EX08/S0 CP_Error0[31:0] = 2004A004 Mask = 580067FF
37 CPErr0[18]: D 1E CP0 demand bus parity error (M)
38 cp0_{dembusp,texp,unload,demand[1:0]} = 04
39 CPErr0[29]: D 1E CP arbiter lockstep consistency check error (M)
40 cp0_{dembusp,texp,unload,demand[1:0]} = 04
41 cp1_{dembusp,texp,unload,demand[1:0]} = 00

Here, bit 2 differs, indicating a parity error on the unload signal from DARB0. The unload signal is a unidirectional signal sent from the DARB to the Master SDI. During operation, the SDI keeps track of how full the DARB input buffer is. When the DARB asserts the unload signal, it tells the SDI that the DARB has unloaded a prior request, freeing up a buffer slot. This does not change the diagnosis listed earlier.

- Meta-Data/Problem categorization:
  Product/Platform: SF12K/SF15K
  Category:

- Keywords: 15K, 12K, SF15K, SF12K, Sun Fire 15K, Enterprise, Server, Sun Fire 12K, starcat, dstop, demand bus parity error
INTERNAL SUMMARY:
SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS: