SRDB ID | Synopsis | Date
48202 | Sun Fire[TM] 12K/15K: Dstop: SDI Data Status Parity Error | 31 Oct 2002
Status | Issued
Description
- Problem Statement: Dstop: SDI Data Status Parity Error
- Symptoms: redx 'wfail' command output reports the following failure signature:

    01 redxl> dumpf load dsmd.dstop.020514.1219.19
    02 Created Tue May 14 12:19:20 2002
    03 By hpost v. 1.2 Generic 112488-04 Mar 18 2002 14:43:00 executing as pid=6599
    04 On ssc name = rasputin-sc0.SD_RASCAL.West.Sun.COM
    05 Domain = 0=A Platform = rasputin
    06 Boards in dump: master SC CPs/CSBs[1:0]: 3
    07 EXB[17:0]: 12100
    08 Slot0[17:0]: 12100
    09 Slot1[17:0]: 12100
    10 -D option, -d
    11 "DSMD DomainStop Dump"
    12 0 errors occurred while creating this dump.
    13 redxl> wfail
    14 SDI EX08/S0 Master_Stop_Status0[31:0] = 1004000F
    15 MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
    16 SDI EX08/S0 Dstop0[31:0] = 02018200
    17 Dstop0[16]: D DARB texp requests all Dstop (M)
    18 Dstop0[25]: D 1E AXQ requests all Dstop (M)
    19 AXQ EX08 ( 8) Error_Flag_05[31:0] = 00018001 Mask = 1024FFFF
    20 Err5[16]: D 1E SDI Data status parity error
    21 FAIL EXB EX8: Dstop/Rstop detected by AXQ.
    22 Primary service FRU is EXB EX8.
    23 SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
    24 SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
    25 DARB C0: enabled ports (expanders) [17:0]: 16100
    26 DARB C0: other darb req Dstop+Rstop for exps[17:0]: 00100
    27 DARB C1: enabled ports (expanders) [17:0]: 16100
    28 DARB C1: other darb req Dstop+Rstop for exps[17:0]: 00100
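The hex masks in the wfail output can be checked by hand. The short Python sketch below lists the set bits in each value. It assumes that bit n of an [17:0] expander mask stands for expander EXn, which fits this output: 12100 on line 07 lines up with the EX08, EX13 and EX16 SDI lines. The helper names set_bits and expanders are hypothetical and are not part of redx.

```python
def set_bits(value: int, width: int = 32) -> list[int]:
    """Return the positions of the 1 bits in `value`, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


def expanders(mask: int) -> list[str]:
    """Map an [17:0] expander mask to names (assumption: bit n = EXn)."""
    return [f"EX{n}" for n in set_bits(mask, width=18)]


# Values copied from the wfail output (lines 07, 16, 25 and 26).
EXB_IN_DUMP = 0x12100    # line 07: EXB[17:0]
DSTOP0 = 0x02018200      # line 16: SDI EX08/S0 Dstop0[31:0]
DARB_ENABLED = 0x16100   # line 25: DARB C0 enabled ports [17:0]
DARB_REQ = 0x00100       # line 26: DARB C0 other darb req Dstop+Rstop [17:0]

print("boards in dump: ", expanders(EXB_IN_DUMP))
print("DARB enabled:   ", expanders(DARB_ENABLED))
print("Dstop+Rstop req:", expanders(DARB_REQ))
print("Dstop0 bits set:", set_bits(DSTOP0))
```

Run as written, the sketch reports the following:

- EX8, EX13 and EX16 as the boards in the dump.
- EX8, EX13, EX14 and EX16 as the DARB enabled ports.
- EX8 as the only expander for which the other DARB requested Dstop+Rstop.
- Bits 9, 15, 16 and 25 set in Dstop0.

redx annotates only bits 16 and 25 in the output above. The sketch finds the other set bits but says nothing about what they mean.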
SOLUTION SUMMARY:
- Troubleshooting: The dump header shows that this Dstop dump file was generated by dsmd (lines 10 and 11) while the domain was running. The file name confirms this: dsmd.dstop files are created by dsmd as part of ASR (Automatic System Recovery).
Note the two first errors (marked 1E), reported in two different error registers:
    Dstop0: the AXQ on expander 8 requests a Dstop from SDI(M) (line 18).
    Err5: the AXQ reports an SDI Data Status Parity Error (line 20).
Note FAIL EXB EX8 (line 21). This is the board that POST would deconfigure during the recovery POST run. Deconfiguring it brings the domain back with the largest fault-free configuration possible, given the fault implied by this error.
Note also the FRU recommended for replacement to remove the fault (line 22): Primary service FRU is EXB EX8.
Looking more closely at AXQ 8 gives more detail about the particular status that encountered the error:

    29 redxl> shaxq -e 8
    30 Note: Data is displayed from the currently loaded dump file.
    31 AXQ EX8 (8) Component ID = C4312049 Rev 6.0
    32 Error_Flag_00[31:0] = 00000000 Mask = 0000FFFF
    33 Error_Flag_01[31:0] = 00000000 Mask = 4000FFFF
    34 Error_Flag_02[31:0] = 00000000 Mask = 0000FFFF
    35 Error_Flag_03[31:0] = 00000000 Mask = 21005EFF
    36 Error_Flag_04[31:0] = 00000000 Mask = 01FEFFFF
    37 Error_Flag_05[31:0] = 00018001 Mask = 1024FFFF
    38 Err5[16]: D 1E SDI Data status parity error
    39 {Rd_Bogon_unload,DStat_par,DStat[8:0]} = 400
    40 darb_errsave[15:0] = 0400
    41 Error_Flag_06[31:0] = 00000000 Mask = 7E00FFFF
    42 Error_Flag_07[31:0] = 00000000 Mask = 63FF7D24
    43 Error_Flag_08[31:0] = 00000000 Mask = 0000FFFF
    44 Error_Flag_09[31:0] = 00000000 Mask = 7E00FFFF
    45 Error_Flag_10[31:0] = 00000000 Mask = 7C00FFFF
    46 Error_Flag_11[31:0] = 00000000 Mask = 7FF0FFFF

Per line 39, the error is on the Rd_Bogon_unload bus: bit 10, which holds Rd_Bogon_unload, is set. Rd_Bogon_unload is a signal from SDI(M) to the AXQ. This unidirectional flow-control signal indicates a Read Bogon unload when phase = 1 and a sysreg_data_unload when phase = 0. The SDI has a 16-deep FIFO.
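The line 39 capture and the Err5 flag on line 37 can be decoded by hand. The Python sketch below assumes the usual hardware concatenation order: the first field listed in {Rd_Bogon_unload,DStat_par,DStat[8:0]} is the most significant. Under that assumption, the 11-bit field places Rd_Bogon_unload at bit 10, DStat_par at bit 9 and DStat at bits 8..0. decode_dstat_capture is a hypothetical helper, not a redx command.

```python
def decode_dstat_capture(value: int) -> dict[str, int]:
    """Split the 11-bit {Rd_Bogon_unload, DStat_par, DStat[8:0]} capture.

    Assumption: the first field listed is the most significant, so
    Rd_Bogon_unload is bit 10, DStat_par is bit 9 and DStat is bits 8..0.
    """
    return {
        "Rd_Bogon_unload": (value >> 10) & 0x1,
        "DStat_par": (value >> 9) & 0x1,
        "DStat": value & 0x1FF,
    }


ERROR_FLAG_05 = 0x00018001   # line 37: Error_Flag_05[31:0]
SDI_DSTAT_PARITY_BIT = 16    # Err5[16], lines 20 and 38

print(decode_dstat_capture(0x400))  # line 39: {...} = 400
print("Err5[16] set:", bool((ERROR_FLAG_05 >> SDI_DSTAT_PARITY_BIT) & 1))
```

For 0x400, the sketch yields Rd_Bogon_unload = 1, DStat_par = 0 and DStat = 0. This matches the reading that only the Rd_Bogon_unload position is set.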
The AXQ contains a counter for each of these two commands (Read Bogon unload and sysreg_data_unload). The counter decrements each time an unload is received.
- Resolution: Since this signal runs on the expander board from SDI(M) to the AXQ, the service FRU is EXB 8.
- Summary of part number and patch IDs: http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html
- References and bug IDs: SunSolve Article 48122
- Additional background information:
- Meta-Data/Problem categorization: Product/Platform: SF12K/SF15K Category:
- Keywords: 15K, 12K, SF15K, SF12K, starcat, dstop, AXQ, SDI Data status parity error
INTERNAL SUMMARY:
SUBMITTER: Tong-Pheng Koh
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS: