SRDB ID | Synopsis | Date |
48187 | Sun Fire[TM] 12K/15K: Dstop: Steering bus A input parity error | 31 Oct 2002 |
Status | Issued |
Description |
- Problem Statement:
Dstop: Steering bus A input parity error

- Symptoms:
'wfail' output reports something similar to the following:

01 redxl> dumpf load dsmd.dstop.020410.1454.52
02 Created Wed Apr 10 14:54:53 2002
03 By hpost v. 1.2 Generic 112488-03 Feb 15 2002 13:40:50 executing as pid=26063
04 On ssc name = f15k-02-sc0-hme0.
05 Domain = 0=A = omis320 Platform = f15k-02
06 Boards in dump: master SC CPs/CSBs[1:0]: 3
07 EXB[17:0]: 3FFFF
08 Slot0[17:0]: 3FFFF
09 Slot1[17:0]: 3FFFF
10 -D option, -d
11 "DSMD DomainStop Dump"
12 0 errors occurred while creating this dump.
13 redxl> wfail
14 SDI EX00/S0: All SDI is DStopped and RStopped, requested by DARB.
15 SDI EX01/S0: All SDI is DStopped and RStopped, requested by DARB.
16 SDI EX02/S0: All SDI is DStopped and RStopped, requested by DARB.
17 SDI EX03/S0: All SDI is DStopped and RStopped, requested by DARB.
18 SDI EX04/S0: All SDI is DStopped and RStopped, requested by DARB.
19 SDI EX05/S0: All SDI is DStopped and RStopped, requested by DARB.
20 SDI EX06/S0: All SDI is DStopped and RStopped, requested by DARB.
21 SDI EX07/S0: All SDI is DStopped and RStopped, requested by DARB.
22 SDI EX08/S0: All SDI is DStopped and RStopped, requested by DARB.
23 SDI EX09/S0: All SDI is DStopped and RStopped, requested by DARB.
24 SDI EX10/S0: All SDI is DStopped and RStopped, requested by DARB.
25 SDI EX11/S0: All SDI is DStopped and RStopped, requested by DARB.
26 SDI EX12/S0: All SDI is DStopped and RStopped, requested by DARB.
27 SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
28 SDI EX14/S0: All SDI is DStopped and RStopped, requested by DARB.
29 SDI EX15/S0 Dstop1[31:0] = 00088008
30 Dstop1[19]: D 1E SDI Slave 2 requested all Dstop
31 SDI EX15/S0 Master_Stop_Status0[31:0] = 3004000F
32 MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
33 SDI EX15/S0 Dstop0[31:0] = 00010001
34 Dstop0[16]: D DARB texp requests all Dstop (M)
35 SDI EX15/S2 Master_Stop_Status0[31:0] = 00000008
36 MStop0[3]: SDI is Recordstopped
37 SDI EX15/S2 Dstop0[31:0] = 00088008
38 Dstop0[19]: D 1E SDI internal core requested Dstop
39 SDI EX15/S2 Core_Error0[31:0] = 00108010 Mask = 7FE8FFFF
40 CoreErr0[20]: D 1E Steering bus A input parity error (S)
41 {steera_parin,steera_in[32:0]} = 0.00000020
42 FAIL EXB EX15: Dstop/Rstop detected by SDI EX15/S2.
43 Primary service FRU is EXB EX15.
44 SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
45 SDI EX17/S0: All SDI is DStopped and RStopped, requested by DARB.
46 DARB C0: enabled ports (expanders) [17:0]: 3FFFF
47 DARB C0: other darb req Dstop+Rstop for exps[17:0]: 08000
48 DARB C1: enabled ports (expanders) [17:0]: 3FFFF
49 DARB C1: other darb req Dstop+Rstop for exps[17:0]: 08000
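The bit positions that wfail annotates can be cross-checked by hand from the raw register values in the output above. The following is a minimal Python sketch, not a redx feature: the helper name set_bits is illustrative only, and the hex values are copied from the numbered dump lines. It simply lists which bits are set in a mask. The meaning of any set bit that wfail does not annotate is not covered here.

```python
def set_bits(value, width=32):
    """Return the positions (bit 0 = least significant) of the 1 bits in value."""
    return [bit for bit in range(width) if (value >> bit) & 1]


# Values copied from the wfail output (dump line numbers in comments).
enabled_ports = 0x3FFFF   # lines 46 and 48: enabled ports (expanders) [17:0]
darb_req = 0x08000        # lines 47 and 49: exps[17:0] with a Dstop+Rstop request
core_error0 = 0x00108010  # line 39: Core_Error0[31:0]
dstop0_s2 = 0x00088008    # line 37: EX15/S2 Dstop0[31:0]

print(len(set_bits(enabled_ports, 18)))  # 18 expander ports enabled
print(set_bits(darb_req, 18))            # [15], i.e. expander EX15
print(set_bits(core_error0))             # [4, 15, 20]; wfail decodes bit 20 on line 40
print(set_bits(dstop0_s2))               # [3, 15, 19]; wfail decodes bit 19 on line 38
```

The single bit set in the DARB request mask (bit 15) agrees with wfail naming EX15 as the expander that requested the Dstop.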
SOLUTION SUMMARY:
- Troubleshooting:
The dump header tells us that this Dstop was generated by dsmd (lines 10,11) while a domain was active. This is also evident from the dumpf file name: dsmd.dstop files are created by dsmd as part of an ASR.

Walking the error chain:
  - EX15 is the first error in the domain. The slave SDI2 requests the Dstop (line 30).
  - The specific errors in SDI2 are reported next. We have a Steering bus A input parity error (lines 39-41).
  - All other expanders are error free. We can quickly determine this because these expanders only have a single line of output in wfail.
  - wfail then tells us to FAIL EX15 (lines 42-43) as the primary FRU.

The steering busses direct data flow through the SDI. Steering is generated in the Master SDI and driven to the slave SDIs. The steering tells the SDIs where to look for the next transfer of data. For example, if the centerplane wants to transfer to Slot 0, steering tells the Slot 0 port of the SDIs to take data from the Centerplane.

Referring back to the wfail output, EX15/S0 is our Master SDI and EX15/S2 is the slave SDI reporting the error. Thus, the parity error occurred between EX15/S0 and EX15/S2. Since the steering bus is completely contained within the expander, EX15 is the faulty FRU.

- Resolution:
Replace the Expander reporting the steering parity error. In this example, replace EX15.

- Summary of part number and patch IDs
501-5179 Expander

- References and bug IDs
SunSolve Article 48122

- Additional background information:
Looking deeper, the history of the SDIs can be examined further to illustrate the parity error. Let's start with the steering history on EX15/S0:

50 redxl> shsdi 15 0 steera
51 Note: Data is displayed from the currently loaded dump file.
52 SDI EX15/S0 Output history of Steer A
53 <----- STEERA ---->
54 STEERA STOP
55 [32:0] P DEMA P entry
56 1FFFFFFDF 1 1 1 0 old
57 1FFFFFF9F 0 1 1 1
58 1FFFFFFDF 1 1 1 2
59 1FFFFFF9F 0 1 1 3
60 1FFFFFFDF 1 1 1 4
61 1FFFFFF9F 0 1 1 5
62 1FFFFFFDF 1 1 1 6
63 1FFFFFF9F 0 1 1 7
64 1FFFFFFDF 1 1 1 8
65 1FFFFFF9F 0 1 1 9
66 1FFFFFFDF 1 1 1 10
67 1FFFFFF9F 0 1 1 11
68 1FFFFFFDF 1 1 1 12
69 1FFFFFF9F 0 1 1 13
70 1FFFFFFDF 1 1 1 14
71 1FFFFFF9F 0 1 1 15
72 1FFFFFFDF 1 1 1 16
73 1FFFFFF9F 0 1 1 17
74 1FFFFFFDF 1 1 1 18
75 1FFFFFF9F 0 1 1 19
76 1FFFFFFDF 1 1 1 20
77 1FFFFFF9F 0 1 1 21
78 1FFFFFFDF 1 1 1 22
79 1FFFFFF9F 0 1 1 23
80 1FFFFFFDF 1 1 1 24
81 1FFFFFF9F 0 1 1 25
82 1FFFFFFDF 1 0 0 26<
83 1FFFFFF9F 0 1 1 27
84 1FFFFFFDF 1 1 1 28
85 1FFFFFF9F 0 1 1 29
86 1FFFFFFDF 1 1 1 30
87 1FFFFFF9F 0 1 1 31 new

The cycle of interest is cycle 26 (line 82), tagged with a <, where we have a steering value of 1FFFFFFDF and a parity of 1. The steering busses are protected by even parity, so we already have a mismatch: 1FFFFFFDF has 32 1's, so the parity should be zero.

Now for the steering history on EX15/S2:

88 redxl> shsdi 15 2 steera
89 Note: Data is displayed from the currently loaded dump file.
90 SDI EX15/S2 Output history of Steer A
91 <----- STEERA ---->
92 STEERA STOP
93 [32:0] P DEMA P entry
94 1FFFFFFFF 1 1 1 0 old
95 1FFFFFFFF 1 1 1 1
96 1FFFFFFFF 1 1 1 2
97 1FFFFFFFF 1 1 1 3
98 1FFFFFFFF 1 1 1 4
99 1FFFFFFFF 1 1 1 5
100 1FFFFFFFF 1 1 1 6
101 1FFFFFFFF 1 1 1 7
102 1FFFFFFFF 1 1 1 8
103 1FFFFFFFF 1 1 1 9
104 1FFFFFFFF 1 1 1 10
105 1FFFFFFFF 1 1 1 11
106 1FFFFFFFF 1 1 1 12
107 1FFFFFFFF 1 1 1 13
108 1FFFFFFFF 1 1 1 14
109 1FFFFFFFF 1 1 1 15
110 1FFFFFFFF 1 1 1 16
111 1FFFFFFFF 1 1 1 17
112 1FFFFFFFF 1 1 1 18
113 1FFFFFFFF 1 1 1 19
114 1FFFFFFFF 1 1 1 20
115 1FFFFFFFF 1 1 1 21
116 1FFFFFFFF 1 1 1 22
117 1FFFFFFFF 1 1 1 23
118 1FFFFFFFF 1 1 1 24
119 1FFFFFFFF 1 1 1 25
120 1FFFFFFFF 1 1 1 26<
121 1FFFFFFFF 1 1 1 27
122 1FFFFFFFF 1 1 1 28
123 1FFFFFFFF 1 1 1 29
124 1FFFFFFFF 1 1 1 30
125 1FFFFFFFF 1 1 1 31 new

On cycle 26 (line 120), all values are high. The steering value is 1FFFFFFFF with a parity of 1. This parity is correct: 1FFFFFFFF has 33 1's, so a parity bit of 1 makes the total even. Comparing this with 1FFFFFFDF (EX15/S0), bit 5 is flipped. SDI2 reports this bit flip (line 137) as the XOR of the two steering values on that cycle.

126 redxl> shsdi -e 15 2
127 Note: Data is displayed from the currently loaded dump file.
128 SDI EX15/S2 Component ID = 64317049
129 Master_Stop_Status0[31:0] = 00000008
130 MStop0[3]: SDI is Recordstopped
131 Master_Stop_Status1[31:0] = 7F7F0000
132 Dstop0[31:0] = 00088008
133 Dstop0[19]: D 1E SDI internal core requested Dstop
134 Recordstop0[31:0] = 00000000
135 Core_Error0[31:0] = 00108010 Mask = 7FE8FFFF
136 CoreErr0[20]: D 1E Steering bus A input parity error (S)
137 {steera_parin,steera_in[32:0]} = 0.00000020
138 Core_ErrData[4:2][31:0] = 00000000 00080600 00000020
139 Core_ErrData[1:0][31:0] = 00000002 02DD3000
140 Core_Error1[31:0] = 00000000 Mask = FFFFFFFF
141 CP_Error0[31:0] = 00000000 Mask = 7F3F67FF
142 Slot0_Error0[31:0] = 00000000 Mask = 703FFFFF
143 Slot0_Error1[31:0] = 00000000 Mask = FFFF4FFF
144 Slot0_Error2[31:0] = 00000000 Mask = FFFFFFFF
145 Slot1_Error0[31:0] = 00000000 Mask = 703FFFFF
146 Slot1_Error1[31:0] = 00000000 Mask = FFFF4FFF
147 Slot1_Error2[31:0] = 00000000 Mask = FFFFFFFF

We also saw this in the initial wfail (line 41).

As an aside, a steering state of all 1's is the idle state for the bus. If we look at the steering history for the remaining SDIs on EX15 (left as an exercise for the reader), we'd see that all of the other slave SDIs are in the idle state.

- Meta-Data/Problem categorization:
Product/Platform: SF12K/SF15K
Category:

- Keywords
15K, 12K, SF15K, SF12K, starcat, dstop, Steering bus A input parity error
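The parity and XOR arithmetic from the additional background information can be reproduced with a few lines of Python. This is a hedged sketch rather than anything redx provides: the function name even_parity_bit is illustrative, and it assumes the even-parity convention described above, where the parity bit is chosen so that the total number of 1's (steering bits plus parity) is even. The two steering values are the cycle 26 entries from dump lines 82 and 120.

```python
def even_parity_bit(value):
    """Parity bit that makes the total count of 1's (data plus parity) even."""
    return bin(value).count("1") % 2


s0_cycle26 = 0x1FFFFFFDF  # EX15/S0 steering value, dump line 82
s2_cycle26 = 0x1FFFFFFFF  # EX15/S2 steering value, dump line 120

print(bin(s0_cycle26).count("1"))   # 32 ones
print(even_parity_bit(s0_cycle26))  # 0, but the history on line 82 shows P = 1
print(even_parity_bit(s2_cycle26))  # 1, matching P = 1 on line 120

# XOR of the two steering values isolates the flipped bit.
diff = s0_cycle26 ^ s2_cycle26
print(f"{diff:08X}")          # 00000020, as reported on lines 41 and 137
print(diff.bit_length() - 1)  # 5, the flipped bit position
```

The XOR result matches the steera_in value that wfail and shsdi -e report for the parity error.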
INTERNAL SUMMARY:
SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS: