SRDB ID | Synopsis | Date
48490 | Sun Fire[TM] 12K/15K: Rstop: CDC1 correctable error | 1 Nov 2002
- Problem Statement:
Rstop: CDC1 correctable error
- Symptoms:
'wfail' output reports something similar to the following:
01 redxl> dumpf load dsmd.rstop.020626.2116.11
02 Created Wed Jun 26 21:16:13 2002
03 By hpost v. 1.2 Generic 112488-05 May 8 2002 17:05:18 executing as pid=2224
04 On ssc name = orlocn01-sc0.
05 Domain = 0=A = orlsxdp01 Platform = orlocn01
06 Boards in dump: master SC CPs/CSBs[1:0]: 3
07 EXB[17:0]: 0007F
08 Slot0[17:0]: 0007F
09 Slot1[17:0]: 00003
10 -D option, -d
11 "DSMD RecordStop Dump"
12 0 errors occurred while creating this dump.
13 redxl> wfail
14 SDI EX00/S0: SDI is RStopped, requested by DARB.
15 SDI EX01/S0: SDI is RStopped, requested by DARB.
16 SDI EX02/S0: SDI is RStopped, requested by DARB.
17 SDI EX03/S0: SDI is RStopped, requested by DARB.
18 SDI EX04/S0 Master_Stop_Status0[31:0] = C0040008
19 MStop0[3]: SDI is Recordstopped
20 SDI EX04/S0 Recordstop0[31:0] = 00818080
21 Rstop0[16]: R DARB texp request Recordstop (M)
22 Rstop0[23]: R 1E AXQ requests all Recordstop (M)
23 AXQ EX04 ( 4) Error_Flag_07[31:0] = 00088008 Mask = 63FF7D24
24 Err7[19]: R 1E CDC1 correctable error
25 FAIL CDC Dimm EX4: Dstop/Rstop detected by AXQ.
26 Primary service FRU is EXB EX4.
27 SDI EX05/S0: SDI is RStopped, requested by DARB.
28 SDI EX06/S0: SDI is RStopped, requested by DARB.
29 DARB C0: enabled ports (expanders) [17:0]: 3FC7F
30 DARB C0: exps request Rstop [17:0]: 00010
31 DARB C0: other darb req Rstop for exps [17:0]: 00010
32 DARB C1: enabled ports (expanders) [17:0]: 3FC7F
33 DARB C1: exps request Rstop [17:0]: 00010
34 DARB C1: other darb req Rstop for exps [17:0]: 00010
SOLUTION SUMMARY:
- Troubleshooting:
The dump header tells us that this error was encountered while a domain
was active (lines 10 and 11). This is also evident from the dump file
name: files with the dsmd. prefix are created by dsmd (dsmd.dstop files,
for example, as part of an ASR). Walking the error chain:
- SDI4 reports a first error of AXQ4 calling for Rstop (line 22).
- AXQ4 reports a correctable error in CDC1 (line 24).
- The CDC DIMM is FAILed from the configuration (line 25).
- EX4 is called out as the FRU (line 26).
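The [17:0] fields in the dump read as bitmasks with one bit per expander.
This is an interpretation, not something the dump states: it is consistent
with line 30 showing 00010 while EX04 is the expander reporting the error.
A minimal Python sketch for decoding such masks follows; the helper name
set_bits is made up for illustration.

```python
def set_bits(mask_hex, width=18):
    """Return the bit positions that are set in a hex mask such as '00010'."""
    value = int(mask_hex, 16)
    return [bit for bit in range(width) if value & (1 << bit)]

# Values copied from the dump above.
print(set_bits("00010"))               # line 30, exps request Rstop -> [4]
print(set_bits("0007F"))               # line 07, EXB[17:0] -> [0, 1, 2, 3, 4, 5, 6]
print(set_bits("00003"))               # line 09, Slot1[17:0] -> [0, 1]
print(set_bits("00818080", width=32))  # line 20, Recordstop0 -> [7, 15, 16, 23]
```

For Recordstop0, bits 16 and 23 match the Rstop0[16] and Rstop0[23] lines
that wfail prints (lines 21 and 22). The other set bits are not annotated
in this wfail output and are not interpreted here.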
The CDC DIMM is divided into 3 SRAMs, read in parallel, forming a 3-way
set associative cache. CDC entries contain information about lines of
memory recently referenced by SSM logic.
Any error (correctable or uncorrectable) in the CDC is recorded and
logged, but never causes a Dstop. Entries with correctable errors are
written back with the corrected data. Uncorrectable errors are treated
as cache misses.
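The error policy described above can be illustrated with a toy software
model. This is only a sketch under stated assumptions, not the hardware
design: the class ToyCdc, the set count, the indexing scheme and the
eviction policy are all invented for illustration. Only the 3-way
associativity and the handling of correctable and uncorrectable errors
come from the text.

```python
from enum import Enum

class Ecc(Enum):
    OK = 0
    CORRECTABLE = 1
    UNCORRECTABLE = 2

WAYS = 3  # from the article: 3-way set associative

class ToyCdc:
    """Toy model; set count, indexing and eviction are assumptions."""

    def __init__(self, num_sets=4):
        # Each set maps tag -> (data, ecc_state) and holds at most WAYS entries.
        self.sets = [dict() for _ in range(num_sets)]
        self.error_log = []

    def _index(self, address):
        return address % len(self.sets), address // len(self.sets)

    def fill(self, address, data, ecc=Ecc.OK):
        idx, tag = self._index(address)
        ways = self.sets[idx]
        if tag not in ways and len(ways) >= WAYS:
            ways.pop(next(iter(ways)))  # evict the oldest entry (toy policy)
        ways[tag] = (data, ecc)

    def lookup(self, address):
        idx, tag = self._index(address)
        entry = self.sets[idx].get(tag)
        if entry is None:
            return None  # ordinary miss
        data, ecc = entry
        if ecc is Ecc.CORRECTABLE:
            # Logged, then written back with the corrected data.
            self.error_log.append(("correctable", address))
            self.sets[idx][tag] = (data, Ecc.OK)
            return data
        if ecc is Ecc.UNCORRECTABLE:
            # Logged and treated as a cache miss; no stop is raised.
            self.error_log.append(("uncorrectable", address))
            return None
        return data

cdc = ToyCdc()
cdc.fill(0x10, "entry-A", Ecc.CORRECTABLE)
cdc.fill(0x11, "entry-B", Ecc.UNCORRECTABLE)
print(cdc.lookup(0x10))  # hit: corrected data is returned and written back
print(cdc.lookup(0x10))  # hit again, no new error logged
print(cdc.lookup(0x11))  # behaves like a miss
print(cdc.error_log)
```

In this model neither kind of error halts anything; both only add an entry
to the log, which mirrors the statement that CDC errors never cause a Dstop.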
- Resolution:
Since the error is correctable, frequency must be considered. If the
error is an isolated incident, no action is required. Single bit errors
will happen, and that's what ECC protection is designed for.
If, however, a series of correctable errors occurs on the same CDC,
the expander should be replaced (the CDC DIMM is not a FRU). For
quantifying "too many failures", the standard rules for soft errors apply.
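Tracking frequency per CDC can be sketched as follows. The limits used here
(3 errors within 24 hours) are hypothetical placeholders, since this
article does not state numeric limits; substitute whatever the applicable
soft error rules specify. The function name and the event tuple layout are
also assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical policy values; the article gives no numeric thresholds.
MAX_ERRORS = 3
WINDOW = timedelta(hours=24)

def cdcs_needing_attention(events, max_errors=MAX_ERRORS, window=WINDOW):
    """Return sorted (expander, cdc) pairs with max_errors events inside one window.

    events is an iterable of (timestamp, expander, cdc) tuples.
    """
    by_cdc = defaultdict(list)
    for when, expander, cdc in events:
        by_cdc[(expander, cdc)].append(when)

    flagged = []
    for key, times in by_cdc.items():
        times.sort()
        # Check every run of max_errors consecutive timestamps.
        for i in range(len(times) - max_errors + 1):
            if times[i + max_errors - 1] - times[i] <= window:
                flagged.append(key)
                break
    return sorted(flagged)

# The single event from the dump above: EX4, CDC1, 26 Jun 2002 21:16:13.
events = [(datetime(2002, 6, 26, 21, 16, 13), 4, 1)]
print(cdcs_needing_attention(events))  # -> [] (isolated incident)
```

With only the one event from this dump, nothing is flagged, which matches
the guidance that an isolated correctable error requires no action.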
- Summary of part numbers and patch IDs
http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html
- References and bug IDs
SunSolve Article 48122
Document 816-5053-10
- Additional background information:
None
- Meta-Data/Problem categorization:
Product/Platform: SF12K/SF15k
Category:
- Keywords
15K, 12K, SF15K, SF12K, Sun Fire 15K, Enterprise, Server, Sun Fire 12K,
starcat, rstop, CDC1 correctable error
INTERNAL SUMMARY:
SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:
Copyright (c) 1997-2003 Sun Microsystems, Inc.