SRDB ID   Synopsis   Date
48190   Sun Fire[TM] 12K/15K: Rstop: No components would be failed based on this state   31 Oct 2002

Status Issued

Description
- Problem Statement: 

	Rstop:  No components would be failed based on this state

- Symptoms:

	'wfail' output reports something similar to the following:

	   01 redxl> dumpf load dsmd.rstop.020805.1100.46
	   02 Created Mon Aug  5 11:00:46 2002
	   03 By hpost v. 1.2 Generic 112488-04 Mar 18 2002 14:43:00  executing as pid=22831
	   04 On ssc name =  sc0.
	   05 Domain =  4=E = etuac21    Platform = sfgedas1
	   06 Boards in dump: master SC    CPs/CSBs[1:0]: 3
	   07           EXB[17:0]: 00C00
	   08         Slot0[17:0]: 00400
	   09         Slot1[17:0]: 00C00
	   10 -D option, -d
	   11 "DSMD RecordStop Dump"
	   12 0 errors occurred while creating this dump.
	   13 redxl> wfail
	   14 SDI EX10/S0: SDI is RStopped, requested by DARB.
	   15 SDI EX11/S0  Master_Stop_Status0[31:0] = F0040308
	   16         MStop0[3]: SDI is Recordstopped
	   17 SDI EX11/S0  Recordstop0[31:0]  = 04018400
	   18         Rstop0[16]: R    DARB texp request Recordstop (M)
	   19         Rstop0[26]: R 1E Slot0 asserted EccErr, enabled to cause Rstop (M)
	   20 Note: SDI EX11/S0 detects error from Slot SB11, not in dump. Ignored.
	   21 DARB C0: enabled ports (expanders)          [17:0]: 03FFF
	   22 DARB C0: exps request Rstop                 [17:0]: 00800
	   23 DARB C0: other darb req Rstop for exps      [17:0]: 00800
	   24 DARB C1: enabled ports (expanders)          [17:0]: 03FFF
	   25 DARB C1: exps request Rstop                 [17:0]: 00800
	   26 DARB C1: other darb req Rstop for exps      [17:0]: 00800
	   27 No components would be failed based on this state.            

SOLUTION SUMMARY:
- Troubleshooting:

	The dump header tells us that this Rstop was generated by dsmd (lines 10 and 11) while
	a domain was active. This is also evident from the dump file name: dsmd.rstop files are
	created by dsmd as part of error capturing. Walking the error chain:

	 - EX11/S0 (SDI0) reports a first error from its Slot 0 board, SB11 (line 19).
	 - However, on line 20, wfail notes that SB11 is not in the dump. The "Ignored"
	   statement means that SB11 is not considered in selecting a component to FAIL.
	 - As no other errors are present, the diagnosis returns no failures (line 27).

	Therefore, the source of the error is attributed to an ECC error from a board that is 
	not in this domain. We can confirm that SB11 was indeed not included in the dump 
	by checking the Slot 0 board mask (line 08). Since POST refers to the PCD to 
	determine which boards are part of a domain, this tells us that SB11 is not part 
	of Domain E in the PCD. 
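
	The mask check can be sketched in a few lines of Python. This is only an
	illustration, not part of redx: the helper name boards_in_mask is
	hypothetical, and it assumes that bit n of each [17:0] mask corresponds to
	expander or board number n, which is consistent with the dump above.

```python
def boards_in_mask(mask_hex, width=18):
    """Return the bit positions set in a [17:0] board mask given as hex text.

    Assumption: bit n stands for expander/board number n.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(width) if mask & (1 << bit)]


# Mask values copied from lines 07-09 of the dump header.
exb = boards_in_mask("00C00")     # expanders in the dump
slot0 = boards_in_mask("00400")   # Slot 0 (SB) boards in the dump
slot1 = boards_in_mask("00C00")   # Slot 1 (IO) boards in the dump

print("EXB:  ", exb)
print("Slot0:", slot0)
print("Slot1:", slot1)
print("SB11 in dump:", 11 in slot0)
```

	Under that assumption, bit 11 is set in the EXB and Slot 1 masks but not
	in the Slot 0 mask, which matches the wfail note on line 20.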

	At this point in the analysis, it is wise to examine activity on other domains 
	around the time of this Rstop. Looking through the explorer output we see: 

	   28 % ls sf15k/[A-R]/adm/dump/dsmd.rstop.020805.*
	   29 D/adm/dump/dsmd.rstop.020805.2042.40  E/adm/dump/dsmd.rstop.020805.1100.46  
	   30 F/adm/dump/dsmd.rstop.020805.1101.08
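
	The time relationship between these dumps can be worked out from the file
	names alone. The sketch below assumes the name layout
	dsmd.rstop.YYMMDD.HHMM.SS, which matches the "Created" timestamp on line 02;
	the helper name rstop_time is hypothetical.

```python
from datetime import datetime


def rstop_time(path):
    """Parse the timestamp from a dsmd.rstop.YYMMDD.HHMM.SS file name.

    Assumption: the last three dot-separated fields are date, hour+minute
    and seconds, as the dump header on line 02 suggests.
    """
    name = path.rsplit("/", 1)[-1]
    parts = name.split(".")  # e.g. ["dsmd", "rstop", "020805", "1100", "46"]
    return datetime.strptime("".join(parts[2:5]), "%y%m%d%H%M%S")


# Paths copied from lines 29-30 of the listing.
dumps = {
    "D": "D/adm/dump/dsmd.rstop.020805.2042.40",
    "E": "E/adm/dump/dsmd.rstop.020805.1100.46",
    "F": "F/adm/dump/dsmd.rstop.020805.1101.08",
}

# Offset of every dump, in seconds, relative to the Domain E Rstop.
reference = rstop_time(dumps["E"])
gaps = {dom: (rstop_time(p) - reference).total_seconds()
        for dom, p in dumps.items()}

for dom, gap in sorted(gaps.items()):
    print(f"Domain {dom}: {gap:+.0f} s relative to Domain E")
```

	Dumps that fall within seconds of each other are the ones worth
	correlating; a dump many hours away is most likely a separate event.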

	Domain F also took an Rstop at 11:01:08, 22 seconds after the Domain E Rstop 
	at 11:00:46. And we find a relationship between Domains E and F: 
                                             
	   31 % grep "^[SI][BO]11" sf15k/showboards_-v.out
	   32 IO11/C3V0   On     C3V               -               -         etuac21
	   33 IO11/C5V0   On     C5V               -               -         etuac21
	   34 IO11/C3V1   On     C3V               -               -         etuac21
	   35 IO11/C5V1   On     C5V               -               -         etuac21
	   36 SB11        On     CPU             Active        Passed        etuac31
	   37 IO11        On     HPCI            Active        Passed        etuac21
	   38 % grep "^[EF]" sf15k/showplatform_-v.out | head -2
	   39 E           etuac21           etuac21                Running Solaris
	   40 F           etuac31           etuac31                Running Solaris

	Domains E and F share EX11, and SB11 is assigned to Domain F. The dump file 
	also shows that EX11 is split (lines 65 and 70): 

	   41 redxl> shsdi -v 11
	   42 Note: Data is displayed from the currently loaded dump file.
	   43 SDI EX11/S0    Component ID = 64317049
	   44         Master_Reset_Config[31:0] = 0B000000
	   45            0   SDI_diserrlog          MResC[0]       => SDI Intern Reset
	   46            0   Slot0_diserrlog        MResC[1]       
	   47            0   Slot1_diserrlog        MResC[2]       
	   48         0x0B   ExpID[4:0]             MResC[28:24]   
	   49            0   Mode[2:0]              MResC[31:29]   Master (0)
	   50         Master_Stop_Config[31:0]  = 41001997
	   51            1   DstopEnbl              MStopC[0]      
	   52            1   RstopEnbl              MStopC[1]      
	   53            1   SCIntEnbl              MStopC[2]      
	   54            0   L1Err->ErrPause        MStopC[3]      
	   55            1   Dstop->ErrPause        MStopC[4]      
	   56            0   L1Ecc->ScInt[1:0]      MStopC[6:5]    
	   57            3   L1Ecc->Rstop[1:0]      MStopC[8:7]    
	   58            0   L1Err->ScInt[1:0]      MStopC[10:9]   L1Slot asserted err
	   59            3   L1Err->Dstop[1:0]      MStopC[12:11]  L1Slot asserted err
	   60            0   SBBCErr->SCInt         MStopC[13]     
	   61            0   SBBCErr->Dstop         MStopC[14]     
	   62            1   EnblStopReqChk         MStopC[24]     
	   63            0   L1Dstop->ExpDStop      MStopC[28]     
	   64            0   AnyDstop->ExpDStop     MStopC[29]     
	   65            1   Dstop->DReset          MStopC[30]     For split exp
	   66            0   ShiftErrPausePhase     MStopC[31]     
	   67         Core_Config[21:0]   = 0DB3E2
	   68            0   Pass4TargIDDisbl       CoreC[0]       Rev 4+
	   69            1   Slot1=SerDom1          CoreC[1]       Rev 4+
	   70            1   SplitSlotEnbl          CoreC[5]       In master SDI (0)

	However, the dump does not record which domain SB11 is assigned to; that is only 
	available from explorer output (or, of course, a live SC). 
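
	The fields that shsdi prints can be cross-checked against the raw register
	values with a small bit-field extractor. This is a sketch only: the helper
	name field is hypothetical, and the bit positions are taken directly from
	the MResC, MStopC and CoreC columns of the listing.

```python
def field(value, hi, lo=None):
    """Extract bits [hi:lo] of value; a single bit when lo is omitted."""
    lo = hi if lo is None else lo
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# Raw register values copied from lines 44, 50 and 67 of the listing.
master_reset_config = 0x0B000000
master_stop_config = 0x41001997
core_config = 0x0DB3E2

exp_id = field(master_reset_config, 28, 24)   # ExpID[4:0]
l1ecc_rstop = field(master_stop_config, 8, 7)  # L1Ecc->Rstop[1:0]
dstop_dreset = field(master_stop_config, 30)   # Dstop->DReset, "For split exp"
split_slot_enbl = field(core_config, 5)        # SplitSlotEnbl

print("ExpID           =", exp_id)
print("L1Ecc->Rstop    =", l1ecc_rstop)
print("Dstop->DReset   =", dstop_dreset)
print("SplitSlotEnbl   =", split_slot_enbl)
```

	The extracted values agree with lines 48, 57, 65 and 70: expander 11, with
	SplitSlotEnbl and Dstop->DReset both set.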

	Because EX11 is a split expander, and the source of the error is within that 
	expander's boardset, Domain E suffers a residual Rstop. It can be ignored.

- Resolution:

	The real source of the problem is a stop condition on a different domain
	that shares this expander (Domain F in the example above). Analyze that
	stop dump.

- Summary of part number and patch IDs

	
- References and bug IDs

	SunSolve Article 48122	

- Additional background information:

	http://cpre-amer.west.sun.com/esg/hsg/starcat/xctt/hw_expander_split.html

- Meta-Data/Problem categorization:

Product/Platform: SF12K/SF15K
Category:

- Keywords

15K, 12K, SF15K, SF12K, starcat, rstop, split, expander, no, components, failed            

INTERNAL SUMMARY:

SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.