SRDB ID   Synopsis   Date
48202   Sun Fire[TM] 12K/15K: Dstop: SDI Data Status Parity Error   31 Oct 2002

Status Issued

Description
- Problem Statement: 

        Dstop: SDI Data Status Parity Error

- Symptoms:

	The redx 'wfail' command output reports the following failure signature:

	   01  redxl> dumpf load dsmd.dstop.020514.1219.19
	   02  Created Tue May 14 12:19:20 2002
	   03  By hpost v. 1.2 Generic 112488-04 Mar 18 2002 14:43:00 executing as pid=6599
	   04  On ssc name = rasputin-sc0.SD_RASCAL.West.Sun.COM
	   05  Domain = 0=A Platform = rasputin
	   06  Boards in dump: master SC CPs/CSBs[1:0]: 3
	   07            EXB[17:0]: 12100
	   08          Slot0[17:0]: 12100
	   09          Slot1[17:0]: 12100
	   10  -D option, -d
	   11  "DSMD DomainStop Dump"
	   12  0 errors occurred while creating this dump.
	   13  redxl> wfail
	   14  SDI EX08/S0 Master_Stop_Status0[31:0] = 1004000F
	   15  MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
	   16  SDI EX08/S0 Dstop0[31:0] = 02018200
	   17          Dstop0[16]: D DARB texp requests all Dstop (M)
	   18          Dstop0[25]: D 1E AXQ requests all Dstop (M)
	   19  AXQ EX08 ( 8) Error_Flag_05[31:0] = 00018001 Mask = 1024FFFF
	   20          Err5[16]: D 1E SDI Data status parity error
	   21  FAIL EXB EX8: Dstop/Rstop detected by AXQ.
	   22  Primary service FRU is EXB EX8.
	   23  SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
	   24  SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
	   25  DARB C0: enabled ports (expanders) [17:0]: 16100
	   26  DARB C0: other darb req Dstop+Rstop for exps[17:0]: 00100
	   27  DARB C1: enabled ports (expanders) [17:0]: 16100
	   28  DARB C1: other darb req Dstop+Rstop for exps[17:0]: 00100
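
	The [17:0] values in this output (EXB[17:0], the DARB enabled ports and
	the DARB "other darb req Dstop+Rstop" masks) read as hex bitmasks with
	one bit per expander: 12100 has bits 8, 13 and 16 set, which matches the
	EX08, EX13 and EX16 boards that appear in the dump. The Python sketch
	below decodes such masks under that assumption; the function name
	expanders() is hypothetical and is not part of redx.

```python
# Hypothetical helper (not a redx command): decode an 18-bit [17:0] expander
# mask as printed by redx. Assumption: the value is hex and bit N means EXN.


def expanders(mask_hex):
    """Return the expander numbers whose bits are set in a hex [17:0] mask."""
    mask = int(mask_hex, 16)
    if mask >> 18:
        raise ValueError(f"{mask_hex} has bits set above [17:0]")
    return [n for n in range(18) if mask & (1 << n)]


# Values copied from the wfail output above (DARB C1 reports the same masks).
masks = {
    "EXB[17:0] in dump": "12100",
    "DARB C0 enabled ports": "16100",
    "DARB C0 other darb req Dstop+Rstop": "00100",
}
for name, value in masks.items():
    print(f"{name}: " + ", ".join(f"EX{n}" for n in expanders(value)))

# Prints:
# EXB[17:0] in dump: EX8, EX13, EX16
# DARB C0 enabled ports: EX8, EX13, EX14, EX16
# DARB C0 other darb req Dstop+Rstop: EX8
```

	Read this way, EX8 is the only expander flagged in the "other darb req
	Dstop+Rstop" mask, which agrees with the FAIL EXB EX8 verdict.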
                        

SOLUTION SUMMARY:
- Troubleshooting:

	It is evident from the dump header (lines 10 and 11) that this Dstop dump
	file was generated by dsmd while the domain was running. The dump file
	name confirms this: dsmd.dstop files are created by dsmd as part of
	Automatic System Recovery (ASR).
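
	The naming convention can be checked mechanically. The sketch below
	assumes, from this one example only, that a dump name has the form
	<creator>.<type>.<YYMMDD>.<HHMM>.<SS>; the pattern and the function
	describe_dump() are hypothetical, not an SMS or redx interface.

```python
import re
from datetime import datetime

# Assumed layout, inferred from a single example in this article:
#   <creator>.<dump type>.<YYMMDD>.<HHMM>.<SS>
DUMP_NAME = re.compile(
    r"^(?P<creator>\w+)\.(?P<kind>\w+)\."
    r"(?P<date>\d{6})\.(?P<hhmm>\d{4})\.(?P<ss>\d{2})$"
)


def describe_dump(name):
    """Return (creator, dump type, timestamp) parsed from a dump file name."""
    m = DUMP_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised dump file name: {name}")
    stamp = datetime.strptime(m["date"] + m["hhmm"] + m["ss"], "%y%m%d%H%M%S")
    return m["creator"], m["kind"], stamp


creator, kind, stamp = describe_dump("dsmd.dstop.020514.1219.19")
print(creator, kind, stamp)  # dsmd dstop 2002-05-14 12:19:19
print("generated by dsmd:", creator == "dsmd")  # generated by dsmd: True
```

	For this dump the parsed time is one second before the "Created" time in
	the header (line 2), which is consistent with the assumed layout.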

	Note the two first errors (flagged 1E) recorded in two different registers:

	   Dstop0: the AXQ on Expander 8 sends a Dstop request to SDI(M) (line 18).
	   Err5:   the AXQ reports an SDI Data Status Parity Error (line 20).
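
	The bit positions behind these two entries can be confirmed from the raw
	register values. This Python sketch lists every set bit in Dstop0
	(02018200) and Error_Flag_05 (00018001) and labels only the bits that
	wfail described in this dump; the labels table is illustrative, not a
	complete register map.

```python
# Register values copied from the wfail output (lines 16 and 19).
DSTOP0 = 0x02018200
ERR5 = 0x00018001

# Only the descriptions wfail printed in this dump; not a full register map.
labels = {
    ("Dstop0", 25): "D 1E AXQ requests all Dstop (M)",
    ("Dstop0", 16): "D DARB texp requests all Dstop (M)",
    ("Err5", 16): "D 1E SDI Data status parity error",
}


def set_bits(value):
    """Return the positions of the set bits in a 32-bit value, MSB first."""
    return [b for b in range(31, -1, -1) if (value >> b) & 1]


for reg, value in (("Dstop0", DSTOP0), ("Err5", ERR5)):
    for bit in set_bits(value):
        print(f"{reg}[{bit}]: {labels.get((reg, bit), '(not decoded in this output)')}")
```

	Dstop0 also has bits 15 and 9 set, and Err5 has bits 15 and 0 set. wfail
	printed no description for those bits here, so the sketch does not
	assign them a meaning.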

	Note FAIL EXB EX8 (line 21). This is the board that POST would deconfigure
	during the recovery POST run, so that the domain comes back with the
	largest possible fault-free configuration given the fault implied by
	this error.

	Note the FRU recommended for replacement in order to remove the fault
	(line 22):

	   Primary service FRU is EXB EX8.
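
	When reviewing many dumps, the FAIL verdict and the primary service FRU
	can be pulled out of saved wfail output with a simple text scan. This
	sketch assumes the exact wording seen in this dump (lines 21 and 22) and
	that the saved output does not carry the line numbers added in this
	article; fail_summary() is a hypothetical helper, and other hpost or
	redx versions may word these lines differently.

```python
import re

# Excerpt of the wfail output above, without the article's line numbers.
WFAIL_OUTPUT = """\
FAIL EXB EX8: Dstop/Rstop detected by AXQ.
Primary service FRU is EXB EX8.
SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
"""


def fail_summary(text):
    """Return (failing board, reason, primary FRU); None for anything absent."""
    fail = re.search(r"^FAIL (.+?): (.+)$", text, re.MULTILINE)
    fru = re.search(r"^Primary service FRU is (.+?)\.$", text, re.MULTILINE)
    return (
        fail.group(1) if fail else None,
        fail.group(2) if fail else None,
        fru.group(1) if fru else None,
    )


print(fail_summary(WFAIL_OUTPUT))
# ('EXB EX8', 'Dstop/Rstop detected by AXQ.', 'EXB EX8')
```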

	Looking closer at AXQ8:

	   29  redxl> shaxq -e 8
	   30  Note: Data is displayed from the currently loaded dump file.
	   31  AXQ EX8 (8) Component ID = C4312049 Rev 6.0
	   32  Error_Flag_00[31:0] = 00000000 Mask = 0000FFFF
	   33  Error_Flag_01[31:0] = 00000000 Mask = 4000FFFF
	   34  Error_Flag_02[31:0] = 00000000 Mask = 0000FFFF
	   35  Error_Flag_03[31:0] = 00000000 Mask = 21005EFF
	   36  Error_Flag_04[31:0] = 00000000 Mask = 01FEFFFF
	   37  Error_Flag_05[31:0] = 00018001 Mask = 1024FFFF
	   38  Err5[16]: D 1E SDI Data status parity error
	   39       {Rd_Bogon_unload,DStat_par,DStat[8:0]} = 400
	   40       darb_errsave[15:0] = 0400
	   41  Error_Flag_06[31:0] = 00000000 Mask = 7E00FFFF
	   42  Error_Flag_07[31:0] = 00000000 Mask = 63FF7D24
	   43  Error_Flag_08[31:0] = 00000000 Mask = 0000FFFF
	   44  Error_Flag_09[31:0] = 00000000 Mask = 7E00FFFF
	   45  Error_Flag_10[31:0] = 00000000 Mask = 7C00FFFF
	   46  Error_Flag_11[31:0] = 00000000 Mask = 7FF0FFFF

	This output gives more detail about the particular status that
	encountered the error. Line 39 shows an Rd_Bogon_unload bus error: the
	11-bit field {Rd_Bogon_unload,DStat_par,DStat[8:0]} = 400 has only bit 10
	set, and bit 10 is the Rd_Bogon_unload position. The Rd_Bogon_unload
	signal runs from SDI(M) to AXQ. This unidirectional flow control signal
	indicates a Read Bogon unload when Phase = 1 and a sysreg_data_unload
	when Phase = 0. The SDI has a 16-deep FIFO, and the AXQ contains a
	counter for each of these commands; the counter decrements each time an
	unload is received.
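
	The field decode and the flow control described above can be sketched in
	Python. Part 1 splits the line 39 value using the Verilog-style
	concatenation rule that the first-named signal is the most significant
	bit, which is why 400 means Rd_Bogon_unload = 1. Part 2 is a toy model
	only: the article states that the AXQ keeps a counter per command and
	decrements it on each unload, while the increment on issue, the limit of
	16 per counter, and the error on an unexpected unload are assumptions.
	UnloadCounters is a hypothetical name.

```python
# Part 1: split {Rd_Bogon_unload, DStat_par, DStat[8:0]} (11 bits, MSB first).
def split_dstat_field(value):
    return {
        "Rd_Bogon_unload": (value >> 10) & 1,
        "DStat_par": (value >> 9) & 1,
        "DStat": value & 0x1FF,
    }


print(split_dstat_field(0x400))
# {'Rd_Bogon_unload': 1, 'DStat_par': 0, 'DStat': 0}

# Part 2: toy model of the unload counters (assumptions noted above).
FIFO_DEPTH = 16  # the SDI FIFO is 16 deep


class UnloadCounters:
    def __init__(self):
        # One counter per command: phase 1 = Read Bogon, phase 0 = sysreg data.
        self.count = {1: 0, 0: 0}

    def issue(self, phase):
        """Assumed: count an outstanding command; refuse when the FIFO is full."""
        if self.count[phase] == FIFO_DEPTH:
            return False
        self.count[phase] += 1
        return True

    def unload(self, phase):
        """Per the article: decrement the counter when an unload is received."""
        if self.count[phase] == 0:
            raise RuntimeError(f"unexpected unload for phase {phase}")
        self.count[phase] -= 1


c = UnloadCounters()
c.issue(1)
c.issue(1)
c.unload(1)
print(c.count)  # {1: 1, 0: 0}
```

	A spurious or corrupted unload would drive a counter out of step with
	the SDI FIFO, which is one way to see why the AXQ checks this status.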

- Resolution:

	Since this signal runs on the expander board from SDI(M) to AXQ, the
	service FRU is EXB EX8.

- Summary of part numbers and patch IDs

	http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html
	
- References and bug IDs

	SunSolve Article 48122

- Additional background information:


- Meta-Data/Problem categorization: 

Product/Platform: SF12K/SF15K 
Category:

- Keywords

15K, 12K, SF15K, SF12K, starcat, dstop, AXQ, SDI Data status parity error                        

INTERNAL SUMMARY:

SUBMITTER: Tong-Pheng Koh
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.