SRDB ID   Synopsis   Date
48483   Sun Fire[TM] 12K/15K: POST: IBIST Failures   21 Nov 2002

Status Issued

Description
- Problem Statement:

	POST: IBIST Failures

- Symptoms:
    
	POST reports an IBIST failure. Some examples:            
         #1:
           stage ibist: Interconnect BIST...
	   AXQ-RMX IBIST...
           ERR: IBIST error: AXQ EX3 RMX C0 Exp 0x0aaaaaaaa Obs 0x03c345555 XOR 0x0969effff.
           FAIL EXB EX3: IBIST failure
           Primary service FRU is EXB EX3.
           Secondary service FRU is CSB C0 or the logic centerplane.
	
         #2:
	   stage ibist: Interconnect BIST...
	   ERR: IBIST error: DMX C1/D0 SDI EX4/S3 Error bits = 0x1555554.
	   FAIL EXB EX4: IBIST failure

SOLUTION SUMMARY:
- Troubleshooting:

	IBIST is the Interconnect built-in-self-test between two ASICs.
        One of the ASICs acts as the master driving preset/programmable
        bit patterns, and the other ASIC receives the patterns and then 
        echoes them back. If the echoed pattern received by the master
        does not match the original pattern, the test fails.

        In the first example above, AXQ EX3 is the master and RMX0
        (RMX C0 in the log) is the slave. AXQ EX3 expects the pattern
        0x0aaaaaaaa but receives 0x03c345555. The XOR of the two,
        0x0aaaaaaaa XOR 0x03c345555 = 0x0969effff, has a bit set in
        every position that is in error.
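
        The XOR arithmetic above can be checked with a short sketch.
        The two values are copied from example #1; the bit numbering
        (bit 0 = least significant) is an assumption, since POST only
        prints the mask.

```python
# Values copied from the ERR line in example #1.
expected = 0x0AAAAAAAA
observed = 0x03C345555

# XOR leaves a 1 in every bit position where the two patterns differ.
diff = expected ^ observed

# Positions of the differing bits, counted from bit 0 (least significant).
# This numbering is an assumption; POST reports only the XOR mask.
bad_bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]

print(f"XOR           = 0x{diff:09x}")
print(f"bits in error = {len(bad_bits)}")
```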

        Note, however, that the failure in example #1 matches bug
        4704614, which is corrected in SMS 1.2 patch 112488-10 (or
        higher).
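
        "Or higher" refers to the revision suffix after the dash. A
        minimal sketch of that comparison follows; the function name
        patch_at_least is hypothetical, and how the installed patch ID
        is obtained is outside the sketch.

```python
def patch_at_least(installed, required="112488-10"):
    """Return True if `installed` is the same patch at an equal or higher revision.

    Patch IDs are assumed to have the form <base>-<revision>, e.g. 112488-10.
    """
    inst_base, inst_rev = installed.split("-")
    req_base, req_rev = required.split("-")
    # A different base number is a different patch, whatever its revision.
    return inst_base == req_base and int(inst_rev) >= int(req_rev)


print(patch_at_least("112488-11"))
print(patch_at_least("112488-09"))
```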
     
- Resolution:
	
        If the IBIST failure is an AXQ<-->RMX0 error, first confirm that
        POST patch 112488-10 (or higher) is applied to the system. For
        any other IBIST failure, or if the failure persists with the
        patch applied, all IBIST failures within close proximity must be
        considered when deciding the appropriate FRU. If there is only a
        single failure, as shown above, it is logical to replace what
        POST suggests as the primary FRU: EXB EX3 in this example.

        However, if multiple IBIST failures are present, they must be
        considered together. For example, suppose SDI2 on four expanders
        all report IBIST failures to a given DMX. Taken together, this
        would call the DMX (i.e., the centerplane) into question, as it
        is unlikely that multiple expanders would fail at the same time.
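
        The grouping described above can be sketched as follows. The
        pattern is modeled only on the DMX/SDI line in example #2 and
        does not cover other POST message variants. The names
        expanders_per_dmx and suspect_dmx, and the threshold
        min_expanders, are hypothetical. In the sample, only the EX4/S3
        line comes from example #2; the other lines are invented for
        illustration.

```python
import re
from collections import defaultdict

# Modeled on the single DMX/SDI error line in example #2 (an assumption
# about the general format; other IBIST messages will not match).
DMX_SDI = re.compile(r"IBIST error: DMX (\S+) SDI (EX\d+)/(S\d+)")


def expanders_per_dmx(post_lines):
    """Map each DMX to the set of expanders reporting an IBIST error against it."""
    seen = defaultdict(set)
    for line in post_lines:
        match = DMX_SDI.search(line)
        if match:
            dmx, expander, _sdi = match.groups()
            seen[dmx].add(expander)
    return dict(seen)


def suspect_dmx(post_lines, min_expanders=2):
    """Return the DMXs implicated by at least `min_expanders` distinct expanders.

    The threshold is an arbitrary illustration, not a documented rule.
    """
    groups = expanders_per_dmx(post_lines)
    return sorted(d for d, exps in groups.items() if len(exps) >= min_expanders)


# Hypothetical input: the first line is from example #2, the rest are invented.
sample = [
    "ERR: IBIST error: DMX C1/D0 SDI EX4/S3 Error bits = 0x1555554.",
    "ERR: IBIST error: DMX C1/D0 SDI EX1/S2 Error bits = 0x1555554.",
    "ERR: IBIST error: DMX C1/D0 SDI EX2/S2 Error bits = 0x1555554.",
    "ERR: IBIST error: DMX C1/D0 SDI EX5/S2 Error bits = 0x1555554.",
]

print(suspect_dmx(sample))
```

        A DMX that appears here is only a candidate for closer
        inspection; seating and recent service actions still need to be
        checked before a FRU is chosen.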

        Finally, improper board seating is a possible cause for IBIST
        failures. If a service action involving a suspect FRU was recently
        conducted, check seating.     

- Summary of part number and patch IDs

      	112488-10       
        
- References and bug IDs

	4704614
       
- Additional background information:        
        
	For details on what IBIST tests are available, refer to the
        online documentation in 'redx'. 

           redx> ? ibist

        Under no circumstances should IBIST be executed on a component
        supporting a running domain. It will crash all domains relying
        on that component. Furthermore, if IBIST is run manually, the
        component must be power cycled after completion to return the
        ASIC(s) to a known, clean state. Refer to bug 4743556 for an 
	example of why.
      
- Meta-Data/Problem categorization:

Product/Platform: SF12K/SF15K
Category:

- Keywords

15K, 12K, SF15K, SF12K, Sun Fire 15K, Enterprise, Server, Sun Fire 12K,
post, ibist

                       

INTERNAL SUMMARY:

SUBMITTER: Scott Davenport
BUG REPORT ID: 4704614, 4743556
PATCH ID: 112488-10
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.