SRDB ID   Synopsis   Date
47424   Sun Fire[TM] 12K/15K: HPOST failure, Error in LBIST signature   23 Oct 2002

Status Issued

Description

During the lbist (logic built-in self-test) stage of HPOST, a domain reports errors such as the following:

Example #1

kangaroo-sc0:sms-svc:6> setkeyswitch -d a on 
Powering on: CSB at CS0 
Waiting on exclusive access to EXB(s): 3FFFF. 
Powering on: CSB at CS1 
Powering on: EXB at EX0 
Powering on: HPCI at IO0 
Powering on: CPU at SB0 

Significant contents of .postrc (domain) 
        /etc/opt/SUNWSMS/SMS1.2/config/A/.postrc: 
allow_us3_cpus 

Reading domain blacklist file /etc/opt/SUNWSMS/config/A/blacklist ... 
# ident "@(#)blacklist  1.1     01/04/02 SMI" 
Reading platform blacklist file /etc/opt/SUNWSMS/config/platform/blacklist ... 
# ident "@(#)blacklist  1.1     01/04/02 SMI" 
Reading system ASR blacklist file /etc/opt/SUNWSMS/config/asr/blacklist ... 
stage lport_reset: Assert reset to IOC ports in -Q mode... 
stage_lport_reset(): Not -Q mode; Skipping Stage lport_reset 
stage asic_probe: ASIC probe and JTAG/CBus integrity test... 
stage brd_rev_eval: Board Revision Evaluation and Compliance... 
stage cpu_probe: CPU Module probe... 
stage cdc_probe: CDC DIMM probe... 
stage mem_probe: Memory dimm probe... 
Dimm SB0/P0/B1/D0 appears missing. 
Dimm SB0/P1/B1/D0 appears missing. 
Dimm SB0/P2/B1/D0 appears missing. 
Dimm SB0/P3/B1/D0 appears missing. 
stage adapter_probe: I/O adapter probe... 
stage cp_shorts: Centerplane Shorts... 
found zero bit from EXB EX4 xdata_04_0_l. 
stage lbist: Logic BIST... 
ERR: DMX C1/D0 Error in LBIST signature. Expected 0xCCCE41E9 Got 0x63D3013A. 
**** CAUTION: DMX C1/D0 has failed lbist (Logic Built-in Selftest). 
        Since cp_shorts (Centerplane Short test) failed it is probable 
        that this is caused by either a short to ground on a chip 
        interface that is not used or needed in the current hardware 
        configuration, or an error condition in one of the interfacing 
        asics. Cp_shorts has already deconfigured the offender. Current 
        policy is to not deconfigure this component, but to provide this 
        caution that there is some fault present, although possibly in 
        unused hardware. 

DCDS LBIST DISABLED 
DX LBIST DISABLED 
DCDS LBIST DISABLED 
DX LBIST DISABLED 
stage ibist: Interconnect BIST... 
stage field_ict: Field Interconnect Tests... 
stage mbist1: Internal memory BIST... 
stage mbist2: External memory BIST... 
stage cbus_bbsram: Console Bus test of bootbus sram... 
stage sc_interrupt: DARB to SC interrupt... 
stage cdc_clear: CDC DIMM clear... 
stage cpu_lpost: Test all L1 CPU boards... 
Performing ASIC config with bus config a/d/r = 333... 
        Slot0 in domain: 00001 
        Slot1 in domain: 00001 
            EXBs in use: 00000 
stage nmb_cpu_lpost: Non-Mem Board Proc tests... 
Performing ASIC config with bus config a/d/r = 333... 
        Slot0 in domain: 00001 
        Slot1 in domain: 00001 
            EXBs in use: 00000 
stage_cpu_lpost(): No NMB Boards in config. Skipping Stage nmb_cpu_lpost. 
stage wib_lpost: Wildcat interface board tests... 
stage_wib_lpost(): No good Wcis; Skipping Stage wib_lpost 
stage pci_lpost: Test all L1 I/O boards... 
stage exp_lpost: Domain-level board and system tests... 
stage cpu_lpost_II: CPU L1 domain/system tests... 
stage pci_lpost_Q: Init all L1 I/O boards under -Q... 
stage cpu_lpost_II_Q: CPU L1 domain/system init under -Q... 
stage final_config: Final configuration... 
Creating CPU SRAM handoff structures... 
Creating GDCD IOSRAM handoff structures in Slot IO0... 
Writing domain information to PCD... 
Configured in 333 with 4 procs, 8.000 GBytes, 2 IO adapters. 
Interconnect frequency is 149.984 MHz, Measured. 
Golden sram is on Slot IO0. 
POST (level=16, verbose=20) execution time 4:35 
kangaroo-sc0:sms-svc:7>                   

Example #2

<snipped> 
stage lbist: Logic BIST... 
<snipped> 
ERR: SDC SB14 Error in LBIST signature. Expected 0xEEAEDD73 Got 0xC372CCFF. 
(416c107d) 
FAIL Slot SB14: slot failure 
Primary service FRU is Slot SB14.                   

SOLUTION SUMMARY:

Explanation:

In example #1 the DMX C1/D0 chip failed lbist (Logic Built-in Selftest); in example #2 the failing chip is the SDC for SB14. As the CAUTION message in example #1 states, because cp_shorts (Centerplane Short test) also failed, the lbist failure is probably caused either by a short to ground on a chip interface that is not used or needed in the current hardware configuration, or by an error condition in one of the interfacing ASICs. Cp_shorts has already deconfigured the offender.
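When the HPOST output is long, it can help to pull out just the failing component and the two signatures. The following is a minimal sketch, assuming the HPOST output has been captured to a file and that the error lines follow the format shown in the examples above. The function name lbist_failures is hypothetical and is not part of SMS.

```shell
# Hypothetical helper (not an SMS command): list LBIST signature failures
# found in a file containing captured HPOST output.
# Prints one line per failure: <component> <expected> <got>
lbist_failures() {
    # $1 = path to the captured HPOST output
    # Matches lines of the form seen in the examples:
    #   ERR: <component> Error in LBIST signature. Expected 0x... Got 0x....
    sed -n 's/^ERR: \(.*\) Error in LBIST signature\. Expected \(0x[0-9A-Fa-f]*\) Got \(0x[0-9A-Fa-f]*\)\..*$/\1 \2 \3/p' "$1"
}
```

Run against the output in example #1, this sketch would print the component DMX C1/D0 followed by the expected and received signatures; lines that do not match the assumed format are ignored.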

To verify further whether the SB is indeed bad, skip the LBIST check in the HPOST run and see whether the system board reports a failure later in HPOST testing. To do this, add the following entry to the domain's .postrc file (/etc/opt/SUNWSMS/config/<domain_id>/.postrc):

no_asic_lbist sdc

This causes HPOST to run without the LBIST tests. If the SB itself is bad, a later HPOST stage should fail and confirm it.
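The .postrc change can be sketched as a small shell function. This is an illustration under stated assumptions rather than a supported procedure: the function name add_postrc_entry is hypothetical, and the sketch assumes the caller has write access to the .postrc file. It adds the entry only if it is not already present and keeps a copy of the previous file.

```shell
# Hypothetical helper (not an SMS command): add one directive to a .postrc
# file if it is not already there, keeping a backup of the previous contents.
add_postrc_entry() {
    postrc=$1    # path to the domain's .postrc file
    entry=$2     # directive to add, for example "no_asic_lbist sdc"

    if [ -f "$postrc" ]; then
        # Nothing to do if the exact line is already present.
        if grep -qxF -- "$entry" "$postrc"; then
            return 0
        fi
        # Keep the previous file so it can be restored after testing.
        cp -p "$postrc" "$postrc.bak" || return 1
        # Make sure the existing file ends with a newline before appending.
        if [ -n "$(tail -c 1 "$postrc")" ]; then
            echo >> "$postrc"
        fi
    fi
    printf '%s\n' "$entry" >> "$postrc"
}
```

It would be called with the domain's .postrc path given above and the string "no_asic_lbist sdc". The backup copy (.postrc.bak, a name chosen for this sketch) makes it straightforward to restore the original file once testing is finished.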

Note also that SMS 1.1 does not have SDC LBIST, so a failure of this sort should not occur under SMS 1.1. SMS 1.2 does have SDC LBIST, so a failure of this type under SMS 1.2 does indicate the SB as the primary FRU.

Action:

To make certain that the SB is bad, add the no_asic_lbist entry to the domain's .postrc file and rerun HPOST to see whether a later test isolates the failure. If time does not allow further testing, replace the implicated SB as the most likely failed component.

INTERNAL SUMMARY:

SUBMITTER: Joshua Freeman
APPLIES TO: AFO Vertical Team Docs/HAS, Hardware/Sun Fire /15000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.