SRDB ID   Synopsis   Date
48106   Sun Fire[TM] 12K/15K: SSCPOST detects PCI bridge chip initialization failure on System Controller   29 Oct 2002

Status Issued

Description
- Problem Statement:

SSCPOST detects PCI bridge chip initialization failure on System Controller


- Symptoms:

The PCI bridge chip initialization failure appears after the System Controller is powered
on.  This may be the initial power-on at system installation or a power-on after power has
been removed from the platform.  Once the bridge chip has initialized properly, it will
continue to do so until the next loss of power, at which time a problematic chip may or
may not fail.

Failure messages may be observed on the SC console.  In order for SSCPOST to run properly
at poweron or following a software reset, you must set the OBP environment variables as
follows:

    diag-switch?=true
    diag-level=pmax-epmax
    post-on-sir?=true
    
The failures will begin in the SSCPOST Phase 2 section--"PHASE 2: Basic CB PCI Bus Examination".
An example failure is shown below. 

Look for the following message just before the OBP ok prompt returns:

"Power On Self Test Failed.  Cause: CP1500 POST Passed; SC POST v1.18 Failed,
 Test 1: CB PCI Search/Probe, Subtest 3: CB PCI Device Probe"

Alternatively, the SSCPOST results are available from Solaris[TM] with:
	sc% prtconf -vp | grep ssc-post

-----------------------------------------------------
Software Power ON

CPU speed = 440 Mhz
 mc1 value = 0000.0000.544c.b9dd
@(#) SPARCengine(tm)Ultra CP 1500  3.14.6 created 2001/02/07 14:48
.
.
<snipped output>
.
.
PHASE 2: Basic CB PCI Bus Examination
TEST 1: CB PCI Search/Probe
INIT: CB PCI Device Probe
Addr 0x000001fe.01030000 Bus 3 Dev 0 Func 0: TI Bridge
cb_bridge_config: Bridge configured for mem_base 0x10000000 mem_limit 0x15ffffff
Addr 0x000001fe.01030800 Bus 3 Dev 1 Func 0: SBBC (Rev. 2)
Addr 0x000001fe.01031000 Bus 3 Dev 2 Func 0: TI Bridge
cb_bridge_config: Bridge configured for mem_base 0x18000000 mem_limit 0x39ffffff
        Unexpected event occurred - Trap
        tl  tt  tstate             tpc                tnpc
        01  32  00000099.80001607  ffffffff.f0c0a5f8  ffffffff.f0c0a5fc
        AFSR=0x00000000.88000000
        AFAR=0x000001fe.01050000
        (PRIV) Privileged Code
        (TO)   Time Out Error
Failing address is in IO space
opt_arg = 0x00000000.00000000
Current Phase ptr: name Basic CB PCI Bus Examination, id 2
Current test ptr: name CB PCI Search/Probe, id 1
Current subtest ptr: name CB PCI Device Probe
WARNING: Bus 5 Device 0 Unexpected Device ID 0x00000000
.
.
<snipped output>
.
.
WARNING: Data Compare Failed:
        Addr: 0x000001ff.608c0000
        Exp: 0x16
        Got: 0x00
        XOR: 0x16
Exiting to OBP ...
.
.
<snipped output>
.
.
Probing /pci@1,1 Device 1  network 
Probing /pci@1,1 Device 2  scsi disk tape 
Probing /pci@1,1 Device 3  pci108e,1000 network 
NOTICE: probe+: /pci@1f,0/pci@1/pci@1/pci@0/ethernet@1,1 not found.
.
.
<snipped output>
.
.
Optional IO board ttys: ttyc, ttyd Not detected
Power On Self Test Failed.  Cause: CP1500 POST Passed; SC POST v1.18 Failed,
Test 1: CB PCI Search/Probe, Subtest 3: CB PCI Device Probe
ok 
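
If the SC console output has been captured to a file, the failure signature shown above can
be searched for mechanically.  This is a minimal sketch under that assumption:
sscpost_failed is a hypothetical helper name, and the strings it matches are taken from the
example output in this document (other SC POST versions may word them differently).

```shell
# Sketch: decide whether a saved SC console capture contains the SSCPOST
# failure signature from the example above.  sscpost_failed is a hypothetical
# helper; $1 is the path to the captured console log.
sscpost_failed() {
    log=$1
    # Both the summary line and the failing subtest name must be present.
    grep -F 'Power On Self Test Failed' "$log" >/dev/null || return 1
    grep -F 'Subtest 3: CB PCI Device Probe' "$log" >/dev/null || return 1
    # Print supporting evidence (time-out trap, failing address), if any.
    grep -E 'Time Out Error|AFAR=|Unexpected Device ID' "$log" || true
    return 0
}
```

The function returns 0 only when both the summary line and the failing subtest name are
found, so a log from a clean boot yields a non-zero status.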

                                    
SOLUTION SUMMARY:
- Troubleshooting:

The simplest way to confirm the problem is to run 'show-nets' at OBP.  Normally there
should be 22 networks listed (20 eri networks and 2 hme networks).  When the problem
occurs, you will most often see only the two hme network devices.  None of the eri network
devices will be listed.

For example, the output from show-nets below shows only the two hme network devices
and none of the eri devices.

    ok show-nets
    a) /pci@1f,0/pci@1,1/network@3,1
    b) /pci@1f,0/pci@1,1/network@1,1
    q) NO SELECTION 
    Enter Selection, q to quit: q
    ok 

Alternatively, you can check the count of eri and hme interfaces from Solaris[TM]:

	sc% netstat -k | grep -c ^hme
	2
	sc% netstat -k | grep -c ^eri
	0
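
The two counts can be combined into a single check.  The sketch below is illustrative only:
classify_sc_nets is a hypothetical helper name, it counts lines the same way as the grep
commands above, and it treats 20 eri plus 2 hme as the healthy configuration quoted in this
document.

```shell
# Sketch: classify the SC network device counts.  Reads netstat -k style
# output on stdin and compares the counts against the healthy values quoted
# above (20 eri, 2 hme).  classify_sc_nets is a hypothetical helper name.
classify_sc_nets() {
    input=$(cat)
    # grep -c exits non-zero on a zero count, hence the "|| true"
    hme=$(printf '%s\n' "$input" | grep -c '^hme' || true)
    eri=$(printf '%s\n' "$input" | grep -c '^eri' || true)
    if [ "$eri" -eq 20 ] && [ "$hme" -eq 2 ]; then
        echo "healthy: eri=$eri hme=$hme"
    elif [ "$eri" -eq 0 ] && [ "$hme" -eq 2 ]; then
        echo "suspect bridge chip failure: eri=$eri hme=$hme"
        return 1
    else
        echo "unexpected counts: eri=$eri hme=$hme"
        return 2
    fi
}
```

On the SC this could be run as "netstat -k | classify_sc_nets".  Any result other than
"healthy" should be confirmed against the SSCPOST console output before acting on it.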

- Resolution:

A power cycle can often be used as a workaround to clear the failure condition.  You can
perform this by issuing the following commands on the alternate SC when SMS is running:

    % poweroff SCx (where x is either 0 or 1)
    % poweron SCx

If SMS is not running on the alternate SC, you can unseat the System Controller board (501-5121),
leave it out for about 10 seconds, and then reseat the board.

If neither method clears the condition, the board should be replaced as described in the
following section.

- Summary of part numbers and patch IDs

SMS 1.1 patch 112100-02 added an automatic System Controller power cycle: SMS tries ten
power cycles in order to get the alternate SC to pass SSCPOST on any failing condition
(i.e., not just for the bridge chip initialization issue).  The fix is integrated into
SMS 1.2.
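
The retry behavior described above can be pictured with the sketch below.  It is NOT the
SMS implementation: retry_power_cycle is a hypothetical helper, the power cycle and the
pass/fail check are supplied by the caller, and stopping at the first passing attempt is an
assumption.

```shell
# Sketch only: NOT the SMS implementation.  Power-cycles until a check passes
# or the attempt limit is reached.  Arguments (all caller-supplied):
#   $1 = maximum number of power cycles
#   $2 = command that performs one power cycle
#   $3 = command that returns 0 when SSCPOST has passed
retry_power_cycle() {
    max=$1
    cycle_cmd=$2
    check_cmd=$3
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        $cycle_cmd || return 2      # the power cycle itself failed
        if $check_cmd; then
            echo "passed after $attempt power cycle(s)"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "still failing after $max power cycle(s)"
    return 1
}
```

A caller would pass 10 as the limit, together with its own wrappers around the power cycle
and the SSCPOST result check.  A return status of 1 corresponds to the case where the board
should be replaced.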

If the failure is not cleared, the board should be replaced with one of the parts listed
below.  These parts include a TI revision bridge chip which is not susceptible to the
power-on initialization problem manifested by the Intel part.

501-5121-11 or higher (SC Board)
501-5473-13 or higher (Nordica)

- References and bug IDs

BugId: 4490854 - PCI bridge secondary clocks sometimes come up disabled from power on

- Additional background information:

None. 
  
- Meta-Data/Problem categorization:

Product/Platform: Sun Fire 12K/15K
Category:

- Keywords
SC, SSCPOST, "Extended POST", starcat, 15K, 12K, SF15K, SF12K 
                                 
INTERNAL SUMMARY:

SUBMITTER: David Lafko
BUG REPORT ID: 4490854
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.