SRDB ID: 48106
Synopsis: Sun Fire[TM] 12K/15K: SSCPOST detects PCI bridge chip initialization failure on System Controller
Date: 29 Oct 2002
- Problem Statement:
SSCPOST detects PCI bridge chip initialization failure on System Controller
- Symptoms:
The PCI bridge chip initialization failure appears after the System Controller is powered
on. This may be after the initial power on at system installation or after power has been
removed from the platform. Once the bridge chip has initialized properly, it continues to
do so until the next loss of power, at which time a problematic chip may or may not fail.
Failure messages may be observed on the SC console. For SSCPOST to run properly at power on
or following a software reset, you must set the OBP environment variables as follows:
diag-switch?=true
diag-level=pmax-epmax
post-on-sir?=true
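The three settings above can be checked mechanically. The following is a minimal sketch, assuming the current values have been captured as name=value lines (the same form used in the list above); the function names parse_settings and mismatched are hypothetical and are not part of OBP or SMS.

```python
# Required OBP settings, taken from the list above.
REQUIRED = {
    "diag-switch?": "true",
    "diag-level": "pmax-epmax",
    "post-on-sir?": "true",
}


def parse_settings(text):
    """Parse name=value lines into a dict, skipping lines without '='."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def mismatched(settings, required=REQUIRED):
    """Return {name: (current, wanted)} for each required setting that differs."""
    return {
        name: (settings.get(name), wanted)
        for name, wanted in required.items()
        if settings.get(name) != wanted
    }
```

An empty result from mismatched() means all three variables already hold the values listed above; a current value of None means the variable was not present in the captured text.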
The failures will begin in the SSCPOST Phase 2 section--"PHASE 2: Basic CB PCI Bus Examination".
An example failure is shown below.
Look for the following message just before the OBP ok prompt returns:
"Power On Self Test Failed. Cause: CP1500 POST Passed; SC POST v1.18 Failed
Test 1: CB PCI Search/Probe, Subtest 3: CB PCI Device Probe"
Alternatively, the SSCPOST results are available from Solaris[TM] with:
sc% prtconf -vp | grep ssc-post
-----------------------------------------------------
Software Power ON
CPU speed = 440 Mhz
mc1 value = 0000.0000.544c.b9dd
@(#) SPARCengine(tm)Ultra CP 1500 3.14.6 created 2001/02/07 14:48
.
.
<snipped output>
.
.
PHASE 2: Basic CB PCI Bus Examination
TEST 1: CB PCI Search/Probe
INIT: CB PCI Device Probe
Addr 0x000001fe.01030000 Bus 3 Dev 0 Func 0: TI Bridge
cb_bridge_config: Bridge configured for mem_base 0x10000000 mem_limit 0x15ffffff
Addr 0x000001fe.01030800 Bus 3 Dev 1 Func 0: SBBC (Rev. 2)
Addr 0x000001fe.01031000 Bus 3 Dev 2 Func 0: TI Bridge
cb_bridge_config: Bridge configured for mem_base 0x18000000 mem_limit 0x39ffffff
Unexpected event occurred - Trap
tl tt tstate tpc tnpc
01 32 00000099.80001607 ffffffff.f0c0a5f8 ffffffff.f0c0a5fc
AFSR=0x00000000.88000000
AFAR=0x000001fe.01050000
(PRIV) Privileged Code
(TO) Time Out Error
Failing address is in IO space
opt_arg = 0x00000000.00000000
Current Phase ptr: name Basic CB PCI Bus Examination, id 2
Current test ptr: name CB PCI Search/Probe, id 1
Current subtest ptr: name CB PCI Device Probe
WARNING: Bus 5 Device 0 Unexpected Device ID 0x00000000
.
.
<snipped output>
.
.
WARNING: Data Compare Failed:
Addr: 0x000001ff.608c0000
Exp: 0x16
Got: 0x00
XOR: 0x16
Exiting to OBP ...
.
.
<snipped output>
.
.
Probing /pci@1,1 Device 1 network
Probing /pci@1,1 Device 2 scsi disk tape
Probing /pci@1,1 Device 3 pci108e,1000 network
NOTICE: probe+: /pci@1f,0/pci@1/pci@1/pci@0/ethernet@1,1 not found.
.
.
<snipped output>
.
.
Optional IO board ttys: ttyc, ttyd Not detected
Power On Self Test Failed. Cause: CP1500 POST Passed; SC POST v1.18 Failed,
Test 1: CB PCI Search/Probe, Subtest 3: CB PCI Device Probe
ok
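When a console log has been captured to a file, the failure summary line can be located programmatically. The sketch below assumes the message has the same wording as the example above; the function name find_post_failure and the regular expression are illustrative only and are not part of SSCPOST.

```python
import re

# Matches the summary printed just before the OBP ok prompt, for example:
#   Power On Self Test Failed. Cause: CP1500 POST Passed; SC POST v1.18 Failed,
#   Test 1: CB PCI Search/Probe, Subtest 3: CB PCI Device Probe
FAILURE_RE = re.compile(
    r"Power On Self Test Failed\.\s*Cause:\s*(?P<cause>[^\n]*?),?\s*"
    r"Test\s+(?P<test_id>\d+):\s*(?P<test>[^,\n]+),\s*"
    r"Subtest\s+(?P<subtest_id>\d+):\s*(?P<subtest>[^\n\"]+)"
)


def find_post_failure(console_text):
    """Return the failing test and subtest as a dict, or None if no failure line is found."""
    match = FAILURE_RE.search(console_text)
    if match is None:
        return None
    return {
        "cause": match.group("cause").strip(),
        "test_id": int(match.group("test_id")),
        "test": match.group("test").strip(),
        "subtest_id": int(match.group("subtest_id")),
        "subtest": match.group("subtest").strip(),
    }
```

For the bridge chip issue described here, the expected result names Test 1 (CB PCI Search/Probe) and Subtest 3 (CB PCI Device Probe).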
SOLUTION SUMMARY:
- Troubleshooting:
A simple way to confirm the problem is to run 'show-nets' at OBP. Normally 22 networks
are listed (20 eri networks and 2 hme networks). When the problem occurs, you will most
often see only the two hme network devices and none of the eri network devices, as in
the following show-nets output:
ok show-nets
a) /pci@1f,0/pci@1,1/network@3,1
b) /pci@1f,0/pci@1,1/network@1,1
q) NO SELECTION
Enter Selection, q to quit: q
ok
Alternatively, you can check the count of eri and hme interfaces from Solaris[TM]:
sc% netstat -k | grep -c ^hme
2
sc% netstat -k | grep -c ^eri
0
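The show-nets check can also be applied to captured console text. This is a minimal sketch, assuming the output has the lettered-menu layout shown above; the names list_network_paths and looks_like_bridge_failure are hypothetical. A short count only indicates that network devices are missing, so it should be read together with the SSCPOST messages.

```python
import re

# Expected total per the text above: 20 eri networks plus 2 hme networks.
EXPECTED_NETWORKS = 22

# A menu entry is a letter, a closing parenthesis, and a device path,
# e.g. "a) /pci@1f,0/pci@1,1/network@3,1". "q) NO SELECTION" has no path.
ENTRY_RE = re.compile(r"^\s*[a-z]\)\s+(/\S+)\s*$")


def list_network_paths(show_nets_output):
    """Return the device paths listed in captured show-nets output."""
    paths = []
    for line in show_nets_output.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def looks_like_bridge_failure(show_nets_output):
    """True when fewer than the expected 22 network devices are listed."""
    return len(list_network_paths(show_nets_output)) < EXPECTED_NETWORKS
```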
- Resolution:
A power cycle can often be used as a workaround to clear the failure condition. You can
perform this by issuing the following commands on the alternate SC when SMS is running:
% poweroff SCx (where x is either 0 or 1)
% poweron SCx
If SMS is not running on the alternate SC, you can unseat the System Controller board (501-5121),
leave it out for about 10 seconds, and then reseat the board.
If neither method clears the condition, the board should be replaced as described in the
following section.
- Summary of part numbers and patch IDs
An automatic System Controller power cycle was incorporated into SMS 1.1 patch 112100-02.
It tries ten power cycles in order to get the alternate SC to pass SSCPOST on any failing
condition (i.e., not just for the bridge chip initialization issue). The fix is integrated
into SMS 1.2.
If the failure is not cleared, replace the board with one of the part numbers listed below.
These parts include a TI revision bridge chip that is not susceptible to the power-on
initialization problem exhibited by the Intel part.
501-5121-11 or higher (SC Board)
501-5473-13 or higher (Nordica)
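The repeated power cycling described for patch 112100-02 can be pictured as a bounded retry loop. The sketch below is illustrative only and is not the SMS implementation: the callables power_cycle and post_passed are hypothetical stand-ins, and stopping at the first passing SSCPOST run is an assumption.

```python
# Limit taken from the text above: ten power cycles.
MAX_POWER_CYCLES = 10


def power_cycle_until_post_passes(power_cycle, post_passed, max_cycles=MAX_POWER_CYCLES):
    """Power cycle until POST passes or the limit is reached.

    power_cycle and post_passed are caller-supplied callables (hypothetical).
    Returns the number of cycles used, or None if POST never passed.
    """
    for attempt in range(1, max_cycles + 1):
        power_cycle()
        if post_passed():
            return attempt
    return None
```

A return value of None corresponds to the case above in which the failure is not cleared and the board should be replaced.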
- References and bug IDs
BugId: 4490854 - PCI bridge secondary clocks sometimes come up disabled from power on
- Additional background information:
None.
- Meta-Data/Problem categorization:
Product/Platform: Sun Fire 12K/15K
Category:
- Keywords
SC, SSCPOST, "Extended POST", starcat, 15K, 12K, SF15K, SF12K
INTERNAL SUMMARY:
SUBMITTER: David Lafko
BUG REPORT ID: 4490854
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:
Copyright (c) 1997-2003 Sun Microsystems, Inc.