SRDB ID   Synopsis   Date
48121   Sun Fire[TM] 12K/15K: PCI SERR panic after missing ce0 device on the X2222A adapter during boot   29 Oct 2002

Status Issued

Description
- Problem Statement:

A missing ce0 device on the X2222A Sun[TM] Dual Fast Ethernet + Dual SCSI PCI Adapter 
during the OBP probe can lead to a PCI SERR panic during OS boot.

- Symptoms:

 . The panic occurs while booting the OS. After the initial failure, subsequent boot
   attempts also fail.
   
 . The domain is exposed to this problem after a setkeyswitch on/HPOST execution. The problem can 
   appear at install, after a period of stable operation, or during normal reboot operations. 
   
 . OBP being unable to probe the ce0 interface of the X2222A adapter is a symptom of the
   condition that results in a PCI SERR panic while booting the OS. The panic signature is:
   
   WARNING: pcisch-2: PCI fault log start:
   PCI SERR
   PCI error occurred on device #6 dwordmask=0 bytemask=0
   pcisch-2: PCI primary error (0):pcisch-2: PCI secondary error (0):pcisch-2: PBM AFAR 0.00000000:
   WARNING: pcisch2: PCI config space CSR=0x4280<signaled-system-error>
   pcisch-2: PCI fault log end.
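
The signature above can be searched for in a saved console log or messages
file. The following is only a sketch: the function name serr_instances is
hypothetical, a POSIX awk is assumed (nawk on Solaris), and the log is assumed
to contain the fault log lines in the form shown above.

```shell
# serr_instances LOGFILE
# Print the pcisch instance number of each PCI fault log that reports PCI SERR.
serr_instances() {
    awk '
        # Remember the instance number from the "fault log start" line.
        /pcisch-[0-9]+: PCI fault log start/ {
            inst = $0
            sub(/.*pcisch-/, "", inst)
            sub(/:.*/, "", inst)
            next
        }
        # Report the instance once if PCI SERR appears inside that fault log.
        /PCI SERR/ && inst != "" { print inst; inst = "" }
        # Forget the instance when the fault log ends.
        /PCI fault log end/ { inst = "" }
    ' "$1"
}
```

For the signature shown above, the function would print the instance number 2,
which is the pcisch instance used in troubleshooting step C.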
   
                                      

SOLUTION SUMMARY:
  
- Troubleshooting:
 
 A. To determine whether the panic is caused by an X2222A card failure, execute the following
    commands at the OBP prompt:
  
 . Execute show-disks and probe-scsi-all to check for missing SCSI connections or disk targets. 
 
 . Execute show-nets to check for missing ce interfaces.
  
   ok show-nets
   a) /pci@1d,700000/pci@1/network@1
   b) /pci@1c,700000/network@3,1
   c) /pci@1c,700000/pci@1/network@1
   d) /pci@1c,700000/pci@1/network@0

   NOTE: /pci@1d,700000/pci@1/network@0 is the ce0 interface of this adapter and is missing
   from the probe.
   
 B. Set the OBP variable diag-switch?=true to enable OBP device probing diagnostic output on the
    console:
   
  . The following is an example device probe of PCI B in a good state:
 
   Probing PCI B pci 
   Probing /pci@1d,700000 Device 1  pci 
   Probing /pci@1d,700000/pci@1 Device 0  network 
   Probing /pci@1d,700000/pci@1 Device 1  network 
   Probing /pci@1d,700000/pci@1 Device 2  scsi disk tape scsi disk tape 
   Probing /pci@1d,700000/pci@1 Device 3  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 4  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 5  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 6  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 7  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 8  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 9  Nothing there 
   Probing /pci@1d,700000/pci@1 Device a  Nothing there 
   Probing /pci@1d,700000/pci@1 Device b  Nothing there 
   Probing /pci@1d,700000/pci@1 Device c  Nothing there 
   Probing /pci@1d,700000/pci@1 Device d  Nothing there 
   Probing /pci@1d,700000/pci@1 Device e  Nothing there 
   Probing /pci@1d,700000/pci@1 Device f  Nothing there 
   Probing /pci@1d,700000 Device 2  bootbus-controller iosram 
   Probing /pci@1d,700000 Device 3  pci108e,1100 network firewire usb 

  . The following is an example device probe of PCI B in a failing state:
 
   Probing PCI B pci 
   Probing /pci@1d,700000 Device 1  pci 
   Probing /pci@1d,700000/pci@1 Device 1  network 
   Probing /pci@1d,700000/pci@1 Device 2  scsi disk tape scsi disk tape 
   Probing /pci@1d,700000/pci@1 Device 3  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 4  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 5  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 6  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 7  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 8  Nothing there 
   Probing /pci@1d,700000/pci@1 Device 9  Nothing there 
   Probing /pci@1d,700000/pci@1 Device a  Nothing there 
   Probing /pci@1d,700000/pci@1 Device b  Nothing there 
   Probing /pci@1d,700000/pci@1 Device c  Nothing there 
   Probing /pci@1d,700000/pci@1 Device d  Nothing there 
   Probing /pci@1d,700000/pci@1 Device e  Nothing there 
   Probing /pci@1d,700000/pci@1 Device f  Nothing there 
   Probing /pci@1d,700000 Device 2  bootbus-controller iosram 
   Probing /pci@1d,700000 Device 3  pci108e,1100 network firewire usb  
   
   NOTE: /pci@1d,700000/pci@1 Device 0 is missing from the probe.
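
If the diag-switch? probe output has been captured to a file, the comparison
between the good and failing states can be checked mechanically. This is a
sketch under assumptions: the function name check_probe is hypothetical, and
the capture is assumed to contain "Probing" lines in the format shown above.

```shell
# check_probe LOGFILE BRIDGE
# Print "ok" if both network functions (Device 0 and Device 1) were probed
# behind BRIDGE; otherwise print the device numbers that are missing.
check_probe() {
    log=$1
    bridge=$2
    missing=""
    for dev in 0 1; do
        # Fixed-string match on the bridge path and device number, then
        # require that the probe reported a "network" function for it.
        if ! grep -F "Probing $bridge Device $dev " "$log" | grep -q "network"; then
            missing="$missing $dev"
        fi
    done
    if [ -z "$missing" ]; then
        echo "ok"
    else
        echo "missing:$missing"
    fi
}
```

Called with the bridge path /pci@1d,700000/pci@1, the function would report
"ok" for the good-state probe above and "missing: 0" for the failing state.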
 
   
 C. To determine that the panicked PCI device instance corresponds to
    the X2222A card with a missing ce device: 
   
  . Grep a previously captured /etc/path_to_inst file for pcisch
    instance 2 (pcisch-2).  Use explorer output if available. The
    matching entry is:
   
    "/pci@1d,700000" 2 "pcisch"

  . Grep for "pcisch2" in the /var/adm/messages file from a previous successful boot:
   
    pcisch2 at root: SAFARI 0x1d 0x700000
    pcisch2 is /pci@1d,700000

  . Alternatively, you can use the Solaris Device Path Decoder at
    http://kwyjibo.aus.sun.com.
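
The path_to_inst lookup in step C can be sketched as a small shell function.
The function name pcisch_path is hypothetical, a POSIX awk is assumed (nawk on
Solaris), and the file is assumed to use the three-field format shown above
(quoted physical path, instance number, quoted driver name).

```shell
# pcisch_path INSTANCE PATH_TO_INST_FILE
# Print the physical device path bound to the given pcisch instance.
pcisch_path() {
    inst=$1
    file=$2
    # Entries look like: "/pci@1d,700000" 2 "pcisch"
    # Match the driver name and instance, then strip the quotes from the path.
    awk -v inst="$inst" '
        $3 == "\"pcisch\"" && $2 == inst { gsub(/"/, "", $1); print $1 }
    ' "$file"
}
```

For example, "pcisch_path 2 path_to_inst" run against a saved copy of the file
containing the entry above would print /pci@1d,700000, the path to compare
against the show-nets and probe output.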


- Resolution:

 . Replace the identified X2222A card.  This has resolved the problem in all
   previous instances of this bug.
     
 . Set diag-switch?=true and verify that OBP sees all 5 PCI devices: the pci bridge, 2 network 
   ports, and 2 SCSI ports (Device 2 represents both SCSI port connections):
 
   Probing PCI B pci
   Probing /pci@1d,700000 Device 1  pci
   Probing /pci@1d,700000/pci@1 Device 0  network
   Probing /pci@1d,700000/pci@1 Device 1  network
   Probing /pci@1d,700000/pci@1 Device 2  scsi disk tape scsi disk tape

 . If the replacement card does not correct the panic, repeat the troubleshooting steps above 
   to confirm that the replacement card is not experiencing the same failure.
 
 . HPOST will be modified to drive the JTAG bus for the PCI adapters during the auto-connect
   sequence, setting TRST low and turning the clock on.
    
- Summary of part numbers and patch IDs

X2222A - 501-5727-03   Dual FastEthernet + Dual SCSI PCI Adapter 
SMS 1.2 patch 112488-08    

- References and bug IDs

4723789 - PCI devices within Cauldron adapter intermittently not seen
4732416 - hpost needs to modify auto-connect to properly connect the Cauldron

- Additional background information:

None. 

- Meta-Data/Problem categorization:

Product/Platform: SF15K/SF12K
Category: hardware

- Keywords

ce missing panic PCI SERR pcisch adapter dual                                     

INTERNAL SUMMARY:

SUBMITTER: Gino Valencia
BUG REPORT ID: 4723789, 4732416
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.