Document fcos/A0155-1


FCO #: A0155-1

SYNOPSIS: Installation of certain SBus cards in slot 1 of E10000 having older
          I/O Mezzanine Boards has been found to cause unpredictable behavior,
          including undetected data corruption

DATE: Jan/12/00

KEYWORDS: Installation of certain SBus cards in slot 1 of E10000 having older
          I/O Mezzanine Boards has been found to cause unpredictable behavior,
          including undetected data corruption


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                               FIELD CHANGE ORDER
                  (For Authorized Distribution by SunService)

 
SYNOPSIS: Installation of certain SBus cards in slot 1 of E10000 having
          older I/O Mezzanine Boards has been found to cause unpredictable
          behavior, including undetected data corruption.

TOP FIN/FCO REPORT: Yes

PRODUCT_REFERENCE: E10000 I/O Mezzanine Board

PRODUCT CATEGORY: Server / System Board

PRODUCTS AFFECTED:

Systems Affected:

Mkt_ID   Platform   Model   Description                   Serial Number
------   --------   -----   -----------                   -------------
   -     E10000      ALL    Sun Enterprise 10000 Server   (see comments)
   -     HPC10000    ALL    Sun HPC Server                (see comments)
   -     SSP         ALL    System Service Processor            -
   
X-options Affected:

Mkt_ID   Platform   Model   Description                   Serial Number
------   --------   -----   -----------                   -------------
X2730A       -        -     Sun Enterprise 10000 SBus I/O Board   -

PART NUMBERS AFFECTED:

Part Number   Description                          Model
-----------   -----------                          -----
501-4349-XX   Sun Enterprise 10000 SBus I/O Board    -
501-6525-xx   Mislabeled E10000 SBus I/O Board       -

(SCSI Devices)
Type    Vendor    Model     SerialNumber(Min)    SerialNumber(Max)    Firmware
----    ------    -------   ------------------   ------------------   --------

REFERENCES:
   ECO: WO_12425
   DPCO: 157
   Patch Number: 104853, 105684, 108345
   BugId: 4046986, 4049704, 4243882, 4091053, 4157729, 4258577
   FIN: I0405-3
   DOC: 805-2917-14  Sun Enterprise 10000 System Service Guide

PROBLEM DESCRIPTION:     

Installation of a SunSwift, SunFastEthernet, Gigabit Ethernet, or Sun FC-AL
SBus card in slot 1 has been found to cause unpredictable behavior in an SBus
card installed in slot 0 (for example, SOC or (U)DWIS).  Depending on the
type of SBus card in slot 0, this behavior can appear as resets, offlines,
and other reported errors, as well as data corruption that can go undetected
by the system.

SunSwift, SunFastEthernet, Gigabit Ethernet 1.1 or Sun FC-AL SBus cards,
including the SunSwift hme/fas combo card (X1018A), the Sun FastEthernet 2.0
(X1059A) card, the SBus Gigabit Ethernet card (X1045A) and the SOC+ FCAL SBus
card (X6730A) should not be installed in SBus slot 1 on an Enterprise 10000
system board which has a 501-4349-xx SBus I/O Mezzanine Board installed.
   
SunSwift, SunFastEthernet, Gigabit Ethernet 1.1, or Sun FC-AL SBus cards
should be installed ONLY in SBus slot 0 of E10000 system boards having the
501-4349-xx SBus I/O Mezzanine Board installed.  See FIN I0405-3 for details.

The workaround for this has been to restrict the placement of the
above-mentioned SBus cards to slot 0 of the SBus only.  This becomes a problem if
the customer needs to install one of the restricted cards but critical devices
are already connected to all of the available slots labeled slot 0.

Example error message:

WARNING: /sbus@75,0/QLGC,isp@0,10000 (isp0):
        ISP: Firmware cmd timeout
WARNING: /sbus@75,0/QLGC,isp@0,10000 (isp0):
        Fatal error, resetting interface
isp0:   State dump from isp registers and driver:
        mailboxes(0-5): 0x4001, 0x4953, 0x5020, 0x2020, 0x1, 0x1
        bus: isr= 0x6, icr= 0x0, conf0= 0x1, conf1= 0x0
        cdma: count= 0, addr= 0x0, status= 0x2, conf= 0x0, fifo_status= 0x40
        dma: count= 0, addr= 0x0, status= 0x2, conf= 0x0
        risc: R0-R7= 0x1e, 0xa5e3, 0x1d, 0x3f32, 0x0, 0x457, 0x30 0x18
        risc: R8-R15= 0x5b78, 0x4bd, 0x470, 0x1000, 0x472, 0x1000, 0x10 0x0
        risc: PSR= 0xf000, IVR= 0x10ef, PCR=0x1000, RAR0=0x30, RAR1=0x5d7e
        risc: LCR= 0x1, PC= 0x457, MTR=0xffff, EMB=0x0, SP=0x5cfe
        request(in/out)= 29/1, response(in/out)= 27/27
        request_ptr(current, base)=  0x71320780 (0x71320040)
        response_ptr(current, base)= 0x71324700 (0x71324040)
        period/offset: 25/8 25/8 25/8 25/8 25/8 25/8 25/8 12/8
        period/offset: 12/8 12/8 12/8 12/8 12/8 12/8 12/8 12/8
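
When reviewing a saved copy of a domain's system log, the two warning lines
quoted above can be counted with a short helper.  This is only a sketch: the
function name count_isp_symptoms is hypothetical, the log file is whatever
copy you pass in (no default path is assumed), and other slot 0 cards may
report different messages that this does not catch.

```shell
#!/bin/sh
# count_isp_symptoms FILE
# Count occurrences of the two isp warning lines quoted in this FCO.
# FILE is a saved log copy supplied by the caller (assumption: plain text).
count_isp_symptoms() {
        log=$1
        timeouts=`grep -c "ISP: Firmware cmd timeout" "$log"`
        resets=`grep -c "Fatal error, resetting interface" "$log"`
        echo "timeouts=$timeouts resets=$resets"
}
```

A non-zero count is a hint to check SBus card placement, not proof of this
particular problem.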


Patch #104853-05 for SSP 3.0, patch #105684-07 for SSP 3.1, and SSP 3.1.1
provide a check in OBP for unsafe SBus card placement for the SunSwift,
SunFastEthernet and Gigabit Ethernet 1.1 cards.  The Sun FC-AL SBus card
is not checked.  This check will not allow the system to boot if an unsafe 
configuration is detected.  SSP 3.2 does not incorporate this check.

Error messages of this type will be seen if software detects an invalid
configuration:

ERROR: sbus slot 1 on board 1 SYSIO 0 contains: SUNW,hme SUNW,fas 
       This configuration may cause data corruption.
       
(and)

Cannot boot: Configuration error.
    sbus slot 1 on board 1 SYSIO 0 contains: SUNW,hme SUNW,fas 
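
If these messages have been captured to a file, the board, SYSIO, slot and
device names can be pulled out for a quick summary.  The sketch below assumes
the message layout is exactly as quoted above; the function name
list_unsafe_slots is hypothetical and the capture file is supplied by the
caller.

```shell
#!/bin/sh
# list_unsafe_slots FILE
# Print one summary line per distinct unsafe slot reported in FILE.
# Assumes the "sbus slot N on board N SYSIO N contains: ..." layout
# shown in this FCO; lines in any other layout are ignored.
list_unsafe_slots() {
        sed -n 's/.*sbus slot \([0-9][0-9]*\) on board \([0-9][0-9]*\) SYSIO \([0-9][0-9]*\) contains: *\(.*[^ ]\) *$/board=\2 sysio=\3 slot=\1 devices=\4/p' "$1" |
            sort -u
}
```

The ERROR line and the "Cannot boot" line describe the same slot, so sort -u
collapses them into a single entry.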

Manufacturing phased in a new I/O Mezzanine Board (Sun p/n 501-4478-01) via
ECO# WO_12425 on January 26th, 1998.

          
IMPLEMENTATION:

         ---
  	| X |  	MANDATORY (Fully Pro-Active)
	 ---    

         ---
  	|   | 	CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
	 ---   
			
         ---
  	|   | 	UPON FAILURE
	 ---
	

REPLACEMENT TIME ESTIMATE: 125 - 170 mins (per 4 to 7 board E10K system)
			   [average is 2.5 hours for a 5 board system]

SPECIAL CONSIDERATIONS:

Implementation of this FCO will be carried out in a phased and prioritized
manner.  Check with your Geo FCO representative for procedure.  It is very
important that you check with your Geo/Country representative for material
availability before scheduling FCO activity as well as reporting completion
of all implementation activities to your Geo/Country representative.

CORRECTIVE ACTION:

NOTE! Only Sun Authorized Service personnel are authorized to perform
      the following maintenance actions on E10000 systems.  SSP 3.0
      has been EOLed.  SSPs should be upgraded to SSP 3.1 or higher.

1(a). Determine locations of the I/O Mezzanine boards to be replaced.

To determine which SBus I/O Mezz boards are installed in a running system,
use the board_id command on the main SSP.  This may be done via remote access
if available at the customer's site.  The command can be used safely on
running domains with a 10 second sleep between board_id commands.

Cut-n-save (between the Cut Here markings) the script below into a file
and run the executable as user "ssp".  All system boards should be
powered on.

  --- Cut Here ---  --- Cut Here ---  --- Cut Here ---  --- Cut Here --- 
#!/bin/sh 
# 
# io_bd_rev.sh
# Check for 501-4349 or 501-6525 I/O boards installed in E10000.
#
# Run as user "ssp".
# All system boards should be powered on.
# A 10 second sleep between board_id commands is recommended.
#
# initialize
REV_CK1=0
REV_CK2=0
rm -f /tmp/rev_ck.out 1>/dev/null 2>/dev/null

# get the revisions
echo "Checking board \c"
for b in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
        echo "$b...\c"
        echo `uname -n`":SB:"$b": "`board_id -b io -n $b` >> /tmp/rev_ck.out
        sleep 10
done
echo "done."
echo

# find the down-rev boards
REV_CK1=`grep -c "501-4349" /tmp/rev_ck.out`
REV_CK2=`grep -c "501-6525" /tmp/rev_ck.out`

# print out the list of down-rev boards
if [ `expr $REV_CK1 + $REV_CK2` -gt 0 ]; then
        echo Down-rev I/O Boards:
        grep "501-4349" /tmp/rev_ck.out | awk '{print $1, $3, $4, $5}'
        grep "501-6525" /tmp/rev_ck.out | awk '{print $1, $3, $4, $5}'
else
        echo No down-rev I/O Boards found.
fi
echo
echo Total number of down-rev I/O Boards: `expr $REV_CK1 + $REV_CK2`

# clean-up
rm /tmp/rev_ck.out

# end of io_bd_rev.sh
  --- Cut Here ---  --- Cut Here ---  --- Cut Here ---  --- Cut Here ---
   
The output of the above script will list the system boards with 
down-rev Part Number (P/N 501-4349-XX or 501-6525-XX) I/O Mezz boards, 
and will list the total of these Boards in the system, for example:

ssp2:domain2% ./io_bd_rev.sh
Checking board
0...1...2...3...4...5...6...7...8...9...10...11...12...13...14...15...done.

Down-rev I/O Boards:
ssp2:SB:0: Part Number 501-4349-04
ssp2:SB:1: Part Number 501-4349-04
ssp2:SB:8: Part Number 501-4349-04
ssp2:SB:15: Part Number 501-4349-50
ssp2:SB:9: Part Number 501-6525-04
ssp2:SB:10: Part Number 501-6525-04

Total number of down-rev I/O Boards: 6
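
If the output above is saved to a file, the affected system board numbers can
be extracted as a plain list for planning the replacement order.  This is a
sketch only: the function name downrev_boards is hypothetical, and it assumes
the "host:SB:N: Part Number 501-..." line layout shown in the example.

```shell
#!/bin/sh
# downrev_boards FILE
# Print the system board numbers (one per line, sorted numerically)
# that io_bd_rev.sh reported as carrying 501-4349 or 501-6525 I/O boards.
# FILE is a saved copy of the script's output.
downrev_boards() {
        sed -n \
            -e 's/^[^:]*:SB:\([0-9][0-9]*\): Part Number 501-4349-.*/\1/p' \
            -e 's/^[^:]*:SB:\([0-9][0-9]*\): Part Number 501-6525-.*/\1/p' \
            "$1" | sort -n
}
```

For the example listing above this prints 0, 1, 8, 9, 10 and 15, one per line.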

1(b). The eepr command from within redx can be used with recent
   recordstop or arbstop dumps (ones that reflect the current system
   configuration). This method will require at least one dump file
   from each domain in order to read the Part Number information from
   all of the I/O Boards. 

2. Order the required number of replacement I/O Boards and schedule
   time with the customer for this maintenance.
   
3. Determine if the I/O Boards to be replaced are on system boards
   that can be DR detached. For more details on requirements for DR,
   see the following: 

   The E10000 User Guides at:

      http://marvin.west/pubs/starfire_user/

        Sections:	
	  Dynamic Reconfiguration
          Alternate Pathing
          Solaris Installation and Release Notes (includes DR install)

   The RAS Companion:

      http://marvin.West.Sun.COM/pubs/ras_companion

4(a). If using DR to remove system boards. 

    4.1. On the SSP, add the following line to the .postrc file for
         the domain (Note that the .postrc file may be located in the
         /export/home/ssp directory and will affect all domains or it
         may be located in the
         /var/opt/SUNWssp/etc/<platform name>/<domain name> directory.
         For more information enter: hpost -?postrc or: man postrc).

	 level 64 

    4.2. Set up the system board for DR detach by switching any active
	 AP networks or disk paths, dissociating any mirrors, removing
	 and offlining disks, and other necessary tasks to make the system
	 board available for detach.
 
    4.3. Start the DR process on the SSP, as user "ssp", either with 
	 Hostview or by entering: dr on the command line.

    4.4. Drain the system board.

	 dr> drain <system board #>
	
    4.5. Complete the detach of the system board.

	 dr> complete_detach <system board #>

    4.6. Power off the system board

	 ssp% power -off -sb <system board #>

    4.7. Remove the system board.

    4.8. Replace P/N 501-4349-XX I/O Boards with P/N 501-4478-XX according
   	 to the procedures in the Sun Enterprise 10000 System Service Guide,
   	 P/N  805-2917-14.
   	 
         Replacement of an I/O Mezzanine Board requires that the system
         board be removed from a running domain and powered off.  If the
         system board cannot be DR detached, the domain will need to be
         shut down.  Approximate time needed to replace an I/O Board is
         20 minutes, after the system board has been powered off.

    4.9. Re-insert the system board, re-connect the I/O cables,
	 and power it on.

    4.10. Init the attach of the system board.

	  dr> init_attach <system board #>

    4.11.  After completion of init_attach, verify that all components are 
          configured into the domain, check the hpost logs for any failures
          and repair as necessary.

    4.12. Complete the attach of the system board.

	  dr> complete_attach <system board #>

    4.13. Repeat steps 4.2 - 4.12 for each system board that has an I/O
          board to be replaced.

    4.14. When all the I/O boards have been replaced, remove or comment
	  out the level 64 entry in the .postrc file.

    4.15. Upgrade SSP software to eliminate boot restrictions as
	  follows:

          SSP Version         OBP Patch 
          -----------         ---------
          3.0                 Upgrade to SSP 3.1 or higher
          3.1                 105684
          3.1.1               108345
          3.2                 N/A
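
The "level 64" entry is added before the work and removed afterwards, so it
can help to make both edits repeatable.  The sketch below is an assumption-
laden illustration: the function names add_level64 and remove_level64 are
hypothetical, the .postrc path is passed in by the caller, and the entry is
removed outright rather than commented out so that no comment syntax has to
be assumed.

```shell
#!/bin/sh
# add_level64 FILE
# Append "level 64" to FILE unless an identical entry is already present.
add_level64() {
        postrc=$1
        grep "^ *level  *64 *$" "$postrc" >/dev/null 2>&1 ||
            echo "level 64" >> "$postrc"
}

# remove_level64 FILE
# Delete every "level 64" line from FILE, leaving all other lines as-is.
remove_level64() {
        postrc=$1
        sed '/^ *level  *64 *$/d' "$postrc" > "$postrc.new.$$" &&
            mv "$postrc.new.$$" "$postrc"
}
```

Run both as user "ssp" against the .postrc file that actually applies to the
domain being serviced.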
	
4(b). If DR is not being used.

    4.1. As user "root" on the domain, shut down the domain.

    4.2. As user "ssp" on the SSP, power off all the boards in the
         domain.

	 ssp% power -off -sb <domain board list>

    4.3. Replace P/N 501-4349-XX I/O Boards with P/N 501-4478-0X according
         to the procedures in the Sun Enterprise 10000 System Service Guide,
         P/N  805-2917-14.

    4.4. As user "ssp" on the SSP, power on all the system boards in
         the domain.

	 ssp% power -on -sb <domain board list>

    4.5. Run a -l64 bringup on the domain, and bringup to OBP.

	 ssp% bringup -l64 -A off

    4.6. After completion of the bringup, verify that all components are 
	 configured into the domain, check the hpost logs for any failures
	 and repair as necessary.

    4.7. Boot the system (Note OBP patch level and boot restrictions).

    4.8. Repeat steps 4.1 - 4.7 for each domain that has I/O Boards to be
         replaced.

    4.9. When all the I/O boards have been replaced, remove or comment
         out the level 64 entry in the .postrc file.

    4.10. After all the domains have had the I/O Boards replaced,
	  upgrade SSP software to eliminate boot restrictions as follows:
 
          SSP Version         OBP Patch 
          -----------         ---------
          3.0                 Upgrade to SSP 3.1 or higher
          3.1                 105684
          3.1.1               108345
          3.2                 N/A
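
The SSP version table can also be expressed as a small lookup for use in a
site checklist script.  The values are copied from the table above; the
function name obp_patch_for_ssp is hypothetical, and the SSP version string
must be supplied by the caller since this sketch does not try to detect it.

```shell
#!/bin/sh
# obp_patch_for_ssp VERSION
# Print the action or OBP patch listed in this FCO for an SSP version.
# Returns non-zero for a version that is not in the table.
obp_patch_for_ssp() {
        case "$1" in
        3.0)    echo "Upgrade to SSP 3.1 or higher" ;;
        3.1)    echo "105684" ;;
        3.1.1)  echo "108345" ;;
        3.2)    echo "N/A" ;;
        *)      echo "unknown SSP version: $1" >&2
                return 1 ;;
        esac
}
```

For example, obp_patch_for_ssp 3.1.1 prints 108345.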

5. Return and scrap the 501-4349-XX and 501-6525-XX I/O Mezz boards.

COMMENTS:

A small number of 501-4349-xx I/O Boards were incorrectly programmed with
the Part Number 501-6525-xx.  If this I/O Board part number is detected it
should be processed as part number 501-4349-xx.
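
The rule above, together with the replacement part number given earlier, can
be sketched as a simple classifier.  The function name classify_io_board and
the labels it prints are hypothetical; only the part number prefixes come
from this FCO.

```shell
#!/bin/sh
# classify_io_board PARTNUMBER
# 501-4349-xx and the mislabeled 501-6525-xx are both treated as down-rev;
# 501-4478-xx is the replacement board named in this FCO.
classify_io_board() {
        case "$1" in
        501-4349-*|501-6525-*)  echo "down-rev" ;;
        501-4478-*)             echo "current" ;;
        *)                      echo "unknown" ;;
        esac
}
```

For example, classify_io_board 501-6525-04 prints down-rev.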

BILLING TYPE:

 Warranty: Sun will provide parts and on-site labor at no charge
           during normal working hours.

 Contract: Sun will provide parts and on-site labor at no charge
           during normal working hours.

 Non Contract: Sun will provide parts and on-site labor at no charge
               during normal working hours.

--------------------------------------------------------------------------
Implementation Footnote:
________________________

i)   In case of Mandatory FCOs, Enterprise Services will attempt to contact
      all known customers to recommend the part upgrade.

ii)  For controlled proactive swap FCOs, Enterprise Services mission critical
     support teams will initiate proactive swap efforts for their respective 
     accounts, as required.

iii) For Replace upon Failure FCOs, Enterprise Services partners will implement
     the necessary corrective actions as and when they are required.

--------------------------------------------------------------------------
 
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access: 
______________
 
* Access the top level URL of http://sdpsweb.EBay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
_______________________
 
* Access the SunSolve Online URL at http://sunsolve.Central/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
____________________
 
Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
accessed internally at the following URL: http://edist.Central/.
  
* From there, follow the hyperlink path of "SunService Documentation" and
  click on "FIN & FCO attachments", then choose the appropriate folder,
  FIN or FCO.  This will display supporting directories/files for FINs or
  FCOs.
   
Internet Access:
_______________
 
* Access the top level URL of https://infoserver.Sun.COM


--------------------------------------------------------------------------
General:
________

Send questions or comments to finfco-manager@Sun.COM

---------------------------------------------------------------------------


Copyright (c) 1997-2003 Sun Microsystems, Inc.