Document fins/I0730-1


FIN #: I0730-1

SYNOPSIS: A global arbstop may occur in certain instances on Enterprise 10000
          servers.

DATE: Oct/15/01

KEYWORDS: A global arbstop may occur in certain instances on Enterprise 10000
          servers.


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: A global arbstop may occur in certain instances on
          Enterprise 10000 servers.
              

Sun Alert:          Yes

TOP FIN/FCO REPORT: Yes 
 
PRODUCT_REFERENCE:  Enterprise 10000 
 
PRODUCT CATEGORY:   Server / SW Admin


PRODUCTS AFFECTED:

Systems Affected  
----------------
Mkt_ID   Platform   Model   Description               Serial Number	
------   --------   -----   -----------               ------------- 
  -       E10000     ALL    Enterprise E10000               -


X-Options Affected
------------------
Mkt_ID   Platform   Model   Description               Serial Number	
------   --------   -----   -----------               ------------- 
  -         -         -          -                          -


PART NUMBERS AFFECTED: 

Part Number   Description   Model
-----------   -----------   -----
     -             -          -


REFERENCES:

BugId:    4451899 - All domains arbstopped after magic_cookie and 
                    libscan error reported to messages. 
          4454194 - Multiple domains arbstopped when running 
                    autoconfig/hpost with jtag broken procs.

PatchId:  109175 - SSP 3.3: system-board voltages reported in SSP 
                      MIB are inconsistent.
          110412 - SSP 3.4: Eveready fan trays spin fast.

ESC:      530143 
          530840 
          531088 
          531325 
          530289

SunAlert: 40034

URL:      http://esp.west/dcpubs/ServicePubs/hw.html

      
PROBLEM DESCRIPTION: 

Under certain circumstances, for example when a command such as 'hpost'
or 'autoconfig' is run against a jtag-scan-broken CPU module, a global
arbstop may occur, leaving multiple domains completely dead until POST
is run again.  Arbstop stands for "arbitration stop" and normally occurs
when the E10K hardware detects a fatal error.

This problem may occur on E10K servers running the following Sun
Service Processor (SSP) releases:
  
   . SSP 3.1 
   . SSP 3.1.1 
   . SSP 3.2 
   . SSP 3.3 
   . SSP 3.4.

The following are some of the error messages that may be observed if the
problem occurs.

Logged to the platform "messages" file (located in the directory
"/var/opt/SUNWssp/adm"):

    cbs: cbs: WARNING:[[post:9140]] libscan/sd_test_chain_length/
              WARNING:Chain length (-11) is probably incorrect.
    cbs: cbs: ERR:[[post:9140]] libscan/sd_test_ring_length/
              ERROR:sd_scan_test_length failed            

If multiple domains unexpectedly arbstop while the "hpost" command
(which is usually called on behalf of the "bringup" command) is
running on a separate domain, POST failures of the following kind will
be logged to the "post*.log" file of that domain (located in the
directory "/var/opt/SUNWssp/adm/$SUNW_HOSTNAME/post"):

    phase jtag_integ: JTAG probe and integrity test...
    ERR: libcbs:cbs_check_chain_length:libscan error
    FAIL PROC 8.3: scan integrity fail.            

NOTE: Arbstops seen on multiple or all domains combined with the
      occurrence of the above messages would indicate the issue 
      described in this document.  The occurrence of the above 
      messages alone is not an indication of this issue.
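
The rule in the NOTE above can be sketched as a small check.  This is
only an illustration, not an SSP tool: the function names are
hypothetical, the signature strings are copied from the example messages
quoted in this document, and the list of arbstopped domains is assumed
to be supplied by the operator.

```python
# Substrings taken from the example log messages quoted in this FIN.
LIBSCAN_SIGNATURES = (
    "libscan/sd_test_chain_length",
    "sd_scan_test_length failed",
    "cbs_check_chain_length:libscan error",
    "scan integrity fail",
)


def has_libscan_signature(log_text):
    """Return True if any of the quoted signatures appears in log_text."""
    return any(sig in log_text for sig in LIBSCAN_SIGNATURES)


def matches_fin_symptoms(log_text, arbstopped_domains):
    """Hypothetical helper applying the NOTE: both conditions must hold.

    arbstopped_domains is a list of domain names the operator has already
    found to be arbstopped; this sketch does not detect arbstops itself.
    """
    multiple_domains = len(set(arbstopped_domains)) >= 2
    return multiple_domains and has_libscan_signature(log_text)
```

The messages alone, or an arbstop on a single domain alone, do not
satisfy the check.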
   
The following three scenarios might trigger multiple arbstops:

   1.  When running the "autoconfig" command against a 
       jtag-scan-broken CPU module (see Note 1 below). 
       
   2.  When running the "bringup" or "hpost" command at POST diagnostic
       level 24 or higher (see the "-l" command option) against a
       jtag-scan-broken CPU module (see Note 2 below).
        
   3.  When any of the following CPU module part numbers are newly 
       installed: 

         .  501-5838 
         .  501-5866 
         .  501-6008 

       and the user did not run the "autoconfig" command and reboot
       the SSP BEFORE running the "bringup" or "hpost" command at POST
       diagnostic level 24 or higher.
        
        NOTE 1: Newly and properly installed CPU modules are normally fully 
                jtag functional.  Therefore, scenario 1 is unlikely to be
                encountered. 

        NOTE 2: The "bringup" and "hpost" commands are capable of
                detecting jtag-scan-broken CPUs at POST diagnostic levels
                below 24 (e.g. level 7 or level 16) without the risk of
                encountering an arbstop.
 
        The following example shows the output of the "bringup" command
        at POST diagnostic level 7, where a jtag-scan-broken CPU module
        has been detected (this assumes that the "autoconfig" command has
        been run successfully and the SSP has been rebooted):
        
           my-ssp:mydomain-b3%  bringup -l7
           ...
           phase jtag_integ: JTAG probe and integrity test...
           FAIL b/r/c = sysboard5/proc1/spitfire: Component ID discrepancy.
           FAIL    Actual F805C03E; Expected one of:
           FAIL           B003602F   or
           FAIL           A003602F   or
           FAIL           9003602F   or
           FAIL           2003602F   or
           FAIL           1003602F   or
           FAIL           0003602F   or
           FAIL           4002502F

The root cause of this problem is a jtag-scan-broken CPU installed on a
System Board.  If the system has a jtag-scan-broken CPU, then autoconfig
immediately generates a global arbstop, bringing down all other domains
in the platform during the 'hpost'.  After the global arbstop, it is
necessary to manually reset the power breaker on the slot where the
jtag-scan-broken CPU resides in order to be able to bring up the domains
in the platform.
           
NOTE: This problem has been corrected in patches for SSP 3.3 and
      SSP 3.4.  With patch 109175 (SSP 3.3) or 110412 (SSP 3.4)
      installed, a global arbstop should not occur in any of the
      three scenarios described above.


IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        | X |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION: 

The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives to avoid the above mentioned
problem.

Install the appropriate SSP patch:

        SSP 3.3    109175 or later
        SSP 3.4    110412 or later

Customers running SSP 3.2 and below should consider upgrading to SSP 3.3 
or higher together with the appropriate patches.
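
The recommendation above amounts to a lookup by SSP release.  The sketch
below is a hypothetical illustration only: the function name and the
returned wording are assumptions, while the release list and patch IDs
are the ones given in this document.

```python
# Patch IDs per SSP release, as listed in this FIN.
PATCH_FOR_RELEASE = {"3.3": "109175", "3.4": "110412"}

# Affected releases for which this FIN lists no patch.
UPGRADE_FIRST = ("3.1", "3.1.1", "3.2")


def recommended_action(ssp_release):
    """Hypothetical helper: map an SSP release string to the FIN advice."""
    if ssp_release in PATCH_FOR_RELEASE:
        return "install patch %s or later" % PATCH_FOR_RELEASE[ssp_release]
    if ssp_release in UPGRADE_FIRST:
        return "upgrade to SSP 3.3 or higher, then install the matching patch"
    return "release not covered by this FIN"
```

For example, recommended_action("3.4") returns the string
"install patch 110412 or later".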

Use the following workaround recommendations if the patches cannot be
installed:

  1. Do not run the "autoconfig" command against any CPU that is
     suspected to be jtag-scan-broken.

  2. Do not run the "bringup" or "hpost" command at POST diagnostic
     level 24 or higher against any CPU that is suspected to be
     jtag-scan-broken (see the "-l" option of the "bringup" or "hpost"
     command).

  3. When upgrading by installing hardware modules (processor, system
     board, or IO mezzanine type change), always run the "autoconfig"
     command and reboot the SSP immediately afterwards.
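
The workaround recommendations above could be expressed as a pre-check
before a "bringup" or "hpost" run on an unpatched SSP.  This is a
minimal sketch under stated assumptions: the function and parameter
names are hypothetical, and whether a CPU is suspect or new modules were
installed is assumed to be known to the operator.  Only the level 24
threshold and the autoconfig/reboot ordering come from this document.

```python
# From this FIN: POST diagnostic level 24 or higher carries the risk.
ARBSTOP_RISK_LEVEL = 24


def bringup_allowed(level, cpu_suspect=False, new_modules=False,
                    autoconfig_and_reboot_done=False):
    """Hypothetical pre-check for an unpatched SSP.

    level: POST diagnostic level passed with the "-l" option.
    cpu_suspect: a CPU in the domain is suspected to be jtag-scan-broken.
    new_modules: hardware modules were newly installed.
    autoconfig_and_reboot_done: "autoconfig" was run and the SSP was
        rebooted after the installation.
    """
    if level < ARBSTOP_RISK_LEVEL:
        # Levels below 24 detect broken CPUs without the arbstop risk.
        return True
    if cpu_suspect:
        return False
    if new_modules and not autoconfig_and_reboot_done:
        return False
    return True
```

Under these assumptions a level 7 or level 16 run is always allowed,
which matches NOTE 2 in the problem description.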


COMMENTS: 

============================================================================

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
                                                        



Copyright (c) 1997-2003 Sun Microsystems, Inc.