Document fins/I0862-1


FIN #: I0862-1

SYNOPSIS: A Sun Fire 12K or 15K domain may Domain Stop (Dstop) during operation

DATE: Aug/16/02

KEYWORDS: A Sun Fire 12K or 15K domain may Domain Stop (Dstop) during operation


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: A Sun Fire 12K or 15K domain may Domain Stop (Dstop) during 
          operation. 


Sun Alert:          Yes

TOP FIN/FCO REPORT: Yes

PRODUCT_REFERENCE:  Domain Stop on Sun Fire 12K & 15K

PRODUCT CATEGORY:   Server / SW Admin


PRODUCTS AFFECTED:

Systems Affected
------- --------
Mkt_ID   Platform   Model   Description         Serial Number
------   --------   -----   -----------         -------------
  -      F12K        ALL    Sun Fire 12K              -
  -      F15K        ALL    Sun Fire 15K              -


X-Options Affected:
-------------------
Mkt_ID   Platform   Model   Description   Serial Number
------   --------   -----   -----------   -------------
  -         -         -          -              -


PART NUMBERS AFFECTED: 

Part Number   Description        Model
-----------   -----------        -----
     -             -               -


REFERENCES:

BugId:    4676870 - LPA use must be restricted with current hardware.

ESC:      537175 - T/ Domain Dstopped under heavy I/O. 
          538171 - domain D suffered dstop.

SunAlert: 45888


PROBLEM DESCRIPTION:

A Sun Fire 12K or 15K domain may Domain Stop (Dstop) under heavy I/O
load.  A Dstop halts all execution and activity in the domain and can
result in data loss.

This issue can occur in the following releases: 

       Sun Fire 12K/15K SMS 1.1 
       Sun Fire 12K/15K SMS 1.2 without patch 112488
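
The release conditions above can be expressed as a small check.  This
is only an illustrative sketch: the function name is_affected is
hypothetical, and it assumes installed patches are reported as IDs of
the form 112488-NN and that any revision of 112488 carries the fix.
This FIN does not name a minimum revision, so confirm that against the
patch README.

```python
from typing import Iterable, Optional

# Patch named in this FIN as the fix for SMS 1.2.
FIX_PATCH = "112488"


def is_affected(sms_version: str, installed_patches: Iterable[str]) -> Optional[bool]:
    """Hypothetical helper: is this SMS release exposed to the Dstop issue?

    Returns True if the release is listed as affected, False if SMS 1.2
    already has the fix patch, and None if this FIN does not cover it.
    """
    if sms_version == "1.1":
        # SMS 1.1 is affected regardless of patches; upgrade to 1.2 first.
        return True
    if sms_version == "1.2":
        # Assumption: patch IDs look like "112488-NN" and any revision
        # counts as the fix (the FIN names no minimum revision).
        has_fix = any(p.split("-", 1)[0] == FIX_PATCH for p in installed_patches)
        return not has_fix
    # Releases other than 1.1 and 1.2 are not listed in this FIN.
    return None
```

For example, is_affected("1.2", []) returns True, and adding any
112488 revision to the patch list makes it return False.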

Below are error messages that might be seen in the /var/adm/messages 
file: 

   May  8 12:09:07 2002 darth esmd[505]: [0 8642043579506 NOTICE
      Cabinet.cc 1002] C5V at IO8/C5V0 has been inserted
   May  8 12:09:10 2002 darth esmd[505]: [0 8645004181881 NOTICE
      Cabinet.cc 1002] C3V at IO8/C3V1 has been inserted
   May  8 12:09:36 2002 darth hwad[342]: [1156 8670273779757 ERR
      InterruptHandler.cc 2159] Domain Stop interrupt detected, domain A
   May  8 12:09:54 2002 darth ssd[301]: [1310 8688320930320 NOTICE
      StartupManager.cc 3065] software component shutdown successful: 
      name=dxs-A
   May  8 12:09:54 2002 darth ssd[301]: [1310 8688320403416 NOTICE
      StartupManager.cc 3065] software component shutdown successful:
      name=dca-A
   May  8 12:10:46 2002 darth dsmd[495]: [2517 8740616578238 WARNING
      EventHandler.cc 155] Record stop has been detected in domain A.

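Of the messages above, the hwad "Domain Stop interrupt detected" line
and the dsmd "Record stop has been detected" line name the stopped
domain.  The minimal sketch below pulls the domain letter out of such
lines.  It assumes each syslog record sits on a single line in the
real file (the FIN wraps them for width), and the function name
find_dstopped_domains is hypothetical.

```python
import re
from typing import Iterable, List

# Patterns built from the message text quoted in this FIN.
DSTOP_RE = re.compile(r"Domain Stop interrupt detected, domain ([A-Z])\b")
RSTOP_RE = re.compile(r"Record stop has been detected in domain ([A-Z])\b")


def find_dstopped_domains(lines: Iterable[str]) -> List[str]:
    """Return domain letters that logged a Dstop or Rstop, in first-seen order."""
    seen: List[str] = []
    for line in lines:
        for pattern in (DSTOP_RE, RSTOP_RE):
            match = pattern.search(line)
            if match and match.group(1) not in seen:
                seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    sample = [
        "May  8 12:09:36 2002 darth hwad[342]: [1156 8670273779757 ERR "
        "InterruptHandler.cc 2159] Domain Stop interrupt detected, domain A",
        "May  8 12:10:46 2002 darth dsmd[495]: [2517 8740616578238 WARNING "
        "EventHandler.cc 155] Record stop has been detected in domain A.",
    ]
    print(find_dstopped_domains(sample))  # ['A']
```

To scan a real file, pass an open file object, for example
find_dstopped_domains(open("/var/adm/messages")).
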
During this type of Dstop, at least one AXQ or SDI reports a timeout,
as in the following example:

redxlwfail
        SDI EX00/S0  Master_Stop_Status0[31:0] = 7000000F
        MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
        SDI EX00/S0  Dstop0[31:0] = 04018400
        Dstop0[16]: D    DARB texp requests all Dstop (M)
        Dstop0[26]: D 1E AXQ requests Slot0 Dstop (M)
        AXQ EX00 ( 0) Error_Flag_00[31:0] = 00048004  Mask = 0000FFFF
        Err0[18]: D 1E Timeout on command reissue transaction to Slot0
        FAIL Slot SB0:  Dstop/Rstop detected by AXQ. 
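
The register values in this dump can be checked by hand.  The sketch
below lists the set bits of a 32-bit value and extracts a [high:low]
field.  Dstop0 = 04018400 has bits 10, 15, 16 and 26 set (the dump
names bits 16 and 26), Error_Flag_00 = 00048004 has bits 2, 15 and 18
set, and MStop0[3:0] of Master_Stop_Status0 = 7000000F is 0xF.
Treating the AXQ "Mask" as bits to ignore is an assumption, chosen
only because it leaves exactly bit 18, the bit the dump reports.  The
helper names are hypothetical.

```python
from typing import List


def set_bits(value: int, width: int = 32) -> List[int]:
    """Return the indices of the 1 bits in value, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


def bit_field(value: int, high: int, low: int) -> int:
    """Extract value[high:low] (inclusive), matching the dump's notation."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


if __name__ == "__main__":
    dstop0 = 0x04018400
    err0, err0_mask = 0x00048004, 0x0000FFFF
    master_stop = 0x7000000F

    print(set_bits(dstop0))                    # [10, 15, 16, 26]
    print(set_bits(err0))                      # [2, 15, 18]
    # Assumption: Mask marks bits to ignore, so clear them before decoding.
    print(set_bits(err0 & ~err0_mask))         # [18]
    print(hex(bit_field(master_stop, 3, 0)))   # 0xf
```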

The FRU named in this failure (SB0) is not necessarily faulty.  The
available information does not identify the failing component, so
this error is not diagnosable.  The FAIL action is only a guess, made
to satisfy the POST design requirement that something be deconfigured
after a Dstop so that the process is guaranteed to terminate.  The
named component is no more suspect than any other hardware in the
domain.

This issue occurs because the current interconnect hardware cannot
handle Local Physical Addresses (LPA) as implemented by the SMS
software.  The hardware supports LPA only in very restricted
configurations and may be subject to protocol errors under heavy load.

This issue is fixed by Patch 112488 for SMS 1.2.


IMPLEMENTATION: 

         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        | X |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)                   
         ---


CORRECTIVE ACTION:

The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives who may encounter the
above-mentioned problem.

1. For Sun Fire 12K/15K systems with SMS 1.2, install Patch 112488 or
   later.

OR

2. For Sun Fire 12K/15K systems with SMS 1.1, upgrade to SMS 1.2 and
   then install Patch 112488 or later.

Note: When upgrading from 112488 (or lower) to 112488 (or higher),
      every domain must be taken through a setkeyswitch standby/on
      cycle for the fix for Bug 4676870 to take effect.
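
Because every domain must be cycled, it can help to generate the
command sequence for the operator to review before running it on the
SC.  The sketch below only builds strings and executes nothing.  It
assumes the SMS "setkeyswitch -d <domain> <position>" syntax, which
should be confirmed against the setkeyswitch(1M) man page for the
installed SMS release; the helper name and the domain list in the
example are hypothetical.

```python
from typing import Iterable, List


def keyswitch_cycle_plan(domains: Iterable[str]) -> List[str]:
    """Build standby-then-on commands for each domain, one domain at a time.

    Assumed syntax: setkeyswitch -d <domain> <position>.
    Nothing is executed; the caller reviews and runs the commands.
    """
    plan: List[str] = []
    for domain in domains:
        plan.append(f"setkeyswitch -d {domain} standby")
        plan.append(f"setkeyswitch -d {domain} on")
    return plan


if __name__ == "__main__":
    # Hypothetical set of active domains on this platform.
    for command in keyswitch_cycle_plan(["A", "C"]):
        print(command)
```

Cycling one domain at a time keeps the remaining domains running; the
FIN does not prescribe an order, so scheduling is left to the site.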


COMMENTS:

None

============================================================================

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission-critical
     support teams will recommend implementation of the FIN to their
     respective accounts, at the convenience of the customer.

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@sdpsweb.EBay
--------------------------------------------------------------------------


Copyright (c) 1997-2003 Sun Microsystems, Inc.