Document fins/I0862-1
FIN #: I0862-1
SYNOPSIS: A Sun Fire 12K or 15K domain may Domain Stop (Dstop) during operation
DATE: Aug/16/02
KEYWORDS: A Sun Fire 12K or 15K domain may Domain Stop (Dstop) during operation
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: A Sun Fire 12K or 15K domain may Domain Stop (Dstop) during
operation.
Sun Alert: Yes
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: Domain Stop on Sun Fire 12K & 15K
PRODUCT CATEGORY: Server / SW Admin
PRODUCTS AFFECTED:
Systems Affected
----------------
Mkt_ID   Platform   Model   Description    Serial Number
------   --------   -----   -----------    -------------
  -      F12K       ALL     Sun Fire 12K        -
  -      F15K       ALL     Sun Fire 15K        -
X-Options Affected:
-------------------
Mkt_ID   Platform   Model   Description    Serial Number
------   --------   -----   -----------    -------------
  -         -         -          -              -
PART NUMBERS AFFECTED:
Part Number   Description   Model
-----------   -----------   -----
     -             -          -
REFERENCES:
BugId:    4676870 - LPA use must be restricted with current hardware.
ESC:      537175 - T/ Domain Dstopped under heavy I/O.
          538171 - domain D suffered dstop.
SunAlert: 45888
PROBLEM DESCRIPTION:
A Sun Fire 12K or 15K domain may Domain Stop (Dstop) during heavy I/O
load. All execution and activity on the domain is interrupted. The
Dstop could result in data loss.
This issue can occur in the following releases:
Sun Fire 12K/15K SMS 1.1
Sun Fire 12K/15K SMS 1.2 without patch 112488
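The release list above can be expressed as a small check. The following
is a minimal sketch, not a supported tool: it assumes the SMS version is
already known as a string and that patch information is supplied as text
in the "Patch: <id>-<rev> ..." form printed by the Solaris showrev -p
command. The function names, and the sample revision and package name in
the usage note, are hypothetical.

```python
import re

PATCH_ID = "112488"  # patch named in this notice


def installed_patch_revisions(showrev_output, patch_id=PATCH_ID):
    """Return sorted revision numbers of patch_id found in `showrev -p` style text."""
    pattern = re.compile(
        r"^Patch:\s*" + re.escape(patch_id) + r"-(\d+)", re.MULTILINE
    )
    return sorted(int(rev) for rev in pattern.findall(showrev_output))


def is_exposed(sms_version, showrev_output):
    """Apply the release list: SMS 1.1, or SMS 1.2 without patch 112488."""
    if sms_version == "1.1":
        return True
    if sms_version == "1.2":
        # Exposed only if no revision of the patch is installed.
        return not installed_patch_revisions(showrev_output)
    # Other releases are not covered by this notice.
    return False
```

For example, is_exposed("1.2", "") returns True, while passing text that
contains a line such as "Patch: 112488-05 Obsoletes: Requires:
Incompatibles: Packages: SUNWSMSop" (a made-up revision and package name)
returns False.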
Error messages such as the following may appear in the /var/adm/messages
file:
May 8 12:09:07 2002 darth esmd[505]: [0 8642043579506 NOTICE
Cabinet.cc 1002] C5V at IO8/C5V0 has been inserted
May 8 12:09:10 2002 darth esmd[505]: [0 8645004181881 NOTICE
Cabinet.cc 1002] C3V at IO8/C3V1 has been inserted
May 8 12:09:36 2002 darth hwad[342]: [1156 8670273779757 ERR
InterruptHandler.cc 2159] Domain Stop interrupt detected, domain A
May 8 12:09:54 2002 darth ssd[301]: [1310 8688320930320 NOTICE
StartupManager.cc 3065] software component shutdown successful:
name=dxs-A
May 8 12:09:54 2002 darth ssd[301]: [1310 8688320403416 NOTICE
StartupManager.cc 3065] software component shutdown successful:
name=dca-A
May 8 12:10:46 2002 darth dsmd[495]: [2517 8740616578238 WARNING
EventHandler.cc 155] Record stop has been detected in domain A.
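Messages like those above can be located mechanically. The sketch below
is illustrative only: it assumes the log has been read into a string and
matches the two message texts shown in this notice. The function name
and return layout are hypothetical.

```python
import re

# Message texts taken from the sample log entries in this notice.
DSTOP_RE = re.compile(r"Domain Stop interrupt detected, domain (\w+)")
RSTOP_RE = re.compile(r"Record stop has been detected in domain (\w+)")


def find_stop_events(log_text):
    """Return the domain IDs named in Domain Stop and Record Stop messages."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return {
        "dstop": DSTOP_RE.findall(flat),
        "rstop": RSTOP_RE.findall(flat),
    }
```

Applied to the sample entries above, this returns domain A for both the
Domain Stop and the Record Stop message.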
In the Dstop, at least one AXQ or SDI will report a timeout:
redxlwfail
SDI EX00/S0 Master_Stop_Status0[31:0] = 7000000F
MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
SDI EX00/S0 Dstop0[31:0] = 04018400
Dstop0[16]: D DARB texp requests all Dstop (M)
Dstop0[26]: D 1E AXQ requests Slot0 Dstop (M)
AXQ EX00 ( 0) Error_Flag_00[31:0] = 00048004 Mask = 0000FFFF
Err0[18]: D 1E Timeout on command reissue transaction to Slot0
FAIL Slot SB0: Dstop/Rstop detected by AXQ.
The FRU named in this failure (SB0) cannot be identified from the
available information, so this error is not diagnosable. The FAIL
action is just a guess to satisfy the POST design requirement that
something must be deconfigured after a Dstop to guarantee that the
process terminates. The failed component is not more suspect than any
other hardware in the domain.
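The bit annotations in the output above (for example Dstop0[16],
Dstop0[26] and Err0[18]) can be cross-checked against the hexadecimal
register values. The sketch below only lists which bits are set. The
mask handling assumes that a set Mask bit suppresses the matching error
bit; this notice does not define the Mask field, so treat that part as
an assumption. Function names are hypothetical.

```python
def set_bits(value, width=32):
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


def unmasked_bits(value, mask, width=32):
    """Set bits of value that are not covered by mask.

    Assumption: a 1 in mask hides the corresponding bit of value.
    """
    return set_bits(value & ~mask & ((1 << width) - 1), width)


print(set_bits(0x04018400))                   # SDI Dstop0     -> [10, 15, 16, 26]
print(set_bits(0x00048004))                   # AXQ Error_Flag -> [2, 15, 18]
print(unmasked_bits(0x00048004, 0x0000FFFF))  # -> [18]
```

Bits 16 and 26 of Dstop0 and bit 18 of Error_Flag_00 are the ones
annotated in the output above. The meaning of the other set bits is not
given in this notice.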
This issue occurs because the current interconnect hardware cannot
handle Local Physical Addresses (LPA) as implemented in the SMS
software. The hardware supports LPA only in very restricted
configurations and may be subject to protocol errors under heavy load.
This issue is fixed by Patch 112488 for SMS 1.2.
IMPLEMENTATION:
 ---
|   |  MANDATORY (Fully Proactive)
 ---
 ---
| X |  CONTROLLED PROACTIVE (per Sun Geo Plan)
 ---
 ---
|   |  REACTIVE (As Required)
 ---
CORRECTIVE ACTION:
The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives who may encounter the above
mentioned problem.
1. For Sun Fire 12K/15K systems running SMS 1.2, install Patch 112488 or
later.
OR
2. For Sun Fire 12K/15K systems running SMS 1.1, upgrade to SMS 1.2 and
then install Patch 112488 or later.
Note: When upgrading from 112488 (or lower) to 112488 (or higher),
all domains must undergo a setkeyswitch standby/on process for
the fix for Bug 4676870 to be put into effect.
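The setkeyswitch standby/on step in the note can be planned per domain
before any command is run. The sketch below only builds the command
lines and executes nothing. It assumes the "setkeyswitch -d <domain>
<position>" form; verify that against the setkeyswitch man page for the
installed SMS release. The helper name and the domain IDs in the usage
line are hypothetical.

```python
def keyswitch_cycle_commands(domain_ids):
    """Build (but do not run) a standby/on setkeyswitch pair per domain."""
    commands = []
    for domain in domain_ids:
        # Assumed syntax: setkeyswitch -d <domain> <position>
        commands.append(["setkeyswitch", "-d", domain, "standby"])
        commands.append(["setkeyswitch", "-d", domain, "on"])
    return commands


for command in keyswitch_cycle_commands(["A", "B"]):
    print(" ".join(command))
```

Each standby/on pair takes the domain out of service, so the list is
meant to be reviewed and scheduled with the customer rather than run
unattended.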
COMMENTS:
None
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@sdpsweb.EBay
--------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.