Document fins/I0730-1
FIN #: I0730-1
SYNOPSIS: A global arbstop may occur in certain instances on Enterprise 10000
servers.
DATE: Oct/15/01
KEYWORDS: A global arbstop may occur in certain instances on Enterprise 10000
servers.
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: A global arbstop may occur in certain instances on
Enterprise 10000 servers.
Sun Alert: Yes
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: Enterprise 10000
PRODUCT CATEGORY: Server / SW Admin
PRODUCTS AFFECTED:
Systems Affected
----------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- E10000 ALL Enterprise E10000 -
X-Options Affected
------------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- - - - -
PART NUMBERS AFFECTED:
Part Number Description Model
----------- ----------- -----
- - -
REFERENCES:
BugId: 4451899 - All domains arbstopped after magic_cookie and
libscan error reported to messages.
4454194 - Multiple domains arbstopped when running
autoconfig/hpost with jtag broken procs.
PatchId: 109175 - SSP 3.3: system-board voltages reported in SSP
MIB are inconsistent.
110412 - SSP 3.4: Eveready fan trays spin fast.
ESC: 530143
530840
531088
531325
530289
SunAlert: 40034
URL: http://esp.west/dcpubs/ServicePubs/hw.html
PROBLEM DESCRIPTION:
Under certain circumstances, for example when a command such as "hpost"
or "autoconfig" is run against a jtag-scan-broken CPU module, a global
arbstop may occur, leaving multiple domains completely dead until POST
is run again. Arbstop stands for "arbitration stop" and normally occurs
when the E10K hardware detects a fatal error.
This problem may occur on E10K servers running the following Sun
Service Processor (SSP) releases:
. SSP 3.1
. SSP 3.1.1
. SSP 3.2
. SSP 3.3
. SSP 3.4
The following are some of the error messages that may be observed if
the problem occurs.
Logged to the platform "messages" file (located in the directory
"/var/opt/SUNWssp/adm"):
cbs: cbs: WARNING:[[post:9140]] libscan/sd_test_chain_length/
WARNING:Chain length (-11) is probably incorrect.
cbs: cbs: ERR:[[post:9140]] libscan/sd_test_ring_length/
ERROR:sd_scan_test_length failed
If multiple domains unexpectedly arbstop while the "hpost" command
(which is usually called on behalf of the "bringup" command) is
running on a separate domain, post failures of the following kind are
logged to the "post*.log" file of that domain (located in the
directory "/var/opt/SUNWssp/adm/$SUNW_HOSTNAME/post"):
phase jtag_integ: JTAG probe and integrity test...
ERR: libcbs:cbs_check_chain_length:libscan error
FAIL PROC 8.3: scan integrity fail.
NOTE: Arbstops seen on multiple or all domains combined with the
occurrence of the above messages would indicate the issue
described in this document. The occurrence of the above
messages alone is not an indication of this issue.
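The check described in the NOTE above could be sketched roughly as
follows. This is only an illustration, not an SSP tool: the function
names are hypothetical, the patterns are copied from the sample messages
in this document, and the exact wording logged by a given SSP release
may differ.

```python
import re

# Patterns taken from the sample messages in this FIN (assumption: other
# SSP releases may word these messages differently).
LIBSCAN_PATTERNS = [
    re.compile(r"libscan/sd_test_chain_length/"),
    re.compile(r"libscan/sd_test_ring_length/"),
    re.compile(r"cbs_check_chain_length:libscan error"),
    re.compile(r"scan integrity fail"),
]


def find_libscan_lines(lines):
    """Return the log lines that match any of the libscan patterns."""
    return [ln.rstrip("\n") for ln in lines
            if any(p.search(ln) for p in LIBSCAN_PATTERNS)]


def indicates_fin_issue(lines, arbstopped_domains):
    """Hypothetical helper: True only when libscan messages coincide
    with arbstops on more than one domain, as the NOTE requires."""
    return bool(find_libscan_lines(lines)) and arbstopped_domains > 1
```

The second function deliberately returns False when only the messages
are present, since the messages alone do not indicate this issue.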
The following three scenarios might trigger multiple arbstops:
1. When running the "autoconfig" command against a
jtag-scan-broken CPU module (see Note 1 below).
2. When running the "bringup" or "hpost" command at POST diagnostic
level 24 or higher (see the "-l" command option) against a
jtag-scan-broken CPU module (see Note 2 below).
3. When any of the following CPU module part numbers are newly
installed:
. 501-5838
. 501-5866
. 501-6008
and the user did not run the "autoconfig" command and reboot
the SSP BEFORE running the "bringup" or "hpost" command at
POST diagnostic level 24 or higher.
NOTE 1: Newly and properly installed CPU modules are normally fully
jtag functional. Therefore, scenario 1 is unlikely to be
encountered.
NOTE 2: The "bringup" and "hpost" commands are capable of detecting
jtag-scan-broken CPUs at POST diagnostic levels below 24
(e.g. level 7 or level 16) without the risk of encountering
an arbstop.
The following example shows the output of the "bringup" command at
POST diagnostic level 7, where a jtag-scan-broken CPU module has
been detected (this assumes that the "autoconfig" command has been
run successfully and the SSP has been rebooted):
my-ssp:mydomain-b3% bringup -l7
...
phase jtag_integ: JTAG probe and integrity test...
FAIL b/r/c = sysboard5/proc1/spitfire: Component ID discrepancy.
FAIL Actual F805C03E; Expected one of:
FAIL B003602F or
FAIL A003602F or
FAIL 9003602F or
FAIL 2003602F or
FAIL 1003602F or
FAIL 0003602F or
FAIL 4002502F
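As an aid to reading output like the example above, the actual and
expected component IDs could be pulled out of the FAIL lines with a
short script. This is a sketch only: the function name is hypothetical,
and it assumes the line layout shown in the example, which may differ
between POST versions.

```python
import re

# Eight upper-case hex digits, as in the component IDs shown above.
HEX_ID = re.compile(r"\b[0-9A-F]{8}\b")


def parse_component_id_failure(lines):
    """Extract the actual ID and the list of expected IDs.

    Assumes the FAIL-line layout from the bringup example in this FIN.
    """
    actual = None
    expected = []
    for line in lines:
        if not line.lstrip().startswith("FAIL"):
            continue
        ids = HEX_ID.findall(line)
        if "Actual" in line and ids:
            actual = ids[0]          # ID reported by the CPU module
        elif actual is not None:
            expected.extend(ids)     # IDs that POST would have accepted
    return actual, expected
```

An actual ID that does not appear in the expected list corresponds to
the "Component ID discrepancy" reported in the example.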
The root cause of this problem is a jtag-scan-broken CPU installed on a
System Board. If the system has a jtag-scan-broken CPU, "autoconfig"
immediately generates a global arbstop, bringing down all other domains
in the platform during "hpost". After the global arbstop, it is
necessary to manually reset the power breaker on the slot where the
jtag-scan-broken CPU resides before the domains in the platform can be
brought up again.
NOTE: This problem has been corrected in patches for SSP 3.3 and
SSP 3.4. With patch 109175 (SSP 3.3) or 110412 (SSP 3.4)
installed, a global arbstop should not occur in any of the
three scenarios listed above.
IMPLEMENTATION:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
CORRECTIVE ACTION:
The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives to avoid the above mentioned
problem.
Install the appropriate SSP patch:
SSP 3.3 109175 or later
SSP 3.4 110412 or later
Customers running SSP 3.2 and below should consider upgrading to SSP 3.3
or higher together with the appropriate patches.
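The patch selection above can be summarized as a small lookup. The
sketch below is illustrative only: the function name and the returned
wording are hypothetical, and the release and patch numbers are exactly
those listed in this FIN.

```python
# Minimum patch IDs from the CORRECTIVE ACTION section of this FIN.
PATCH_FOR_SSP = {
    "3.3": "109175",
    "3.4": "110412",
}

# Affected releases for which this FIN lists no fix patch.
UPGRADE_FIRST = {"3.1", "3.1.1", "3.2"}


def recommend_action(ssp_version):
    """Hypothetical helper mapping an SSP release to the FIN's advice."""
    version = ssp_version.strip()
    if version in PATCH_FOR_SSP:
        return f"Install patch {PATCH_FOR_SSP[version]} or later"
    if version in UPGRADE_FIRST:
        return "Upgrade to SSP 3.3 or higher, then install the matching patch"
    return "Release not listed in this FIN"
```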
Use the following workaround recommendations if the patches cannot be
installed:
1. Do not run the "autoconfig" command against any CPU that is
suspected to be jtag-scan-broken.
2. Do not run the "bringup" or "hpost" command at POST diagnostic
level 24 or higher against any CPU that is suspected to be
jtag-scan-broken (see the "-l" option of the "bringup" or "hpost"
command).
3. When upgrading by installing hardware modules (processor, system
board, or IO mezzanine type change), always run the "autoconfig"
command and reboot the SSP immediately afterwards.
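The workaround rules above could be expressed as a pre-flight check
before a command is run on an unpatched SSP. This is a minimal sketch
under stated assumptions: the function and parameter names are
hypothetical, and whether a CPU is "suspect" is something the engineer
has to judge and supply.

```python
RISKY_POST_LEVEL = 24  # from this FIN: level 24 or higher carries the risk


def is_risky(command, post_level, cpu_suspect, autoconfig_done=True):
    """Hypothetical check of a planned command against the workaround.

    command         -- "autoconfig", "bringup" or "hpost"
    post_level      -- POST diagnostic level ("-l" option); ignored
                       for "autoconfig"
    cpu_suspect     -- True if a CPU is suspected to be jtag-scan-broken
    autoconfig_done -- True if "autoconfig" was run and the SSP rebooted
                       after the last hardware module change
    """
    if command == "autoconfig":
        return bool(cpu_suspect)
    if command in ("bringup", "hpost"):
        high_level = post_level >= RISKY_POST_LEVEL
        return bool(high_level and (cpu_suspect or not autoconfig_done))
    return False
```

For example, is_risky("bringup", 7, True) evaluates to False, which
matches the statement that levels below 24 can detect a
jtag-scan-broken CPU without the risk of an arbstop.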
COMMENTS:
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.