Document fins/I0687-1
FIN #: I0687-1
SYNOPSIS: Failing Fan Problem on E10K
DATE: Jun/28/01
KEYWORDS: Failing Fan Problem on E10K
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: Failing fan on Enterprise 10000 server may cause other fans
to turn off.
Sun Alert: Yes
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: E10000 Server Fan
PRODUCT CATEGORY: Server / SW Admin
PRODUCTS AFFECTED:
Mkt_ID           Platform   Model   Description                    Serial Number
------           --------   -----   -----------                    -------------
Systems Affected
----------------
-                E10000     ALL     Ultra Enterprise E10000        -
X-Options Affected
------------------
SSP9S-311-S99N   -          -       SSP 3.1.1                      -
SSP9S-320-SAM9   -          -       E10000 SSP SW 3.2 CD RELEASE   -
SSP9S-330-SAM9   -          -       E10000 SSP SW 3.3 CD RELEASE   -
SSP9S-340-SAM9   -          -       E10000 SSP SW 3.4 CD RELEASE   -
PART NUMBERS AFFECTED:
Part Number   Description   Model
-----------   -----------   -----
-             -             -
REFERENCES:
BugId:     4405737 - A fan tray mechanical trouble can cause any other
           fan tray to power off.
PatchId:   109175: SSP 3.3: system-board voltages reported in SSP
           MIB are inconsistent.
           110412: SSP 3.4: Eveready fan trays spin fast.
ESC:       529303
           528965
Sun Alert: SA-26586
MANUAL:    805-2917-15: Sun Enterprise 10000 System Service Manual.
           805-0310-12: Sun Enterprise 10000 System Overview Manual.
PROBLEM DESCRIPTION:
Under certain conditions, a single fan failure can cause many other fans
to be turned off because of a software bug, which can lead to system
overheating. This FIN highlights the importance of both replacing the
defective fan and applying the SSP patch.
If a fan tray FRU on an Enterprise 10000 Server (E10K) fails, the SSP
software may turn off other fan trays. Fan trays 3, 7, 11, and 15 are
likely to be turned off and to remain off until the defective fan tray
is removed from the system. As a result, System Boards 6-9 are extremely
vulnerable to overheating, because the fans near these boards may stay
off. E10000 systems with SSP versions 3.1.1 through 3.4 are affected.
Failure symptoms include fans not spinning and, in some cases, the
following message appearing repeatedly, every 3-4 minutes, in the
/var/opt/SUNWssp/adm/messages file:
cbe: NOTICE: fan_VccReset: resetting Vcc for all fans
NOTE: the repeated occurrence of this message is an indication of a failed
fan, and will be present if a fan fails even in patched systems.
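The log check described above could be sketched as follows. This is a minimal illustration, not an SSP tool: the function names and the min_repeats threshold are assumptions made for this example, and only the message text comes from this FIN. The match is a substring test, so it does not depend on any particular timestamp or host prefix in the messages file.

```python
# Message text as quoted in this FIN.
VCC_RESET_MSG = "cbe: NOTICE: fan_VccReset: resetting Vcc for all fans"


def count_vcc_resets(lines):
    """Count log lines that contain the fan Vcc reset notice."""
    return sum(1 for line in lines if VCC_RESET_MSG in line)


def looks_like_failed_fan(lines, min_repeats=2):
    """Return True if the notice repeats at least min_repeats times.

    min_repeats is an assumed threshold for "repeated occurrence";
    the FIN does not give a number.
    """
    return count_vcc_resets(lines) >= min_repeats


# Sample input. The second entry is a hypothetical unrelated line.
SAMPLE_LOG = [
    "cbe: NOTICE: fan_VccReset: resetting Vcc for all fans",
    "hypothetical unrelated log line",
    "cbe: NOTICE: fan_VccReset: resetting Vcc for all fans",
]

if __name__ == "__main__":
    print(count_vcc_resets(SAMPLE_LOG), looks_like_failed_fan(SAMPLE_LOG))
```

On an SSP, the lines would be read from /var/opt/SUNWssp/adm/messages instead of the sample list.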
Failing fans can also be detected by using 'hostview' or the 'hostinfo
-F' or 'fan' commands.
Example: # fan
           Fan Status
-------------------------------
Tray #   Power   Fan 0   Fan 1
-------------------------------
   0     on      on      on
   1     on      fail    on
   2     on      on      on
   3     off     off     off
   4     on      on      on
   5     on      on      on
   6     on      on      on
   7     off     off     off
   8     on      on      on
   9     on      on      on
  10     on      on      on
  11     off     off     off
  12     on      on      on
  13     on      on      on
  14     on      on      on
  15     off     off     off
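For sites that capture 'fan' output in a file, a short script can pick out the failed fan and the trays that were powered off. The sketch below is an illustration only: it assumes the four-column layout shown in the example above, and the function names are hypothetical, not part of the SSP software.

```python
def parse_fan_status(text):
    """Return {tray: (power, fan0, fan1)} from captured 'fan' output."""
    trays = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields and start with a tray number;
        # title, header, and separator lines do not match and are skipped.
        if len(fields) == 4 and fields[0].isdigit():
            trays[int(fields[0])] = tuple(fields[1:])
    return trays


def summarize(trays):
    """Return (trays with a failed fan, trays that are powered off)."""
    failed = sorted(t for t, (_, f0, f1) in trays.items() if "fail" in (f0, f1))
    powered_off = sorted(t for t, (power, _, _) in trays.items() if power == "off")
    return failed, powered_off


# Sample input: the first rows of the example output in this FIN.
SAMPLE = """\
Fan Status
-------------------------------
Tray # Power Fan 0 Fan 1
-------------------------------
0 on on on
1 on fail on
2 on on on
3 off off off
4 on on on
"""

if __name__ == "__main__":
    print(summarize(parse_fan_status(SAMPLE)))
```

A tray that reports a failed fan while other trays show Power off matches the failure pattern described in this FIN.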
This problem is caused by an error in the SSP software. When a fan
fails, the Control Board Executive (CBE) tries to restart it by issuing
a vcc_reset every three minutes. Immediately after the vcc_reset, the
CBE reads incorrect fan tray status and, based on this invalid status
information, decides to shut off other fan trays.
IMPLEMENTATION:
     ---
    |   |   MANDATORY (Fully Pro-Active)
     ---
     ---
    |   |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
     ---
     ---
    | X |   REACTIVE (As Required)
     ---
CORRECTIVE ACTION:
The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives who may encounter the above
mentioned problem:
To prevent this problem from occurring, upgrade to SSP 3.5 or install
the patch listed below for the SSP version in use. The patches apply to
Solaris 2.6, 7 and 8.
    SSP 3.3: 109175
    SSP 3.4: 110412
    SSP 3.2: patch is in progress.
No patch is planned for SSP version 3.1.1. Note that these patches only
prevent failed fans from turning off other fans. The defective
fan must still be replaced.
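The version-to-patch guidance above can be expressed as a small lookup, for example in a site audit script. This is a sketch only: the table holds just the patch IDs given in this FIN, and the function name and message wording are assumptions made for this example.

```python
# SSP version -> patch ID, as listed in this FIN.
# None means no patch is available (3.1.1: none planned; 3.2: in progress).
SSP_PATCHES = {
    "3.1.1": None,
    "3.2": None,
    "3.3": "109175",
    "3.4": "110412",
}


def recommended_action(ssp_version):
    """Return the software action this FIN recommends for an SSP version."""
    if ssp_version not in SSP_PATCHES:
        return "SSP %s is not listed as affected in this FIN" % ssp_version
    patch = SSP_PATCHES[ssp_version]
    if patch is None:
        return "no patch available for SSP %s; upgrade to SSP 3.5" % ssp_version
    return "install patch %s or upgrade to SSP 3.5" % patch


if __name__ == "__main__":
    for version in ("3.1.1", "3.2", "3.3", "3.4", "3.5"):
        print(version, "->", recommended_action(version))
```

The lookup covers the software fix only. In every case the defective fan must still be replaced.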
NOTE: After the patch has been installed, the message:
cbe: NOTICE: fan_VccReset: resetting Vcc for all fans
may still occur. This is an indicator of failed fan tray(s) and needs
immediate service attention.
If the problem occurs on an E10000 system that has not been patched,
perform the following workaround:
    1) Remove and replace the defective fan.
    2) Reset all fans with the 'fan -p on' command.
COMMENTS:
---------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.