Document fins/I0687-1


FIN #: I0687-1

SYNOPSIS: Failing Fan Problem on E10K

DATE: Jun/28/01

KEYWORDS: Failing Fan Problem on E10K


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: Failing fan on Enterprise 10000 server may cause other fans 
          to turn off.              


Sun Alert:          Yes

TOP FIN/FCO REPORT: Yes 
 
PRODUCT_REFERENCE:  E10000 Server Fan  
 
PRODUCT CATEGORY:   Server / SW Admin 


PRODUCTS AFFECTED:

Mkt_ID         Platform   Model   Description                Serial Number	
------         --------   -----   -----------                ------------- 
Systems Affected
---------------- 
  -	        E10000     ALL    Ultra Enterprise E10000          -

X-Options Affected
------------------
SSP9S-311-S99N     -        -     SSP 3.1.1                        - 
SSP9S-320-SAM9     -        -     E10000 SSP SW 3.2 CD RELEASE     -
SSP9S-330-SAM9     -        -     E10000 SSP SW 3.3 CD RELEASE     -
SSP9S-340-SAM9     -        -     E10000 SSP SW 3.4 CD RELEASE     -


PART NUMBERS AFFECTED: 

Part Number   Description        Model 
-----------   -----------        -----  
     -             -               -  	
  		

REFERENCES:

BugId:     4405737- A fan tray mechanical trouble can cause any other 
                    fan tray to power off.
 
PatchId:   109175: SSP 3.3: system-board voltages reported in SSP 
                      MIB are inconsistent. 
           110412: SSP 3.4: Eveready fan trays spin fast.

ESC:       529303 
           528965

Sun Alert: SA-26586

MANUAL: 805-2917-15: Sun Enterprise 10000 System Service Manual. 
        805-0310-12: Sun Enterprise 10000 System Overview Manual. 

      
PROBLEM DESCRIPTION: 

Under certain conditions, a fan failure can cause many other fans to be
turned off as the result of a software bug, leading to system overheating.
This FIN highlights the importance of not only replacing the defective
fan but also applying the SSP patch.

If a fan tray FRU on an Enterprise 10000 Server (E10K) fails, other fan
trays can potentially be turned off by the SSP software. Fan trays
3, 7, 11, and 15 are likely to be turned off and remain off until the
defective fan tray is removed from the system. As a result, System
Boards 6-9 are extremely vulnerable to overheating as fans near these
boards may stay off.  E10000 systems with SSP versions 3.1.1 through
3.4 are affected. 

Failure symptoms include fans that are not spinning and, possibly, the
following message appearing repeatedly, every 3-4 minutes, in the
/var/opt/SUNWssp/adm/messages file:

     cbe: NOTICE: fan_VccReset: resetting Vcc for all fans

NOTE: the repeated occurrence of this message is an indication of a failed 
fan, and will be present if a fan fails even in patched systems.
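
The check for this message can be sketched in Python as follows. This is
a minimal illustration, not an SSP tool: the function names are
hypothetical, only the message text and file path come from this FIN,
and no assumption is made about the timestamp or host fields that
precede the message on each log line.

```python
# Message text copied from this FIN; the rest of the log line layout
# (timestamp, host name) is deliberately not assumed.
VCC_RESET_MARKER = "cbe: NOTICE: fan_VccReset: resetting Vcc for all fans"


def count_vcc_resets(lines):
    """Return how many of the given log lines contain the Vcc reset notice."""
    return sum(1 for line in lines if VCC_RESET_MARKER in line)


def read_log_lines(path="/var/opt/SUNWssp/adm/messages"):
    """Read the SSP messages file, replacing any undecodable bytes."""
    with open(path, errors="replace") as handle:
        return handle.readlines()
```

Calling count_vcc_resets(read_log_lines()) on the SSP gives the number of
occurrences; a count that keeps growing suggests a failed fan tray.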

Failing fans can also be detected with 'hostview' or with the
'hostinfo -F' or 'fan' commands.

  Example:  # fan

               Fan Status
    -------------------------------
    Tray #    Power   Fan 0   Fan 1
    -------------------------------
      0         on     on      on
      1         on   fail      on
      2         on     on      on
      3        off    off     off
      4         on     on      on
      5         on     on      on
      6         on     on      on
      7        off    off     off
      8         on     on      on
      9         on     on      on
     10         on     on      on
     11        off    off     off
     12         on     on      on
     13         on     on      on
     14         on     on      on
     15        off    off     off
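
Output in the layout shown above can be checked mechanically. The
following Python sketch assumes the 'fan' output always has the
four-column form of this example (tray number, power, fan 0, fan 1);
the function names are hypothetical and not part of the SSP software.

```python
def parse_fan_status(text):
    """Parse 'fan' output into {tray_number: (power, fan0, fan1)}.

    Only lines with exactly four whitespace-separated fields whose first
    field is an integer are treated as data rows; the title, column
    headings, and rule lines are skipped.
    """
    trays = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].isdigit():
            trays[int(fields[0])] = tuple(fields[1:])
    return trays


def summarize(trays):
    """Return (trays with a failed fan, trays that are powered off)."""
    failed = sorted(
        tray for tray, (_, fan0, fan1) in trays.items()
        if "fail" in (fan0, fan1)
    )
    powered_off = sorted(
        tray for tray, (power, _, _) in trays.items() if power == "off"
    )
    return failed, powered_off
```

For the example output above, summarize() would report tray 1 as having a
failed fan and trays 3, 7, 11, and 15 as powered off.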
 
This problem is caused by an error in the SSP software.  When a fan
fails, the Control Board Executive (CBE) tries to restart it by issuing
a vcc_reset every three minutes.  The CBE reads incorrect fan tray
status immediately after the vcc_reset and, acting on this invalid
status information, shuts off other fan trays.
  
  
IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        |   |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION: 

The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives who may encounter the above 
mentioned problem:

To prevent this problem, upgrade to SSP 3.5 or install the patch below
that matches the installed SSP version.  The patches apply to Solaris
2.6, 7 and 8.

	SSP 3.3:  109175 
	SSP 3.4:  110412 
	SSP 3.2:  patch is in progress.
	
No patch is planned for SSP version 3.1.1. Note that these patches only 
prevent failed fans from turning off other fans.  The defective 
fan must still be replaced.
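
The version-to-patch mapping above can be written as a small lookup.
This Python sketch is only illustrative: the names are hypothetical, the
patch IDs and status notes are those listed in this FIN, and suggesting
the SSP 3.5 upgrade for versions without a patch is an inference from
the text above rather than a separate recommendation.

```python
# Patch IDs as listed in this FIN, keyed by SSP version.
PATCH_BY_SSP_VERSION = {"3.3": "109175", "3.4": "110412"}

# Affected versions for which this FIN lists no released patch.
NO_PATCH_NOTE = {
    "3.2": "patch is in progress",
    "3.1.1": "no patch is planned",
}


def remediation(ssp_version):
    """Return the preventive action this FIN gives for an SSP version."""
    if ssp_version in PATCH_BY_SSP_VERSION:
        patch = PATCH_BY_SSP_VERSION[ssp_version]
        return f"install patch {patch} or upgrade to SSP 3.5"
    if ssp_version in NO_PATCH_NOTE:
        return f"{NO_PATCH_NOTE[ssp_version]}; upgrade to SSP 3.5"
    return "not listed as affected in this FIN"
```

In every case the defective fan must still be replaced; the lookup covers
only the software side of the fix.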
		
NOTE: After the patch has been installed, the message:

     cbe: NOTICE: fan_VccReset: resetting Vcc for all fans

may still occur. This message indicates one or more failed fan trays,
which need immediate service attention.

If the problem occurs on an E10000 system which has not been patched,
perform the following workaround.

  	1) Remove and replace the defective fan 
  	2) Reset all fans with the 'fan -p on' command


COMMENTS:  


---------------------------------------------------------------------------

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
                                                        



Copyright (c) 1997-2003 Sun Microsystems, Inc.