FIN #: I0656-1

SYNOPSIS: Over temp limits require patches for better handling on UltraSPARC II
          modules

DATE: Mar/30/00

KEYWORDS: Over temp limits require patches for better handling on UltraSPARC II
          modules


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: Patches provide better handling of over temperature limits and 
          danger limit detection for the 400MHz 8M/4M UltraSPARC II modules.

TOP FIN/FCO REPORT: Yes 
 
PRODUCT_REFERENCE:  Ultra Enterprise EXX00 Systems  
 
PRODUCT CATEGORY:   Server / SW Admin 

PRODUCTS AFFECTED:  

Systems Affected
----------------
Mkt_ID   Platform   Model   Description            Serial Number
------   --------   -----   -----------            -------------
  -      E3000       ALL    Ultra Enterprise 3000        -
  -      E3500       ALL    Ultra Enterprise 3500        -
  -      E4000       ALL    Ultra Enterprise 4000        -
  -      E4500       ALL    Ultra Enterprise 4500        -
  -      E5000       ALL    Ultra Enterprise 5000        -
  -      E5500       ALL    Ultra Enterprise 5500        -
  -      E6000       ALL    Ultra Enterprise 6000        -
  -      E6500       ALL    Ultra Enterprise 6500        -

X-Options Affected
------------------
X2244A       -        -     OPT 400MHz CPU w/4MB         -


PART NUMBERS AFFECTED: 

Part Number   Description                             Model
-----------   -----------                             -----
501-5235-04   ASSY Sapphire 8MB 416MHz Module           -
501-5661-07   ASSY Sap-Blk 8MB 400MHz 3.0.5             -
501-5762-03   ASSY Sap-Blk 8MB 400MHz Sony              -
501-4995-03   ASSY Saphire 4MB 400MHz Module            -
501-5239-05   ASSY Saphire 4MB 400/100 Module           -
501-5420-04   ASSY Sap-Blk 4MB 400/100 Mdle w/SHRD      -
501-5425-04   ASSY Saphire 4MB 400MHz 3.0 Module        -
501-5446-04   ASSY Sap-Blk 3.0.5 4MB 400/100 Module     -
501-5500-03   ASSY Saphire 4MB 400MHz Module            -
501-5585-02   ASSY Sap-Blk 4MB 400MHz Rev. A Module     -
501-5838-02   ASSY Sap-Blk 8MB 400MHz 3.0.5 Sombra      -
501-6008-01   ASSY Sap-Blk 8MB 400MHz Sombra 1.1        -


REFERENCES:

BugId:     4304051: POST is not using correct values for determining overtemp

PatchId:   103640: SunOS 5.5.1: kernel, nisopaccess, & libthread patch  
           105181: SunOS 5.6: Kernel update patch  
           106541: SunOS 5.7: Kernel update patch  
           108528: SunOS 5.8: Kernel update patch

URL:       http://infoserver.central/data/sshandbook/Systems/E6500/infodoc.html
           http://infoserver.central/data/sshandbook/Systems/E6500/docs.html

Sun Alert: SA-23978 
   
    
PROBLEM DESCRIPTION: 

Without the device driver modifications delivered in patches, 400MHz
CPU modules may overheat because the system fails to offline its
processors or power them down when they become too hot.  This may
reduce the reliability of the CPU modules and other components.

The 400MHz CPU modules used in Sunfire systems have lower operating
temperature specifications than the slower modules (167, 250, 336 MHz).
For this reason, the Solaris fhc driver, which monitors CPU
temperature, has been modified.  This driver issues warning and danger
messages when a CPU over-temperature condition is detected.

Sunfire systems are able to monitor CPU temperature using thermistors
which lie beneath each CPU module.  See InfoDoc 18554 for more
information.  Also refer to the docs.sun.com URL above for information
about the CPU Over-temperature Safeguard (COS) software facility.  When
CPU/Memory board temperature rises to a certain level, warning messages
are issued.  If the temperature rises to and stays at a danger level,
an overly hot CPU module will be powered down.  The system can still
operate, provided there are additional online CPUs.  If the
over-temperature CPU is the only online module in the system, the
entire system will be shut down.

This problem can occur with the following configurations:

      Ultra Enterprise Servers 3x00/4x00/5x00/6x00

      with 400MHz 8MB/4MB external cache UltraSPARC II CPU modules

      with Solaris 2.5.1, 2.6, 7, 8.

Run '/usr/sbin/psrinfo -v' to determine CPU module speed.
Run '/bin/uname -r' to see the Solaris release.
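
The two checks above can be combined in a short script.  The following
is a minimal sketch in Python, not a supported Sun tool.  It assumes
that 'psrinfo -v' reports speed in the form "operates at NNN MHz" and
that 'uname -r' returns the SunOS release (5.5.1, 5.6, 5.7 or 5.8).
The function names and the sample text are illustrative only, and the
exact 400 MHz match is taken from the wording of this FIN; a module
that reports a different speed would need its own check.

```python
import re

# SunOS releases named in this FIN, as reported by 'uname -r'.
AFFECTED_RELEASES = {"5.5.1", "5.6", "5.7", "5.8"}


def cpu_speeds_mhz(psrinfo_text):
    """Return every clock speed (MHz) found in 'psrinfo -v' output."""
    return [int(m) for m in re.findall(r"operates at (\d+) MHz", psrinfo_text)]


def is_affected(psrinfo_text, uname_r):
    """True if any CPU reports 400 MHz on a release covered by this FIN."""
    release_ok = uname_r.strip() in AFFECTED_RELEASES
    return release_ok and 400 in cpu_speeds_mhz(psrinfo_text)


# Illustrative sample only (assumed layout, made-up timestamps).
SAMPLE = """\
Status of processor 0 as of: 03/30/00 09:57:54
  Processor has been on-line since 03/29/00 08:12:10.
  The sparc processor operates at 400 MHz,
        and has a sparc floating point processor.
Status of processor 1 as of: 03/30/00 09:57:54
  Processor has been on-line since 03/29/00 08:12:14.
  The sparc processor operates at 400 MHz,
        and has a sparc floating point processor.
"""

if __name__ == "__main__":
    print(cpu_speeds_mhz(SAMPLE))
    print(is_affected(SAMPLE, "5.6\n"))
```

On a live system the two text arguments would come from the output of
the commands named above.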

Here are examples of console and prtdiag messages after the proper
patch has been applied:

***********  Solaris 2.5.1  ***********

Console messages:
WARNING: CPU/Memory board 0 is warm. Please check system cooling
WARNING: CPU/Memory board 0 is very hot.
WARNING: CPU/Memory board 0 still too hot. Overtemp shutdown started

"/usr/platform/sun4u/sbin/prtdiag -v"  output will look like the
following:

Detected System Faults
======================
Board 0 Overtemp
        Detected Mon May  8 17:49:02 2000

System Temperatures (Celsius):
------------------------------
            Temperature  Trend
            -----------  -----
Board 0:         70      rising
Board 1:         41      stable
Board 3:         41      stable
Control Board:   37      stable

************ Solaris 2.6, 7, 8 *****************

Console Messages:

Mar 30 09:57:54 sunfire4 unix: WARNING: CPU/Memory board 4 is warm
(temperature: 60C). Please check system cooling
Mar 30 10:02:26 sunfire4 unix: WARNING: CPU/Memory board 4 is very hot
(temperature: 68C)
Mar 30 10:02:26 sunfire4 unix: WARNING: System shutdown scheduled in
20 seconds due to over-temperature condition on CPU/Memory board 4
Mar 30 10:02:55 sunfire4 unix: NOTICE: CPU/Memory board 4 is cooling
(temperature: 67C)
Mar 30 10:02:55 sunfire4 unix: NOTICE: System shutdown due to
over-temperature condition cancelled
Mar 30 10:04:09 sunfire4 unix: WARNING: CPU/Memory board 4 is very hot
(temperature: 68C)
Mar 30 10:04:09 sunfire4 unix: WARNING: System shutdown scheduled in
20 seconds due to over-temperature condition on CPU/Memory board 4
Mar 30 10:04:29 sunfire4 unix: WARNING: CPU/Memory board 4 still too hot
(temperature: 68C). Overtemp shutdown started

INIT: New run level: 6
The system is coming down.  Please wait.

"/usr/platform/sun4u/sbin/prtdiag -v" output will look like the
following:

    System Temperatures (Celsius):
    ------------------------------
    Brd   State   Current  Min  Max  Trend
    ---  -------  -------  ---  ---  -----
     1      OK       38     36   39  stable
     2      OK       37     33   38  stable
     4   WARNING     60     26   60  rising
    CLK     OK       36     35   38  stable

****** OR ******

    System Temperatures (Celsius):
    ------------------------------
    Brd   State   Current  Min  Max  Trend
    ---  -------  -------  ---  ---  -----
     1      OK       38     36   39  stable
     2      OK       38     33   39  stable
     4    DANGER     68     26   68  stable
    CLK     OK       36     35   38  stable
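
For sites that collect prtdiag output from many systems, the table
above can be scanned for boards that are not in the OK state.  This is
a sketch in Python that assumes the six-column layout shown for Solaris
2.6, 7 and 8; the function names are hypothetical and the older
Solaris 2.5.1 layout is not handled.

```python
def parse_board_temps(prtdiag_text):
    """Parse data rows of the 'Brd State Current Min Max Trend' table."""
    rows = []
    for line in prtdiag_text.splitlines():
        fields = line.split()
        # A data row has six fields with integers in columns 3 to 5;
        # this skips the header and the dashed separator lines.
        if len(fields) == 6 and all(f.isdigit() for f in fields[2:5]):
            brd, state, cur, low, high, trend = fields
            rows.append({"board": brd, "state": state, "current": int(cur),
                         "min": int(low), "max": int(high), "trend": trend})
    return rows


def boards_needing_attention(rows):
    """Return the names of boards whose state is anything other than OK."""
    return [r["board"] for r in rows if r["state"] != "OK"]


# Sample copied from the DANGER example in this FIN.
SAMPLE = """\
    System Temperatures (Celsius):
    ------------------------------
    Brd   State   Current  Min  Max  Trend
    ---  -------  -------  ---  ---  -----
     1      OK       38     36   39  stable
     2      OK       38     33   39  stable
     4    DANGER     68     26   68  stable
    CLK     OK       36     35   38  stable
"""

if __name__ == "__main__":
    print(boards_needing_attention(parse_board_temps(SAMPLE)))
```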

This problem is addressed in the following releases:

    Solaris 2.5.1 with patch 103640 or later
    Solaris 2.6   with sysctrl driver patch 105181 or later
    Solaris 7     with kernel patch 106541 or later
    Solaris 8     with fhc driver patch 108528 or later

With the patches installed, warning messages will now appear at 60
degrees C and a powerdown sequence of overheated CPU modules will occur
at a new danger limit setting of 68 degrees C.  These are lower
temperatures than the standard default limits of 73 degrees C (for
warning messages) and 83 degrees C (for danger limit).
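
The effect of the new limits can be illustrated with a small Python
sketch.  It is not the fhc driver logic: the driver also tracks trends
and waits before shutting down, as the console messages above show.
The inclusive comparisons are an assumption, chosen because the sample
prtdiag output shows WARNING at 60 and DANGER at 68.

```python
# (warning, danger) limits in degrees C, taken from this FIN.
PATCHED_LIMITS = (60, 68)   # 400MHz modules with the patch installed
DEFAULT_LIMITS = (73, 83)   # standard default limits


def classify(temp_c, patched=True):
    """Map a board temperature to OK, WARNING or DANGER."""
    warn, danger = PATCHED_LIMITS if patched else DEFAULT_LIMITS
    if temp_c >= danger:      # assumption: limits are inclusive
        return "DANGER"
    if temp_c >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for temp in (59, 60, 68, 73, 83):
        print(temp, classify(temp, patched=True), classify(temp, patched=False))
```

Under these assumptions a board at 68 degrees C is classified as DANGER
with the patched limits but is still OK with the default limits, which
is the gap the patches close.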
                                          

IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        | X |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION:

An Authorized Enterprise Field Service Representative should follow the
procedures listed below.

To enable proper temperature monitoring for 400MHz CPU modules on
Sunfire systems, install the following patches:

         -----------------------------------
        |  OS Versions  |  Required Patches |
        |===================================|
        | Solaris 2.5.1 |  103640 or later  |
        | Solaris 2.6   |  105181 or later  |
        | Solaris 7     |  106541 or later  |
        | Solaris 8     |  108528 or later  |
         -----------------------------------
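
Whether the required patch is present can be checked against the
output of 'showrev -p'.  The Python sketch below assumes each installed
patch appears on a line beginning "Patch: NNNNNN-RR".  This FIN does
not state a minimum revision, so the sketch only reports the highest
installed revision of the patch ID from the table above.  The function
names and the revision numbers in the sample are made up for
illustration.

```python
import re

# Patch IDs from the table above, keyed by 'uname -r' output.
REQUIRED_PATCH = {"5.5.1": "103640", "5.6": "105181",
                  "5.7": "106541", "5.8": "108528"}


def installed_revisions(showrev_text, patch_id):
    """Return the sorted revisions of patch_id found in 'showrev -p' output."""
    pattern = re.compile(r"^Patch:\s+%s-(\d+)\b" % re.escape(patch_id),
                         re.MULTILINE)
    return sorted(int(rev) for rev in pattern.findall(showrev_text))


def patch_status(showrev_text, uname_r):
    """Describe whether the patch required for this release is installed."""
    patch_id = REQUIRED_PATCH.get(uname_r.strip())
    if patch_id is None:
        return "release not covered by this FIN"
    revs = installed_revisions(showrev_text, patch_id)
    if not revs:
        return "patch %s not installed" % patch_id
    return "patch %s-%02d installed" % (patch_id, revs[-1])


# Illustrative sample only; the revision numbers are hypothetical.
SAMPLE = """\
Patch: 105181-15 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
Patch: 105181-17 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
"""

if __name__ == "__main__":
    print(patch_status(SAMPLE, "5.6"))
    print(patch_status(SAMPLE, "5.7"))
```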
 

COMMENTS:

None  

----------------------------------------------------------------------------

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
                                                        


Copyright (c) 1997-2003 Sun Microsystems, Inc.