

FIN #: I0417-1

SYNOPSIS: Powering off Alt. Control Brd. in a UE10000 May Result in Total
          Platform Shutdown.

DATE: Aug/26/98 

KEYWORDS: Powering off Alt. Control Brd. in a UE10000 May Result in Total
          Platform Shutdown.


---------------------------------------------------------------------
        - Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                        FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS:      Powering off Inactive Alternate Control Board in 
               a UE10000 May Result in Total Platform Shutdown.

            
TOP FIN/FCO REPORT: Yes 

 
PRODUCT_REFERENCE:  Ultra Enterprise 10000 Secondary Control Board  
                    
                                                              
PRODUCT CATEGORY:   Server /  System Board


PRODUCTS AFFECTED: 

  
Mkt_ID   Platform   Model   Description   Serial Number
------   --------   -----   -----------   -------------

Systems Affected
----------------
  -      E10000     All     Ultra Enterprise 10000 Server   -
          

X-Options Affected
------------------

X2720A      -        -      E10000 Control Board with Ethernet Hub  -


PART NUMBERS AFFECTED: 

Part Number   Description   Model
-----------   -----------   -----

    -             -          -


REFERENCES: 


BugId:	 4149225,  4135766  
PatchId: 105683
MANUAL:	 805-2917-14 Sun Enterprise 10000 System Service Manual
	

       	
PROBLEM DESCRIPTION: 


Ultra Enterprise 10000 systems that have an unused alternate Control Board
installed and the Event Detector Daemon (EDD) enabled may suffer a total
platform shutdown if the alternate control board is powered off.

Shown below is an example of a warning from the platform messages file
of an E10000 that encountered an alternate control board power down
with the EDD enabled:

procestemp: Warning: The Temperature has exceed 911 temp on control board 0
procestemp: Temperature data for board 0 : cbStarfire5VDCTemp.0 226.88 C,
       cbStarfire5VDCPerTemp.0 226.88 C,  cbStarfire5VDCFanTemp.0 226.88 C,
poweroff:   Shutting down entire system...
procesvolt: Warning: Voltage readings have exceeded the thresholds on
            control board 0
procesvolt: Voltage data for board 0 : cbStarfire3p3VDCHK.0 5.00 V,
            cbStarfire5VDC.0 10.03 V,  cbStarfire5VDCHK.0 10.03 V,
            cbStarfire5VDCPer.0 10.03 V,  cbStarfire5VDCFan.0 10.03 V,

The example warning messages (above) are followed by a total platform
shutdown: once the false over-temperature condition is detected, the
breakers are tripped immediately.

About 2 minutes elapsed between the time the Control Board was powered
off and the time the breakers tripped.
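
The example messages shown earlier form a recognizable pattern: a
procestemp "911 temp" warning for a control board, followed by a poweroff
of the entire system.  The following is a minimal sketch, not part of the
SSP software, of how a saved copy of the platform messages file could be
checked for that pattern.  The function name has_temp_shutdown_signature
is hypothetical, the path to the messages file is supplied by the caller,
and the match strings are copied from the example in this FIN.  It is
written for a POSIX shell (sh) with awk, not for the C shell.

```shell
# Sketch only: has_temp_shutdown_signature is a hypothetical helper name.
# $1 is the path to a copy of the platform messages file (caller-supplied).
# Succeeds (exit status 0) if a "911 temp" warning for a control board is
# later followed by a whole-system poweroff message; fails otherwise.
has_temp_shutdown_signature() {
    awk '
        /procestemp: Warning: The Temperature has exceed 911 temp on control board/ {
            warned = 1
        }
        warned && /poweroff: +Shutting down entire system/ {
            hit = 1
        }
        END { exit (hit ? 0 : 1) }
    ' "$1"
}
```

A match shows only that the message pattern is present.  Whether the
shutdown was caused by a powered-off alternate control board still has to
be confirmed against the service history of the platform.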

A minor problem on UE10000 Control Boards causes some boards to be latched
in reset when powered down (correct behavior), while others are not
latched (incorrect behavior).  Control Boards that are not latched in
reset when powered down still return valid JBC CIDs, and the SSP
monitoring software (via EDD) currently treats a readable JBC CID as proof
that the temperatures measured on the board are valid.  Because main power
to that control board is off, the temperature readings are in fact
invalid, but the SSP software accepts them because the JBC CID can be
read.  In this scenario, the bogus temperatures exceed the "911 temp"
values and the platform is shut down.
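
The validity check described above can be illustrated with a small sketch.
It is hypothetical and is not the SSP or EDD implementation.  It contrasts
trusting a reading because the CID is readable with also requiring a
plausible supply voltage, which is one reading of what "qualifying
temperatures by examining the power ring voltages" means.  The function
names, the argument order, and every threshold are assumptions made for
illustration.  It is written for a POSIX shell (sh) with awk.

```shell
# Hypothetical sketch only; this is not the SSP/EDD implementation.
# All limits and tolerances are caller-supplied placeholders.

# reading_is_trusted CID_READABLE RAIL_VOLTS NOMINAL_VOLTS TOLERANCE_VOLTS
# Succeeds only if the CID was readable ("yes") AND the measured rail
# voltage is within TOLERANCE_VOLTS of NOMINAL_VOLTS.
reading_is_trusted() {
    [ "$1" = "yes" ] || return 1
    awk -v v="$2" -v nom="$3" -v tol="$4" 'BEGIN {
        d = v - nom
        if (d < 0) d = -d            # absolute deviation from nominal
        exit ((d <= tol) ? 0 : 1)
    }'
}

# should_shut_down TEMP_C LIMIT_C CID_READABLE RAIL_VOLTS NOMINAL_VOLTS TOLERANCE_VOLTS
# Succeeds (meaning "shut down") only for a trusted reading above LIMIT_C.
should_shut_down() {
    reading_is_trusted "$3" "$4" "$5" "$6" || return 1
    awk -v t="$1" -v lim="$2" 'BEGIN { exit ((t > lim) ? 0 : 1) }'
}
```

With the readings from the example messages (226.88 C, and 10.03 V on
cbStarfire5VDC.0, assumed here to be a nominal 5 V supply), the call
"should_shut_down 226.88 100 yes 10.03 5.0 0.5" fails, so the bogus
temperature is ignored.  The 100 C limit and 0.5 V tolerance are
placeholders, not values taken from the SSP software.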


IMPLEMENTATION:
 
         ---
    	|   |  	MANDATORY (Fully Pro-Active)
  	 ---    
  	 
  
         ---
    	|   | 	CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
  	 --- 
  	 
  	   			
         ---
    	| X | 	REACTIVE (As Required)
  	 ---

	
CORRECTIVE ACTION: 


Enterprise Services field personnel can avoid the above stated problem by
following either the workaround or the recommendation shown below:

1. Before powering down the alternate control board of an E10000,
   disable the Event Detector Daemon (EDD).

   a. As user 'ssp' on the SSP, execute the following command:

	            ssp% edd_cmd -x stop


   b. Once the alternate control board has been removed or replaced,
      re-enable EDD.  As user 'ssp' on the SSP, execute the following
      command:

		    ssp% edd_cmd -x start
		
			
*Failure to stop EDD before powering down the alternate control board may
 cause the entire platform to be shut down because of the false
 temperatures that are read on the powered-down control board.
 	
 

2. Install Patch ID# 105683 or higher, which fixes the problem by qualifying
   temperatures read from a powered-down control board against the power
   ring voltages.

 
*Be sure to follow the 'Special Install Instructions' of the patch README.
                        -----------------------------
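
The workaround in item 1 depends on EDD being restarted after the service
action, even if that action is interrupted or fails.  The following is a
minimal sketch of a wrapper that pairs the two edd_cmd invocations given
above.  The helper name with_edd_stopped and the EDD_CMD override variable
are hypothetical, and the sketch assumes that edd_cmd returns a non-zero
exit status on failure, which this FIN does not state.  It is written for
a POSIX shell (sh), not the C shell, and would be run as user 'ssp' on the
SSP.

```shell
# Sketch only: with_edd_stopped and EDD_CMD are hypothetical names.
# Usage: with_edd_stopped COMMAND [ARGS...]
# Stops EDD, runs COMMAND (the service action), then restarts EDD.
with_edd_stopped() {
    edd=${EDD_CMD:-edd_cmd}       # edd_cmd is the command named in this FIN

    # Do not proceed if EDD could not be stopped (assumes non-zero on failure).
    "$edd" -x stop || return 1

    "$@"                          # e.g. a script that waits for the operator
    rc=$?

    # Always attempt to re-enable EDD, even if the service action failed.
    "$edd" -x start || return 1

    return "$rc"
}
```

For example, "with_edd_stopped sh -c 'read answer'" would stop EDD, wait
until the operator presses Return after the board has been replaced, and
then start EDD again.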

  
COMMENTS: 

NOTE: For complete instructions on Control Board replacement, refer to
      the Sun Enterprise 10000 System Service Manual (part number
      805-2917-14):

      Chapter 2: Component Replacement Procedures,  
	         Section 2.9 Control Board Replacement, 
	         Subsection  2.9.3 Powering Off a Control Board
	         page 2-26  
		
--------------------------------------------------------------------------

Implementation Footnote:
________________________

i)   In case of MANDATORY FINs, Enterprise Services will attempt to contact   
     all affected customers to recommend implementation of the FIN. 
        

ii)  For CONTROLLED PRO-ACTIVE FINs, Enterprise Services mission critical
     support teams will recommend implementation of the FIN (to their
     respective accounts), at the convenience of the customer.


iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
     need arises.
--------------------------------------------------------------------------

All released FINs and FCOs can be accessed with a web browser as follows:

SunWeb Access: 
______________

* Access the top level URL of http://cte.corp/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.

Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
accessed internally at the following URL: http://edist.corp/.
 
* From there, follow the hyperlink path of "SunService Documentation" and
  click on "FIN & FCO attachments", then choose the appropriate folder,
  FIN or FCO.  This will display supporting directories/files for FINs
  or FCOs.
  
Internet Access:
_______________

* Access the top level URL of https://infoserver.Sun.COM

--------------------------------------------------------------------------
General:
________

Send questions or comments to finfco-manager@cte.Corp

---------------------------------------------------------------------------




Copyright (c) 1997-2003 Sun Microsystems, Inc.