Document fins/I0753-1


FIN #: I0753-1

SYNOPSIS: Sun Fire 15K installations running SMS 1.1 may experience deadlock
          condition

DATE: Jan/02/02

KEYWORDS: Sun Fire 15K installations running SMS 1.1 may experience deadlock
          condition


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: Sun Fire 15K installations running SMS 1.1 may experience
          deadlock condition.


SunAlert:           No              

TOP FIN/FCO REPORT: No 
 
PRODUCT_REFERENCE:  Sun Fire 15000 Server  
 
PRODUCT CATEGORY:   Server / Service

  
PRODUCTS AFFECTED:  
  
Mkt_ID   Platform   Model   Description            Serial Number
------   --------   -----   -----------            -------------
  -        F15K      ALL    Sun Fire 15K Server          -


X-Options Affected
------------------
Mkt_ID   Platform   Model   Description   Serial Number
------   --------   -----   -----------   -------------
  -         -         -          -              -


PART NUMBERS AFFECTED:

Part Number      Description                          Model
-----------      --------------------                 -----
501-5121-10      System Control Board, Starcat          -


REFERENCES:

BugId: 4504754 - tmd throttle should be 1 until all lock issues are 
                 resolved.

      
PROBLEM DESCRIPTION:

Sun Fire 15K installations running System Management Software 1.1 (SMS)
might not complete HPOST and will not boot Solaris if the
'setkeyswitch' command is run at the same time on multiple domains.
This may lead to a power recovery or a global DStop where recovery of
all domains is attempted. 

Due to locking issues between SMS 1.1 and HPOST, running multiple
simultaneous setkeyswitch operations could result in a deadlock
situation, causing the setkeyswitch/HPOST process to hang.  The
operation eventually times out, at which point the setkeyswitch/HPOST
operation fails.  Solaris will not boot on the affected domains.

The problem is seen when multiple setkeyswitch commands run in
parallel, for example when 'setkeyswitch on' is issued for one domain
and then immediately issued for other domains before the first domain
completes HPOST.  A possible scenario would be multiple administrators
running 'setkeyswitch on' for different domains at the same time.  This
results in more than one 'setkeyswitch' process running concurrently,
which can be seen using 'ps -ef' from the SMS console.
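
As a minimal sketch of the 'ps -ef' check described above, the snippet
below counts setkeyswitch processes in 'ps -ef' style output.  The
function name count_setkeyswitch is hypothetical and not part of SMS;
the sketch assumes a POSIX shell and awk are available where SMS runs.

```shell
# Hypothetical helper (not part of SMS): count setkeyswitch processes
# in 'ps -ef' style output read from standard input.
count_setkeyswitch() {
    # The [s] bracket keeps this awk command line itself from matching
    # when the input comes from a live 'ps -ef'.
    awk '/[s]etkeyswitch/ { n++ } END { print n + 0 }'
}
```

     Example:  % ps -ef | count_setkeyswitch

A result greater than 1 suggests that setkeyswitch operations are
running concurrently.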

A typical hang condition would look like:

     % setkeyswitch -d a on
     ...<bunch of POST messages>...
	___          <--- POST messages cease, don't return to prompt
	
This deadlock condition occurs because HPOST fails to follow the proper
lock ordering policy.  It could occur with any command that invokes
HPOST such as setkeyswitch, addboard, moveboard, rcfgadm or cfgadm.

SMS version 1.2 fixes this deadlock situation and is now in testing.
Until this new version is released, please follow the workaround 
recommendation given below.
 
  
IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        |   |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION:

An Authorized Enterprise Services Field Representative may avoid the
above mentioned problems by following the recommendations as shown
below.

Please adhere to the following guidelines:    

Ensure that setkeyswitch operations are serialized and not run in
parallel.  This can be done by modifying the throttle value for
the 'tmd' daemon in the ssd_start file. 

  In /etc/opt/SUNWSMS/startup/ssd_start, change the 2nd field of the
  tmd entry from '-t 4' to '-t 1'.

  The final entry reads:

         tmd:-t 1:0:1:0:1:4:4:sms-tmd

Then, as root, stop and start SMS for the changes to take effect.

	 # /etc/init.d/sms stop
	 # /etc/init.d/sms start
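
The ssd_start edit described above can be sketched as a small shell
function.  This is an illustration only, not a Sun-supplied tool: the
function name set_tmd_throttle and the '.orig' backup file name are
assumptions, and the sketch assumes the tmd entry starts at the
beginning of its line in the form 'tmd:-t <number>:'.

```shell
# Hypothetical helper (not part of SMS): set the tmd throttle to 1 in an
# ssd_start-style file, keeping a backup copy alongside it.
set_tmd_throttle() {
    file=$1
    # Keep the original so the change can be reverted later
    # (the '.orig' name is an assumption).
    cp -p "$file" "$file.orig" || return 1
    # Rewrite only the 2nd field of the tmd entry; every other line
    # passes through unchanged.
    sed 's/^tmd:-t [0-9][0-9]*:/tmd:-t 1:/' "$file.orig" > "$file" || return 1
    # Print the resulting tmd entry for a visual check.
    grep '^tmd:' "$file"
}
```

     Example (as root):

	 # set_tmd_throttle /etc/opt/SUNWSMS/startup/ssd_start

SMS must still be stopped and started as shown above for the change to
take effect.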


With this workaround, setkeyswitch operations can be issued in
parallel, but SMS internally serializes the operations.  This means
that all but the first parallel setkeyswitch operation will appear to
hang until the first operation completes.  The operations are in fact
not hanging, but awaiting their turn to execute.

There is a penalty to this serialization.  Testing has revealed that
booting 15 domains at the same time takes twice as long when the above 
workaround is implemented.


COMMENTS:  

None.

=========================================================================

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
     support teams will recommend implementation of the FIN (to their
     respective accounts), at the convenience of the customer.

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@sdpsweb.EBay
---------------------------------------------------------------------------



Copyright (c) 1997-2003 Sun Microsystems, Inc.