Document fins/I0684-1


FIN #: I0684-1

SYNOPSIS: RM 6.22 'healthck' utility reporting problem on A3000/A3500 arrays

DATE: Jun/21/01

KEYWORDS: RM 6.22 'healthck' utility reporting problem on A3000/A3500 arrays


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: RM6.22 'healthck' utility may not report power supply or fan  
          failures in StorEdge A3000/A3500 arrays, which may result in 
          loss of availability.


SunAlert:           No

TOP FIN/FCO REPORT: No 
 
PRODUCT_REFERENCE:  Raid Manager 6.22  
 
PRODUCT CATEGORY:   Storage / Service 


PRODUCTS AFFECTED:  
 
Mkt_ID   Platform   Model     Description                 Serial Number
------   --------   -----     -----------                 -------------
Systems Affected
----------------

  -      ANYSYS       -       System Platform Independent       -
  
X-Options Affected
------------------
  -              A3000   -    A3000 StorEdge Array              -
  -              A3500   -    A3500 StorEdge Array              -
X6530A             -     -    RSM 2000 15x4.2Gb/5400 FWSCSI     -
X6531A             -     -    RSM 2000 35x4.2Gb/5400 FWSCSI     -
X6532A             -     -    RSM 2000 15x4.2Gb/7200 FWSCSI     -
X6533A             -     -    RSM 2000 35x4.2Gb/7200 FWSCSI     -
X6534A             -     -    RSM 2000 15x9.1Gb/7200 FWSCSI     -
X6535A             -     -    RSM 2000 35x9.1Gb/7200 FWSCSI     -
6534A              -     -    A3000 15*9.1GB/7200 FWSCSI        -       
6535A              -     -    A3000 35*9.1GB/7200 FWSCSI        - 
SG-ARY351A-180G    -     -    A3500 1 CONT MOD./5 TRAYS/18GB    -   
SG-ARY353A-360G    -     -    A3500 2 CONT/7 TRAYS/18GB         -  
SG-ARY360A-90G     -     -    A3500 1 CONT/5 TRAYS/9GB(10K)     -  
SG-ARY362A-180G    -     -    A3500 2 CONT/7 TRAYS/9GB(10K)     -
SG-ARY366A-72G     -     -    A3500 1 CONT/2 TRAYS/9GB(10K)     -  
SG-ARY366A-72GR5   -     -    A3500 1 CONT/2 TRAYS/9GB(10K)     -  
SG-ARY370A-91G     -     -    91-GB A3500 (1x5x9-GB)            -  
SG-ARY372A-182G    -     -    182-GB A3500 (2x7x9-GB)           - 
SG-ARY374A-273G    -     -    273-GB A3500 w/(3x15x9-GB)        -
SG-ARY380A-182G    -     -    182-GB A3500 (1x5x18-GB)          - 
SG-ARY382A-364G    -     -    364-GB A3500 (2x7x18-GB)          - 
SG-ARY384A-546G    -     -    546-GB A3500 (3x15x18-GB)         - 
SG-XARY351A-180G   -     -    A3500 1 CONT MOD/5 TRAYS/18GB     - 
SG-XARY353A-1008G  -     -    A3500 2 CONT/7 TRAYS/18GB         -
SG-XARY353A-360G   -     -    A3500 2 CONT/7 TRAYS/18GB         -
SG-XARY355A-2160G  -     -    A3500 3 CONT/15 TRAYS/18GB        -
SG-XARY360A-545G   -     -    545-GB A3500 (1X5X9-GB)           -
SG-XARY360A-90G    -     -    A3500 1 CONT/5 TRAYS/9GB(10K)     - 
SG-XARY362A-180G   -     -    A3500 2 CONT/7 TRAYS/9GB(10K)     -
SG-XARY362A-763G   -     -    A3500 2 CONT/7 TRAYS/9GB(10K)     -
SG-XARY364A-1635G  -     -    A3500 3 CONT/15 TRAYS/9GB(10K)    -
SG-XARY366A-72G    -     -    A3500 1 CONT/2 TRAYS/9GB(10K)     - 
SG-XARY380A-1092G  -     -    A3500 1092-GB (1x5x18-GB)         -  

 
PART NUMBERS AFFECTED: 

Part Number   Description                          Model
-----------   -----------                          -----

704-6708-10   CD SUN StorEdge RAID Manager 6.22      -


REFERENCES:

BugId:   4343416 - RM6 gui and healthck are not reporting controller fan 
                   failure. 
         4332283 - RM6 reports faults at the Power Supply Unit differently
                   between RAID modules.
         4402798 - healthck doesn't report a power failure if there's no 
                   I/O to the device.

MANUAL:  805-7758-12: Sun StorEdge A1000, A3x00, and A3500FC Subsystems. 
         805-7756-10: Sun StorEdge RM6.22 Installation and Support Guide
                      for Solaris. 
         805-4980-10: Sun StorEdge A3000 Controller Module Guide. 
         805-4980-11: Sun StorEdge A3500/A3500FC Controller Module Guide. 
         806-6419-11: Sun StorEdge A3500/A3500FC Best Practices Guide.  

      
PROBLEM DESCRIPTION: 

Raid Manager 6.22 (RM6.22) may fail to report a power supply or fan
failure for A3000/A3500 arrays.  This may expose the array to a single
point of failure which can lead to loss of availability.  The
'healthck' utility from RM6.22 does not correctly report these
failures.  Field Service personnel running 'healthck' may assume that
this utility reports all failed components, but this is not always the
case.  Approximately 12,000 A3000 and A3500 units in 1x5, 2x7 and 3x15
configurations have been shipped since January of 1998 and these units
may be affected by this problem. 

The following RM6.22 Bug IDs describe the potential problems:
 
1. BugId #4343416  RM6 GUI and 'healthck' are not reporting controller 
   fan failure.  RM6.22 does not report controller fan failure or power 
   supply fan failure.  The problem can be reproduced by running
   'healthck' after pulling out the controller fan or power supply fan.
   Even when the fan is kept out for 10 minutes, the RAID modules still
   report an optimal fan status. 

   *Note: The failure is reported in /var/adm/messages, but is not
          reported by 'healthck'.
          
2. BugId #4332283  RM6 reports faults at the Power Supply Unit differently
   between RAID modules.  During fault insertion tests, when a power cable
   was pulled from the back of the power supply unit, RM6 reported this 
   as a failed fan on both units.  When the power supply was turned off on
   both RAID modules, RM6 reported one as a failed fan and one as a failed
   power supply.

   *Note: The failure is reported correctly in /var/adm/messages, but is
          reported incorrectly by RM6 'healthck'.
  
3. BugId #4402798  'healthck' doesn't report a power failure if there's 
   no I/O to the device.  'healthck' and Recovery Guru fail to detect 
   either partial or full power failure unless there is I/O to the array. 

The problems described in the above Bug IDs are caused by incorrect
"Subsystem Fault Region" values in the controller NVSRAM.  With these
values, the controller fails to return status for components in the
controller enclosure itself, so the error is not reported by RM6
'healthck' or Recovery Guru.

These problems may occur on the A3000 and A3500 (SCSI) arrays with the
following NVSRAM files: 
          
              SIE3621E.DL <- for A3000 with RSM trays
              SIE3621F.DL <- for A3500 with D1000 trays
         
A1000 and A3500FC arrays have correct NVSRAM files and are not affected.
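
The NVSRAM check above can be expressed as a small lookup.  The following
is only a sketch: the function name nvsram_status is hypothetical, and how
the NVSRAM file name is obtained for a given array is not covered by this
FIN and is left to the caller.

```python
# NVSRAM files named in this FIN as carrying incorrect
# "Subsystem Fault Region" values, with the configuration each applies to.
AFFECTED_NVSRAM = {
    "SIE3621E.DL": "A3000 with RSM trays",
    "SIE3621F.DL": "A3500 with D1000 trays",
}


def nvsram_status(filename):
    """Classify an NVSRAM file name against the list in this FIN.

    Hypothetical helper: the caller supplies the file name; this FIN
    does not describe how to read it from the array.
    """
    key = filename.strip().upper()
    description = AFFECTED_NVSRAM.get(key)
    if description is None:
        return "not listed in this FIN"
    return f"affected ({description})"
```

A file name that is not in the table is reported as "not listed" rather
than "unaffected", since this FIN only names the two affected files.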
 
Raid Manager 6.22.1, the next update to RM 6.22, will be released with
a new NVSRAM file that contains the correct "Subsystem Fault Region"
values to fix these problems.
 

IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        |   |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION: 

An Authorized Enterprise Field Service Representative may avoid the
above mentioned problems by following the recommendations below.

Raid Manager 6.22.1, the next update to RM 6.22, will be released with
a new NVSRAM file that contains the correct "Subsystem Fault Region"
values to fix the above mentioned problems. 

For the current version of RM6.22 with StorEdge A3000 and A3500(SCSI)
arrays, field personnel should check the /var/adm/messages file
regularly or visually monitor the power supply and the fan LEDs to
determine whether there is a power supply or fan failure.
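
The regular check of /var/adm/messages could be partly automated along
the following lines.  This is a minimal sketch, not a supported tool: the
function names are hypothetical, and the keywords "fan" and "power supply"
are assumptions, since this FIN does not give the exact message text that
is logged.  Matching lines still need to be reviewed by a person.

```python
import re

# Assumed keywords: the exact text logged for these failures is not
# given in this FIN, so this pattern is deliberately broad.
PATTERN = re.compile(r"\b(fan|power supply)\b", re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines mentioning a fan or
    power supply.  Line numbers start at 1."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_messages(path="/var/adm/messages"):
    """Scan the system log at 'path' and return the suspect lines."""
    with open(path, errors="replace") as log:
        return find_suspect_lines(log)
```

Calling scan_messages() with no argument reads /var/adm/messages and
returns the matching lines for review.  A broad pattern will also match
unrelated messages, which is preferable here to missing a real failure.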

See the following URL for a description of A3000 Array LED status codes:

http://infoserver.central/data/sshandbook/Systems/common-docs/rsm2000leds.ps

         
COMMENTS:  
    
------------------------------------------------------------------------------ 


Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
                                                        



Copyright (c) 1997-2003 Sun Microsystems, Inc.