Document fins/I0684-1
FIN #: I0684-1
SYNOPSIS: RM 6.22 'healthck' utility reporting problem on A3000/A3500 arrays
DATE: Jun/21/01
KEYWORDS: RM 6.22 'healthck' utility reporting problem on A3000/A3500 arrays
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: RM6.22 'healthck' utility may not report power supply or fan
failures in StorEdge A3000/A3500 arrays, which may result in
loss of availability.
SunAlert: No
TOP FIN/FCO REPORT: No
PRODUCT_REFERENCE: Raid Manager 6.22
PRODUCT CATEGORY: Storage / Service
PRODUCTS AFFECTED:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
Systems Affected
----------------
- ANYSYS - System Platform Independent -
X-Options Affected
------------------
- A3000 - A3000 StorEdge Array -
- A3500 - A3500 StorEdge Array -
X6530A - - RSM 2000 15x4.2Gb/5400 FWSCSI -
X6531A - - RSM 2000 35x4.2Gb/5400 FWSCSI -
X6532A - - RSM 2000 15x4.2Gb/7200 FWSCSI -
X6533A - - RSM 2000 35x4.2Gb/7200 FWSCSI -
X6534A - - RSM 2000 15x9.1Gb/7200 FWSCSI -
X6535A - - RSM 2000 35x9.1Gb/7200 FWSCSI -
6534A - - A3000 15*9.1GB/7200 FWSCSI -
6535A - - A3000 35*9.1GB/7200 FWSCSI -
SG-ARY351A-180G - - A3500 1 CONT MOD./5 TRAYS/18GB -
SG-ARY353A-360G - - A3500 2 CONT/7 TRAYS/18GB -
SG-ARY360A-90G - - A3500 1 CONT/5 TRAYS/9GB(10K) -
SG-ARY362A-180G - - A3500 2 CONT/7 TRAYS/9GB(10K) -
SG-ARY366A-72G - - A3500 1 CONT/2 TRAYS/9GB(10K) -
SG-ARY366A-72GR5 - - A3500 1 CONT/2 TRAYS/9GB(10K) -
SG-ARY370A-91G - - 91-GB A3500 (1x5x9-GB) -
SG-ARY372A-182G - - 182-GB A3500 (2x7x9-GB) -
SG-ARY374A-273G - - 273-GB A3500 w/(3x15x9-GB) -
SG-ARY380A-182G - - 182-GB A3500 (1x5x18-GB) -
SG-ARY382A-364G - - 364-GB A3500 (2x7x18-GB) -
SG-ARY384A-546G - - 546-GB A3500 (3x15x18-GB) -
SG-XARY351A-180G - - A3500 1 CONT MOD/5 TRAYS/18GB -
SG-XARY353A-1008G - - A3500 2 CONT/7 TRAYS/18GB -
SG-XARY353A-360G - - A3500 2 CONT/7 TRAYS/18GB -
SG-XARY355A-2160G - - A3500 3 CONT/15 TRAYS/18GB -
SG-XARY360A-545G - - 545-GB A3500 (1X5X9-GB) -
SG-XARY360A-90G - - A3500 1 CONT/5 TRAYS/9GB(10K) -
SG-XARY362A-180G - - A3500 2 CONT/7 TRAYS/9GB(10K) -
SG-XARY362A-763G - - A3500 2 CONT/7 TRAYS/9GB(10K) -
SG-XARY364A-1635G - - A3500 3 CONT/15 TRAYS/9GB(10K) -
SG-XARY366A-72G - - A3500 1 CONT/2 TRAYS/9GB(10K) -
SG-XARY380A-1092G - - A3500 1092-GB (1x5x18-GB) -
PART NUMBERS AFFECTED:
Part Number Description Model
----------- ----------- -----
704-6708-10 CD SUN StorEdge RAID Manager 6.22 -
REFERENCES:
BugId: 4343416 - RM6 gui and healthck are not reporting controller fan
failure.
4332283 - RM6 reports faults at the Power Supply Unit differently
between RAID modules.
4402798 - healthck doesn't report a power failure if there's no
I/O to the device.
MANUAL: 805-7758-12: Sun StorEdge A1000, A3x00, and A3500FC Subsystems.
805-7756-10: Sun StorEdge RM6.22 Installation and Support Guide
for Solaris.
805-4980-10: Sun StorEdge A3000 Controller Module Guide.
805-4980-11: Sun StorEdge A3500/A3500FC Controller Module Guide.
806-6419-11: Sun StorEdge A3500/A3500FC Best Practices Guide.
PROBLEM DESCRIPTION:
Raid Manager 6.22 (RM6.22) may fail to report a power supply or fan
failure for A3000/A3500 arrays. This may expose the array to a single
point of failure which can lead to loss of availability. The
'healthck' utility from RM6.22 does not correctly report these
failures. Field Service personnel running 'healthck' may assume that
this utility reports all failed components, but this is not always the
case. Approximately 12,000 A3000 and A3500 units in 1x5, 2x7 and 3x15
configurations have been shipped since January of 1998 and these units
may be affected by this problem.
The following RM6.22 Bug IDs describe the potential problems:
1. BugId #4343416 RM6 GUI and 'healthck' are not reporting controller
fan failure. RM6.22 does not report controller fan failure or power
supply fan failure. The problem can be reproduced by running 'healthck'
after pulling out the controller fan or power supply fan. Even after
the fan has been out for 10 minutes, the RAID modules still report an
optimal fan status.
*Note: The failure is reported in /var/adm/messages, but is not
reported by 'healthck'.
2. BugId #4332283 RM6 reports faults at the Power Supply Unit differently
between RAID modules. During fault insertion tests, when a power cable
was pulled from the back of the power supply unit, RM6 reported this as
a failed fan on both units. When the power supply was turned off on both
RAID modules, RM6 reported one as a failed fan and one as a failed power
supply.
*Note: The failure is reported correctly in /var/adm/messages, but is
reported incorrectly by RM6 'healthck'.
3. BugId #4402798 'healthck' doesn't report a power failure if there's
no I/O to the device. 'healthck' and Recovery Guru fail to detect
either a partial or a full power failure unless there is I/O to the
array.
The problems described in the above Bug IDs are caused by incorrect
"Subsystem Fault Region" values in the controller NVSRAM. The
controller fails to return component status for components in the
actual controller enclosure, so the error is not reported by RM6
'healthck' and Recovery Guru.
These problems may occur on the A3000 and A3500 (SCSI) arrays with the
following NVSRAM files:
SIE3621E.DL <- for A3000 with RSM trays
SIE3621F.DL <- for A3500 with D1000 trays
A1000 and A3500FC arrays have correct NVSRAM files and are not affected.
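As an illustration only, the two affected NVSRAM files listed above could
be kept in a small lookup table for use during site audits. The Python
sketch below assumes the NVSRAM file name for an array has already been
obtained by other means (this FIN does not describe how). The function
name and the case-insensitive match are assumptions made for the example.

```python
# NVSRAM files named in this FIN as carrying incorrect
# "Subsystem Fault Region" values, mapped to the configuration they serve.
AFFECTED_NVSRAM = {
    "SIE3621E.DL": "A3000 with RSM trays",
    "SIE3621F.DL": "A3500 with D1000 trays",
}


def affected_configuration(nvsram_file):
    """Return the affected configuration for an NVSRAM file name, or None.

    Hypothetical helper: the name is matched case-insensitively, which is
    an assumption of this sketch rather than documented RM6 behavior.
    """
    return AFFECTED_NVSRAM.get(nvsram_file.strip().upper())
```

A return value of None means only that the file name is not one of the
two listed in this FIN.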
Raid Manager 6.22.1, the next update to RM 6.22, will be released with
a new NVSRAM file that contains the correct "Subsystem Fault Region"
values to fix these problems.
IMPLEMENTATION:
---
| | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| X | REACTIVE (As Required)
---
CORRECTIVE ACTION:
An Authorized Enterprise Field Service Representative can avoid the
above-mentioned problems by following the recommendations below.
Raid Manager 6.22.1, the next update to RM 6.22, will be released with
a new NVSRAM file that contains the correct "Subsystem Fault Region"
values to fix the above-mentioned problems.
For the current version of RM6.22 with StorEdge A3000 and A3500 (SCSI)
arrays, field personnel should check the /var/adm/messages file
regularly, or visually monitor the power supply and fan LEDs, to
determine whether there is a power supply or fan failure.
See the following URL for a description of A3000 Array LED status codes:
http://infoserver.central/data/sshandbook/Systems/common-docs/rsm2000leds.ps
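The regular check of /var/adm/messages recommended above could be
scripted. The following Python sketch is a minimal example, not a
supported tool: this FIN does not give the exact wording of the messages
logged for these failures, so the keyword pattern is an assumption and
should be adjusted to match what the array actually logs. The function
names are hypothetical.

```python
import re

# Assumed keywords: the FIN states that power supply and fan failures
# appear in /var/adm/messages but does not quote the message text.
FAULT_PATTERN = re.compile(r"\b(power supply|fan)\b", re.IGNORECASE)


def find_fault_lines(lines, pattern=FAULT_PATTERN):
    """Return (line_number, text) pairs for lines matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_messages(path="/var/adm/messages"):
    """Scan a syslog-style file and return the matching lines."""
    with open(path, errors="replace") as handle:
        return find_fault_lines(handle)
```

Calling scan_messages() on the host attached to the array returns any
matching lines with their line numbers. An empty result is not proof of
a healthy array, so the LED check described above still applies.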
COMMENTS:
------------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.