InfoDoc ID: 21825
Synopsis: RM6 - Power Supply Failure / Monitoring on A1000/A3000
Date: 15 Nov 2000
- Current Default Situation -
A power supply failure on A1000/A3000 disk subsystems currently raises
no alarm. The equipment has to be physically inspected periodically
to detect the failure of one of the redundant power supplies.
Although these systems have redundant power supplies, there is currently
no method to notify anyone that one of the supplies has failed. Without
periodic inspections, the failure of a second power supply would be
catastrophic.
Site Configurations that can benefit from this InfoDoc include:
Hardware:
- A3000 attached E3500 X 491(primary configuration)
- A1xxx
- A3xxx
- E5000
- E6500
- E10000
- How to enable Power Supply failure monitoring -
The RAID Manager 6.1 (RM6) software that controls and monitors A1000/A3000
RAID arrays has a utility that performs a "health check" on the entire
storage enclosure. By creating an hourly cronjob on the RM6 server,
an administrator can capture power supply failure notifications.
An administrator could execute:
"/usr/lib/osa/bin/healthck -a | grep Pwr >>/var/adm/messages"
and have any arrays controlled by RM6 software place any power supply
failures that happened within the hour into the RM6 server's
"/var/adm/messages" file. The format of these messages would be:
"Servername" Drive Tray-Pwr Supp Failure
The "/var/adm/messages" file can then be reviewed on a regular basis to
detect when one of the redundant power supplies has generated a failure
message. The appropriate repair actions could then be performed
during a scheduled maintenance window.
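The one-line command above can be wrapped in a small script so that cron
runs it hourly. The following is a minimal sketch, not a tested RM6
procedure: the script name and install path in the sample crontab line are
hypothetical, and the HEALTHCK and LOGFILE variables are introduced here
only so that the paths quoted in this InfoDoc can be overridden.

```shell
#!/bin/sh
# Hourly power supply check for an RM6 server (illustrative sketch).
# The defaults are the paths quoted in this InfoDoc; the variable
# names themselves are assumptions made for this example.
HEALTHCK=${HEALTHCK:-/usr/lib/osa/bin/healthck}
LOGFILE=${LOGFILE:-/var/adm/messages}

# Keep only the health check lines that mention a power supply ("Pwr").
filter_pwr() {
    grep Pwr
}

# Run the health check with the -a option used in this InfoDoc and
# append any matching lines to the log file.
check_power_supplies() {
    "$HEALTHCK" -a | filter_pwr >> "$LOGFILE"
}

# Do nothing on hosts where the RM6 utility is not installed.
if [ -x "$HEALTHCK" ]; then
    check_power_supplies
fi

# Sample root crontab entry (hypothetical script path), run at the
# top of every hour:
# 0 * * * * /usr/local/bin/rm6_pwr_check.sh
```

Keeping the filter in its own function makes it easy to test against a
saved copy of the health check output before the cronjob is installed.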
- Additional monitoring capabilities for Sun Remote Services (SRS) Customers -
SRS 1.x can currently search the RM6 server's "/var/adm/messages" file
and can be configured to look for lines containing the string
"Pwr Supp Failure". It can therefore report this failure with minimal
SRS modification. The only changes required are to add the
previously mentioned cronjob to the RM6 server and to modify the SRS 1.x
"search string library" to include the string "Pwr Supp Failure". An
alert notification of the failure would then be created with the existing
SRS 1.x alert notification methods.
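Before relying on SRS to raise the alert, it may help to confirm that the
cronjob is writing lines that the search string will actually match. The
sketch below is a manual check with grep only; it is not SRS
configuration, and the function names are hypothetical.

```shell
#!/bin/sh
# Manual check that mimics the SRS string match (not SRS itself).
SEARCH_STRING="Pwr Supp Failure"

# Print every line in the log that contains the search string.
# The log defaults to /var/adm/messages when no file is given.
find_pwr_failures() {
    grep "$SEARCH_STRING" "${1:-/var/adm/messages}"
}

# Print how many lines in the log contain the search string.
count_pwr_failures() {
    grep -c "$SEARCH_STRING" "${1:-/var/adm/messages}"
}
```

Running find_pwr_failures with no argument on the RM6 server lists the
entries that a search for "Pwr Supp Failure" would be expected to find.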
Using the same cronjob, an SRS 2.x administrator could use the "File
Watcher Module" in the Symon software to monitor the "/var/adm/messages"
file on the RM6 server and report any additions to this log to the
Symon software "file changes table". An alarm would be triggered when
a "Pwr Supp Failure" is listed in the "file changes table". An alert
notification would then be sent to the appropriate persons using the
existing SRS 2.x alert mechanism, and without any modifications to
current SRS 2.x technology.
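The idea of reporting only additions to the log can be illustrated outside
Symon. The sketch below is not the File Watcher Module and makes no claim
about how that module works internally; it simply remembers how many lines
were seen on the previous run, in a hypothetical state file, and prints
whatever has been appended since. A POSIX shell is assumed.

```shell
#!/bin/sh
# Print the lines appended to a log since the previous call.
# $1 = log file, $2 = state file holding the last seen line count
# (both arguments are chosen by the caller; nothing here is SRS-specific).
new_lines_since_last_check() {
    log=$1
    state=$2
    last=0
    if [ -f "$state" ]; then
        last=$(cat "$state")
    fi
    # wc may pad its output with spaces, so strip them.
    total=$(wc -l < "$log" | tr -d ' ')
    # If the log was rotated or truncated, start again from the top.
    if [ "$total" -lt "$last" ]; then
        last=0
    fi
    if [ "$total" -gt "$last" ]; then
        tail -n +"$((last + 1))" "$log"
    fi
    # Remember the current line count for the next call.
    echo "$total" > "$state"
}
```

A possible use, with a hypothetical state file path, is to pipe the new
lines through the same string match used for the alarm:
new_lines_since_last_check /var/adm/messages /var/tmp/pwr_watch.state | grep "Pwr Supp Failure"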
An enhancement to SRS 2.x is also expected to provide an
alternative monitoring capability for A1xxx, A3xxx, A5xxx, and Enterprise
Servers' disk subsystems, covering "Full Disk monitoring", "Full Interconnect
monitoring", and "Full Enclosure monitoring". By adding the
hardware module for the specific platform to the "Config Reader Module",
one can acquire power supply monitoring for all power supplies in the
enterprise enclosure. This would eliminate the need for
the cronjob described previously.
The current production version of SRS is 1.x, and SRS 2.x is scheduled
for release Feb 7th. Existing SRS 1.x site migrations are scheduled to
commence in July at which time all customers will be migrated to SRS 2.x.
- Additional reference information can be found in the following manuals -
Sun Management Center 2.1 for Midrange Servers Platforms
Sun Management Center 2.1 for Starfire Enterprise Servers 806-1581-10
Sun Management Center 2.1
User Guide Symon 201 Config
INTERNAL SUMMARY:
Change Record:
Revision: 01
Date: 02/09/00
Prepared By: STAR Room TSE Team
Reviewed By: Tom Bull
SUBMITTER: Charles Price
APPLIES TO: Hardware/Disk Storage Subsystem/StorEdge Disk Array/StorEdge A1000, Hardware/Disk Storage Subsystem/StorEdge Disk Array/StorEdge A3000, Hardware/Disk Storage Subsystem/StorEdge Disk Array/StorEdge A3500, Storage/RAID Manager, AFO Vertical Team Docs, AFO Vertical Team Docs/Hardware, AFO Vertical Team Docs/Storage
ATTACHMENTS:
Copyright (c) 1997-2003 Sun Microsystems, Inc.