Document fins/I0674-1
FIN #: I0674-1
SYNOPSIS: Confusing syslog error messages may cause good T3 Power
Cooling Units to be replaced
DATE: May/10/01
KEYWORDS: Confusing syslog error messages may cause good T3 Power
Cooling Units to be replaced
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: Good StorEdge T3 Power Cooling Units are being unnecessarily
replaced due to confusing syslog error messages.
TOP FIN/FCO REPORT: No
PRODUCT_REFERENCE: Power Cooling Units
PRODUCT CATEGORY: Storage / Service
PRODUCTS AFFECTED:
Mkt_ID  Platform  Model  Description                    Serial Number
------  --------  -----  -----------                    -------------
Systems Affected
----------------
-       ANYSYS    ALL    System Platform Independent    -
X-Options Affected
------------------
-       T3        ALL    T3 StorEdge Array              -
PART NUMBERS AFFECTED:
Part Number   Description                                            Model
-----------   -----------                                            -----
300-1453-01   325 Watt Power Supply with NiCd Battery (pre-release)  -
300-1454-01   Tectrol TC64S-1327 325 Watt Power Supply               -
REFERENCES:
ESC: 529253
Manual: 806-1062-11 - Sun StorEdge T3 Disk Tray Installation, Operation
and Service Manual.
806-1063-11 - Sun StorEdge T3 Disk Tray Administrator's Guide.
PROBLEM DESCRIPTION:
Customers and field personnel may misinterpret certain StorEdge T3
syslog messages that report T3 battery problems. This can result in
Power and Cooling Units (PCUs) being replaced when they are not faulty,
which causes unnecessary expense and possible downtime for the
customer's system.
Network Storage Engineering is working on new firmware that will
correct these confusing syslog messages; however, this work is still
in progress. Until the new firmware is released, this FIN notifies
the field of syslog messages that may be confusing and may be
misinterpreted.
Messages in the T3 syslog file are reported as one of four types:
Error - Critical system event requiring immediate attention
Warning - Possible event requiring eventual user intervention
Notice - Event which may be a side effect of other events or
may be a normal condition
Information - Event has no consequence on the health of the system
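When reviewing a long syslog, it helps to split each line into its
severity letter and message text. The following is a minimal Python
sketch; the line layout it assumes ('Mon DD HH:MM:SS TASK[n]: S: text')
is inferred from the excerpts in this FIN rather than from a published
T3 syslog specification, and the name `parse_line` is hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Severity letters as listed above.
SEVERITY = {"E": "Error", "W": "Warning", "N": "Notice", "I": "Information"}

# Layout inferred from the excerpts in this FIN (an assumption), e.g.
#   Mar 20 08:11:46 LPCT[1]: N: u1pcu1: Battery not OK
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) (?P<time>\d\d:\d\d:\d\d) "
    r"(?P<task>\w+)\[\d+\]: (?P<sev>[EWNI]): (?P<text>.*)$"
)

# Trailing "<---" arrows are annotations added in this FIN, not log text.
ARROW_RE = re.compile(r"\s*<-+\s*$")


class SyslogEntry(NamedTuple):
    month: str
    day: int
    time: str
    task: str      # e.g. BATD, LPCT, sh13
    severity: str  # Error, Warning, Notice or Information
    text: str


def parse_line(line: str) -> Optional[SyslogEntry]:
    """Return the parsed entry, or None if the line does not match."""
    m = ARROW_RE.sub("", line.strip())
    m = LINE_RE.match(m)
    if m is None:
        return None
    return SyslogEntry(m["month"], int(m["day"]), m["time"], m["task"],
                       SEVERITY[m["sev"]], m["text"])


entry = parse_line("Mar 20 08:11:46 LPCT[1]: N: u1pcu1: Battery not OK <---")
print(entry.severity, "-", entry.text)  # prints: Notice - u1pcu1: Battery not OK
```

Lines that do not fit the assumed layout, such as wrapped continuation
lines, return None.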
Some syslog messages can be misleading. For example, acting on an
'N' (Notice) message may result in an unnecessary maintenance
action. This FIN includes examples of normal 'N' (Notice) messages and
misleading 'W' (Warning) messages that can be ignored, as well as
examples of Warning messages that should be acted upon.
There are two LEDs on the exterior bezel panel of each PCU. The
leftmost LED is the AC LED and the rightmost LED is the PS (power
supply) status LED. Both LEDs are bi-color (green and amber). The
following refers to the rightmost LED, the PS status LED.
'Battery not OK' Notice with a flashing green PS LED
indicates a normal recharge cycle; this is OK.
'Refreshing battery' Notice with the amber PS LED on indicates
a normal refresh cycle; this is OK.
The following is an example of a normal battery refresh cycle.
Hold times greater than 360 seconds indicate a good battery.
However, the Notice 'Battery not OK' might lead some customers to
believe otherwise. This message appears during every normal refresh
cycle and is not cause for alarm or action.
Mar 20 08:00:56 BATD[1]: N: Battery Refreshing cycle starts from this point.
Mar 20 08:00:58 LPCT[1]: N: u1pcu1: Refreshing battery
Mar 20 08:01:01 LPCT[1]: N: u2pcu1: Refreshing battery
Mar 20 08:11:46 LPCT[1]: N: u1pcu1: Battery not OK <---
Mar 20 08:11:49 LPCT[1]: N: u1pcu1: Battery not OK <---
Mar 20 08:11:52 BATD[1]: N: u1pcu1: hold time was 655 seconds. <---
Mar 20 08:22:37 LPCT[1]: N: u2pcu1: Battery not OK <---
Mar 20 08:22:40 BATD[1]: N: u2pcu1: hold time was 1301 seconds. <---
Mar 20 19:50:13 LPCT[1]: N: u1pcu2: Refreshing battery
Mar 20 19:50:16 LPCT[1]: N: u2pcu2: Refreshing battery
Mar 20 20:01:43 LPCT[1]: N: u1pcu2: Battery not OK <---
Mar 20 20:01:46 BATD[1]: N: u1pcu2: hold time was 695 seconds. <---
Mar 20 20:11:37 LPCT[1]: N: u2pcu2: Battery not OK <---
Mar 20 20:11:40 LPCT[1]: N: u2pcu2: Battery not OK <---
Mar 20 20:11:43 BATD[1]: N: u2pcu2: hold time was 1289 seconds. <---
Mar 21 07:37:37 BATD[1]: N: Battery Refreshing cycle ends at this point.
In the above example, all hold times are greater than 360 seconds,
meaning that the batteries took a long time to discharge to a minimum
level. This indicates healthy batteries.
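The hold-time check described above can be automated. This minimal
Python sketch extracts every 'hold time was N seconds' Notice from a
syslog excerpt and compares it with the 360-second threshold given in
this FIN. The function names are hypothetical, and treating a hold time
of exactly 360 seconds as low is an assumption, since the FIN only
describes values above and below 360.

```python
import re
from typing import Dict, Iterable

HOLD_TIME_MIN_SECONDS = 360  # threshold given in this FIN

# Matches e.g. "BATD[1]: N: u1pcu1: hold time was 655 seconds."
HOLD_RE = re.compile(r"\b(u\d+pcu\d+): hold time was (\d+) seconds")


def hold_times(lines: Iterable[str]) -> Dict[str, int]:
    """Map each PCU to its reported hold time (the last report wins)."""
    times: Dict[str, int] = {}
    for line in lines:
        m = HOLD_RE.search(line)
        if m:
            times[m.group(1)] = int(m.group(2))
    return times


def classify(lines: Iterable[str]) -> Dict[str, str]:
    """'good' if hold time > 360 s, otherwise 'low' (360 s exactly assumed low)."""
    return {pcu: "good" if secs > HOLD_TIME_MIN_SECONDS else "low"
            for pcu, secs in hold_times(lines).items()}


excerpt = """\
Mar 20 08:11:52 BATD[1]: N: u1pcu1: hold time was 655 seconds. <---
Mar 20 08:22:40 BATD[1]: N: u2pcu1: hold time was 1301 seconds. <---
Mar 20 20:01:46 BATD[1]: N: u1pcu2: hold time was 695 seconds. <---
Mar 20 20:11:43 BATD[1]: N: u2pcu2: hold time was 1289 seconds. <---"""
print(classify(excerpt.splitlines()))  # every PCU in this excerpt is 'good'
```

A 'low' result on its own is not enough to justify a replacement; the
Corrective Action section also considers whether the refresh cycle was
interrupted.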
The Notice 'N: u1pcu1: Battery not OK' is posted during the refresh
cycle to mark the end of that PCU's discharge. When the whole refresh
cycle is complete, the Notice 'N: Battery Refreshing cycle ends at
this point' is posted. This is a normal refresh sequence.
Additional examples of misleading syslog messages are given at the
end of this section.
* Example A shows misleading warning messages that result from
  changing the time zone and date (the tzset and date commands)
  during a refresh cycle. This generates a false "hold time low"
  warning.
* Example B shows misleading warning messages that result from a
  unit being switched off during a refresh cycle. This generates the
  "hold time low" warning in error.
Syslog Examples:
Example A. Misleading warning messages caused by changing the time
zone and date (tzset and date commands) during a refresh cycle. This
generates false "Replace battery, hold time low" warnings.
Sep 14 22:08:53 BATD[1]: N: Battery Refreshing cycle starts from this point.
Sep 14 22:08:56 LPCT[1]: N: u1pcu1: Refreshing battery
Sep 14 22:08:59 LPCT[1]: N: u2pcu1: Refreshing battery
Sep 14 22:20:20 LPCT[1]: N: u1pcu1: Battery not OK
Sep 14 22:20:22 BATD[1]: N: u1pcu1: hold time was 689 seconds.
Sep 14 22:22:59 LPCT[1]: N: u2pcu1: Battery not OK
Sep 14 22:23:02 BATD[1]: N: u2pcu1: hold time was 845 seconds.
Sep 15 01:51:26 sh13[1]: N: tzset -0800 <--------
Sep 15 01:51:27 sh13[1]: N: date 200009141758.38
Sep 14 17:58:38 BATD[1]: W: u1pcu1: Replace battery, hold time low. <--------
Sep 14 17:58:38 BATD[1]: W: u2pcu1: Replace battery, hold time low. <--------
Sep 14 17:58:38 BATD[1]: N: u1pcu1 Battery took too long to recharge.
Sep 14 17:58:38 BATD[1]: N: u2pcu1 Battery took too long to recharge.
Sep 14 17:58:38 BATD[1]: N: Battery Refreshing cycle ends at this point.
Example B. Misleading warning messages caused by a unit being switched
off during a refresh cycle. This generates the "hold time low" warning
in error.
Jan 18 17:42:27 BATD[2]: N: Battery Refreshing cycle starts from this point.
Jan 18 17:42:30 LPCT[2]: N: u1pcu1: Refreshing battery
Jan 18 17:42:33 LPCT[2]: N: u2pcu1: Refreshing battery
Jan 18 17:51:12 LPCT[2]: N: u1pcu1: Battery not OK
Jan 18 17:51:15 BATD[2]: N: u1pcu1: hold time was 527 seconds.
Jan 18 17:51:26 LPCT[2]: N: u2pcu1: Battery not OK
Jan 18 17:51:29 LPCT[2]: N: u2pcu1: Battery not OK
Jan 18 17:51:32 BATD[2]: N: u2pcu1: hold time was 542 seconds.
Jan 18 18:04:16 sh73[2]: N: fru stat
Jan 18 18:04:28 sh73[2]: N: id read u1pcu1
Jan 18 18:04:31 sh73[2]: N: id read u1pcu2
Jan 18 18:04:37 sh73[2]: N: id read u2pcu1
Jan 18 18:04:41 sh73[2]: N: id read u2pcu2
Jan 18 18:04:46 sh73[2]: N: refresh -s
Jan 18 18:04:48 sh73[2]: N: vol stat v0
Jan 18 18:04:49 sh73[2]: N: vol stat v1
Jan 19 04:47:32 LPCT[2]: W: u1pcu1: On battery
Jan 19 04:47:36 LPCT[2]: W: u1pcu1: Switch off <--------
Jan 19 04:47:36 LPCT[2]: W: u1pcu1: Off
Jan 19 04:47:36 LPCT[2]: E: u1pcu1: Battery not present
Jan 19 04:47:39 LPCT[2]: W: u1pcu1: DC not OK
Jan 19 04:47:41 LPCT[2]: N: u1pcu1: Battery not OK
Jan 19 04:48:09 LPCT[2]: W: u2pcu1: On battery
Jan 19 04:48:12 LPCT[2]: W: u2pcu1: Off
Jan 19 04:48:12 LPCT[2]: W: u2pcu1: Disabled
Jan 19 04:48:12 LPCT[2]: W: u2pcu1: Disabled
Jan 19 08:51:14 BATD[2]: W: u1pcu1: Replace battery, hold time low. <--------
Jan 19 08:51:31 BATD[2]: W: u2pcu1: Replace battery, hold time low. <--------
Jan 19 08:51:31 BATD[2]: N: u1pcu1 Battery took too long to recharge.
Jan 19 08:51:31 BATD[2]: N: u2pcu1 Battery took too long to recharge.
Jan 19 08:51:31 BATD[2]: N: u2pcu2:skips battery refresh because the other PCU u2pcu1 : Power Supply not running
Jan 19 08:51:32 BATD[2]: N: Battery Refreshing cycle ends at this point.
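Both examples share a pattern: the false 'Replace battery, hold time
low' warnings appear in a refresh cycle that also contains a time
zone/date change or a PCU switch-off. The following minimal Python
sketch groups syslog lines into refresh cycles using the start/end
Notices shown above and records which interrupting events occurred in
each. The event patterns are heuristics inferred from Examples A and B
(an assumption, not a complete list), and `split_cycles` and
`RefreshCycle` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

START = "Battery Refreshing cycle starts"
END = "Battery Refreshing cycle ends"

# Heuristic interruption markers inferred from Examples A and B (assumption).
INTERRUPTIONS = {
    "time zone/date changed": re.compile(r"\bsh\d+\[\d+\]: N: (?:tzset|date)\b"),
    "PCU switched off": re.compile(r": W: u\d+pcu\d+: Switch off"),
}
# Also matches the warning if "low." has wrapped onto the next line.
LOW_HOLD_RE = re.compile(r": W: (u\d+pcu\d+): Replace battery, hold time")


@dataclass
class RefreshCycle:
    interruptions: List[str] = field(default_factory=list)
    low_hold_warnings: List[str] = field(default_factory=list)

    @property
    def interrupted(self) -> bool:
        return bool(self.interruptions)


def split_cycles(lines: Iterable[str]) -> List[RefreshCycle]:
    """Return completed refresh cycles; lines outside a cycle are ignored."""
    cycles: List[RefreshCycle] = []
    current: Optional[RefreshCycle] = None
    for line in lines:
        if START in line:
            current = RefreshCycle()  # an unterminated earlier cycle is dropped
        if current is None:
            continue
        for reason, pattern in INTERRUPTIONS.items():
            if pattern.search(line) and reason not in current.interruptions:
                current.interruptions.append(reason)
        m = LOW_HOLD_RE.search(line)
        if m:
            current.low_hold_warnings.append(m.group(1))
        if END in line:
            cycles.append(current)
            current = None
    return cycles


example_a = """\
Sep 14 22:08:53 BATD[1]: N: Battery Refreshing cycle starts from this point.
Sep 14 22:20:22 BATD[1]: N: u1pcu1: hold time was 689 seconds.
Sep 15 01:51:26 sh13[1]: N: tzset -0800
Sep 15 01:51:27 sh13[1]: N: date 200009141758.38
Sep 14 17:58:38 BATD[1]: W: u1pcu1: Replace battery, hold time low.
Sep 14 17:58:38 BATD[1]: N: Battery Refreshing cycle ends at this point."""
for cycle in split_cycles(example_a.splitlines()):
    if cycle.interrupted and cycle.low_hold_warnings:
        print("Suspect warnings for", cycle.low_hold_warnings, "-", cycle.interruptions)
```

Warnings flagged this way are candidates for the "do not replace" rule
in the Corrective Action section, not proof that the battery is good.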
IMPLEMENTATION:
---
| | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| X | REACTIVE (As Required)
---
CORRECTIVE ACTION:
An Authorized Enterprise Field Service Representative may avoid the
above-mentioned problems by following the recommendations below.
If your customer sees Warning (W) or Error (E) messages in the syslog
file indicating that a battery or power supply should be replaced,
replace the PCU as soon as possible. Replacing the power supply when
prompted by these messages protects the customer from an outage.
However, do not replace a battery (PCU) when the hold time reported
during the refresh cycle is greater than 360 seconds. Also be aware
that if a refresh cycle is interrupted by loss of power or by a tzset
command, syslog may report false error messages. In this case, do not
replace the battery.
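The rules in the preceding paragraph can be summarized as a small
decision function. This Python sketch is illustrative only: the name
`battery_action` and its inputs are hypothetical, it covers only battery
('Replace battery') warnings, and it assumes the caller has already
determined the reported hold time and whether the refresh cycle was
interrupted.

```python
from typing import Optional

HOLD_TIME_MIN_SECONDS = 360  # threshold given in this FIN


def battery_action(replace_warning: bool,
                   hold_time: Optional[int],
                   cycle_interrupted: bool) -> str:
    """Return 'replace', 'keep', or 'none' for one PCU battery."""
    if not replace_warning:
        return "none"     # no W/E message asks for a replacement
    if cycle_interrupted:
        return "keep"     # power loss or tzset during refresh: warning may be false
    if hold_time is not None and hold_time > HOLD_TIME_MIN_SECONDS:
        return "keep"     # reported hold time is healthy
    return "replace"      # low hold time in an uninterrupted cycle


print(battery_action(True, 20, False))  # prints: replace
```

The call above corresponds to the u1pcu1 case that follows (20-second
hold time plus a "Replace battery" warning).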
In the example below, u1pcu1 should be replaced because the hold time
was less than 360 seconds.
Mar 20 08:00:56 BATD[1]: N: u1pcu1: hold time was 20 seconds.
Mar 20 08:00:58 BATD[1]: W: u1pcu1: Replace battery, hold time low. <---
Mar 20 08:01:01 BATD[1]: N: u1pcu2: hold time was 695 seconds.
Mar 20 08:11:46 BATD[1]: N: u1pcu2: skips battery refresh because the PCU u1pcu1 : PCU1 hold time low
You may also use the 'id read' command from the T3 prompt to check
battery status:
newton62:/:<17> id read u1pcu1
Revision : 0000
Manufacture Week : 00421999
Battery Install Week : 00122001
Battery Life Used : 0 days, 15 hours
Battery Life Span : 730 days, 12 hours <-------
Serial Number : 005447
Battery Warranty Date: 20010322172349
Battery Internal Flag: 0x00000000 <------
Vendor ID : TECTROL-CAN
Model ID : 300-1454-01(50)
In the example above, 'Battery Life Span' is greater than 45 days
and 'Battery Internal Flag' is 0x00000000 (no bits set to 1).
Assuming no Warnings or Errors have been issued, this indicates a
good battery/PCU.
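The 'id read' check above can also be scripted when many trays need
review. A minimal Python sketch, assuming the 'Key : value' layout of
the sample output; it applies only the two criteria named in this FIN
(life span above 45 days and an all-zero internal flag) and does not
replace checking the syslog for Warnings and Errors. The function names
are hypothetical.

```python
import re
from typing import Dict

# "Key : value" lines; a trailing "<---" annotation from this FIN is dropped.
FIELD_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.*?)\s*(?:<-+)?\s*$")
SPAN_RE = re.compile(r"(\d+) days, (\d+) hours")
MIN_LIFE_SPAN_DAYS = 45  # figure quoted in this FIN


def parse_id_read(text: str) -> Dict[str, str]:
    """Collect the fields printed by 'id read uNpcuN' (prompt line is skipped)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def life_span_days(value: str) -> float:
    """Convert e.g. '730 days, 12 hours' to 730.5."""
    m = SPAN_RE.search(value)
    if m is None:
        raise ValueError(f"unrecognized life span: {value!r}")
    return int(m.group(1)) + int(m.group(2)) / 24


def battery_looks_good(fields: Dict[str, str]) -> bool:
    """Life span above 45 days and no bits set in the internal flag."""
    span_ok = life_span_days(fields["Battery Life Span"]) > MIN_LIFE_SPAN_DAYS
    flag_ok = int(fields["Battery Internal Flag"], 16) == 0
    return span_ok and flag_ok


# Abridged from the sample output above.
sample = """\
newton62:/:<17> id read u1pcu1
Revision : 0000
Battery Life Used : 0 days, 15 hours
Battery Life Span : 730 days, 12 hours <-------
Battery Internal Flag: 0x00000000 <------
Vendor ID : TECTROL-CAN"""
fields = parse_id_read(sample)
print(fields["Battery Life Span"], battery_looks_good(fields))
```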
The next scheduled release of T3 firmware will change the wording of
the syslog messages, specifically rewriting the 'Battery not OK'
messages.
COMMENTS:
----------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.