Document fins/I0674-1


FIN #: I0674-1

SYNOPSIS: Confusing syslog error messages may cause good T3 Power
          Cooling Units to be replaced

DATE: May/10/01

KEYWORDS: Confusing syslog error messages may cause good T3 Power
          Cooling Units to be replaced


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: Good StorEdge T3 Power Cooling Units are being unnecessarily
          replaced due to confusing syslog error messages.  
              

TOP FIN/FCO REPORT: No 
 
PRODUCT_REFERENCE:  Power Cooling Units  
 
PRODUCT CATEGORY:   Storage / Service 

PRODUCTS AFFECTED:  

Mkt_ID            Platform   Model   Description             Serial Number
------            --------   -----   -----------             -------------
Systems Affected
----------------
  -               ANYSYS     ALL     System Platform Independent    -

X-Options Affected
------------------
  -               T3         ALL     T3 StorEdge Array              -


PART NUMBERS AFFECTED: 

Part Number   Description                                        Model
-----------   -----------                                        -----
300-1453-01   325 Watt Power Supply w NiCd Battery (pre-release)   - 
300-1454-01   Tectrol TC64S-1327 325 Watt Power Supply             -


REFERENCES:
 
ESC:    529253

Manual: 806-1062-11 - Sun StorEdge T3 Disk Tray Installation, Operation 
                      and Service Manual. 
        806-1063-11 - Sun StorEdge T3 Disk Tray Administrator's Guide.

      
PROBLEM DESCRIPTION:

Customers and field personnel may misinterpret certain StorEdge T3
syslog messages that report T3 battery problems.  This can result in
Power and Cooling Units (PCUs) being replaced when they are not faulty.
This causes unnecessary expense and possible downtime for the customer's
system.
 
Network Storage Engineering is working on new firmware that will
correct these confusing syslog messages, but this work is still in
progress.  Until the new firmware is complete, this FIN notifies the
field of syslog messages that may be confusing and may be
misinterpreted.

Messages in the T3 syslog file are reported as one of four types:

   Error - Critical system event requiring immediate attention
   Warning - Possible event requiring eventual user intervention
   Notice - Event which may be a side effect of other events or 
            may be a normal condition
   Information - Event has no consequence on the health of the system
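
The four message types above can be read from the severity letter in
each syslog line.  The following is a minimal sketch only, not part of
any T3 tool.  The line layout is inferred from the excerpts in this FIN,
the function and field names are hypothetical, and the letter 'I' for
Information is an assumption (no such line appears in this FIN).

```python
import re

# Severity letters as they appear in the syslog excerpts in this FIN.
# "I" for Information is an assumption; only E, W and N lines are shown here.
SEVERITY = {"E": "Error", "W": "Warning", "N": "Notice", "I": "Information"}

# Layout inferred from lines such as:
#   "Mar 20 08:00:58 LPCT[1]: N: u1pcu1: Refreshing battery"
# The bracketed number is captured as "index"; the FIN does not say what it means.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<source>\w+)\[(?P<index>\d+)\]: "
    r"(?P<level>[EWNI]): (?P<text>.*)$"
)


def parse_line(line):
    """Return a dict describing one syslog line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["severity"] = SEVERITY[rec["level"]]
    return rec
```

Lines that do not fit the inferred layout (for example, wrapped
continuation lines) return None rather than raising an error.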

Some syslog messages can be misleading.  For example, acting on an
'N' (Notice) message may result in an unnecessary maintenance
action.  This FIN includes examples of normal 'N' (Notice) and
'W' (Warning) messages which can be ignored, and also examples of
Warning messages which should be attended to.

There are two LEDs located on the exterior bezel panel on each PCU.
The leftmost LED is the AC LED and the rightmost LED is the PS status
LED.  Both LEDs are bi-color (green and amber).  The following refers
to the rightmost LED, the PS status LED.

    'Battery not OK' Notice with a flashing green PS LED
    indicates a normal recharge cycle; this is OK.

    'Refreshing battery' Notice with the amber PS LED on indicates
    a normal refresh cycle; this is OK.

The following is an example of a normal battery refresh cycle.

    Hold times of greater than 360 seconds indicate a good battery.
    However, the Notice "Battery not OK" might lead some
    customers to believe otherwise.  This message appears during
    all normal refresh cycles, and is not cause for alarm or action.
    See below.

Mar 20 08:00:56 BATD[1]: N: Battery Refreshing cycle starts from this point.
Mar 20 08:00:58 LPCT[1]: N: u1pcu1: Refreshing battery
Mar 20 08:01:01 LPCT[1]: N: u2pcu1: Refreshing battery
Mar 20 08:11:46 LPCT[1]: N: u1pcu1: Battery not OK  <---
Mar 20 08:11:49 LPCT[1]: N: u1pcu1: Battery not OK  <---
Mar 20 08:11:52 BATD[1]: N: u1pcu1: hold time was 655 seconds. <---
Mar 20 08:22:37 LPCT[1]: N: u2pcu1: Battery not OK  <---
Mar 20 08:22:40 BATD[1]: N: u2pcu1: hold time was 1301 seconds. <---
Mar 20 19:50:13 LPCT[1]: N: u1pcu2: Refreshing battery
Mar 20 19:50:16 LPCT[1]: N: u2pcu2: Refreshing battery
Mar 20 20:01:43 LPCT[1]: N: u1pcu2: Battery not OK  <---
Mar 20 20:01:46 BATD[1]: N: u1pcu2: hold time was 695 seconds.  <---
Mar 20 20:11:37 LPCT[1]: N: u2pcu2: Battery not OK  <---
Mar 20 20:11:40 LPCT[1]: N: u2pcu2: Battery not OK  <---
Mar 20 20:11:43 BATD[1]: N: u2pcu2: hold time was 1289 seconds. <---
Mar 21 07:37:37 BATD[1]: N: Battery Refreshing cycle ends at this point.

   In the above example the hold times are greater than 360 seconds,
   meaning that the batteries take a long time to discharge to their
   minimum level.  This indicates healthy batteries.

   The Notice 'N: u1pcu1: Battery not OK' occurs during the refresh
   cycle.  The Notice indicates the end of the discharge, and a
   notification message stating 'N: Battery Refreshing cycle ends at
   this point' will be posted at the end of the refresh cycle.  This is
   a normal refresh sequence.
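
   The 360-second rule described above can be checked mechanically.
   The following is a minimal sketch, assuming the 'hold time was N
   seconds' wording shown in this FIN's excerpts.  The function names
   are hypothetical and the threshold constant is simply the figure
   quoted in this FIN.

```python
import re

HOLD_TIME_THRESHOLD_S = 360  # threshold quoted in this FIN

# Matches e.g. "u1pcu1: hold time was 655 seconds."
HOLD_RE = re.compile(r"(?P<pcu>u\d+pcu\d+): hold time was (?P<secs>\d+) seconds")


def hold_times(lines):
    """Map each PCU name to the last hold time (in seconds) reported for it."""
    result = {}
    for line in lines:
        m = HOLD_RE.search(line)
        if m:
            result[m.group("pcu")] = int(m.group("secs"))
    return result


def healthy_by_hold_time(lines):
    """Return {pcu: bool}; True when the hold time exceeds the threshold."""
    return {
        pcu: secs > HOLD_TIME_THRESHOLD_S
        for pcu, secs in hold_times(lines).items()
    }


sample = [
    "Mar 20 08:11:46 LPCT[1]: N: u1pcu1: Battery not OK",
    "Mar 20 08:11:52 BATD[1]: N: u1pcu1: hold time was 655 seconds.",
    "Mar 20 08:22:40 BATD[1]: N: u2pcu1: hold time was 1301 seconds.",
]
print(healthy_by_hold_time(sample))  # {'u1pcu1': True, 'u2pcu1': True}
```

   Note that the 'Battery not OK' Notice is deliberately ignored here;
   only the reported hold time is compared against the threshold.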

   Additional examples of syslog messages that can be misleading are
   appended to the bottom of this section.

        . The first example highlights misleading error messages that
          are a result of a date command (tzset) within a refresh cycle.
          This generates a false warning message.
          See Example A.

        . The second example highlights misleading error messages that
          are the result of a unit being switched off during a refresh
          cycle. This generates the "hold time low" warning in error.
          See Example B.
          

Syslog Examples:

Example A. This example highlights misleading error messages that
           are a result of a date command (tzset) within a refresh cycle.
           This generates a false warning message.


Sep 14 22:08:53 BATD[1]: N: Battery Refreshing cycle starts from this point.
Sep 14 22:08:56 LPCT[1]: N: u1pcu1: Refreshing battery
Sep 14 22:08:59 LPCT[1]: N: u2pcu1: Refreshing battery
Sep 14 22:20:20 LPCT[1]: N: u1pcu1: Battery not OK
Sep 14 22:20:22 BATD[1]: N: u1pcu1: hold time was 689 seconds.
Sep 14 22:22:59 LPCT[1]: N: u2pcu1: Battery not OK
Sep 14 22:23:02 BATD[1]: N: u2pcu1: hold time was 845 seconds.
Sep 15 01:51:26 sh13[1]: N: tzset -0800  <--------
Sep 15 01:51:27 sh13[1]: N: date 200009141758.38
Sep 14 17:58:38 BATD[1]: W: u1pcu1: Replace battery, hold time low. <--------
Sep 14 17:58:38 BATD[1]: W: u2pcu1: Replace battery, hold time low. <--------
Sep 14 17:58:38 BATD[1]: N: u1pcu1 Battery took too long to recharge.
Sep 14 17:58:38 BATD[1]: N: u2pcu1 Battery took too long to recharge.
Sep 14 17:58:38 BATD[1]: N: Battery Refreshing cycle ends at this point.


Example B. This example highlights misleading error messages that
           are the result of a unit being switched off during a refresh.
           This generates the "hold time low" warning in error.

Jan 18 17:42:27 BATD[2]: N: Battery Refreshing cycle starts from this point.
Jan 18 17:42:30 LPCT[2]: N: u1pcu1: Refreshing battery
Jan 18 17:42:33 LPCT[2]: N: u2pcu1: Refreshing battery
Jan 18 17:51:12 LPCT[2]: N: u1pcu1: Battery not OK
Jan 18 17:51:15 BATD[2]: N: u1pcu1: hold time was 527 seconds.
Jan 18 17:51:26 LPCT[2]: N: u2pcu1: Battery not OK
Jan 18 17:51:29 LPCT[2]: N: u2pcu1: Battery not OK
Jan 18 17:51:32 BATD[2]: N: u2pcu1: hold time was 542 seconds.
Jan 18 18:04:16 sh73[2]: N: fru stat
Jan 18 18:04:28 sh73[2]: N: id read u1pcu1
Jan 18 18:04:31 sh73[2]: N: id read u1pcu2
Jan 18 18:04:37 sh73[2]: N: id read u2pcu1
Jan 18 18:04:41 sh73[2]: N: id read u2pcu2
Jan 18 18:04:46 sh73[2]: N: refresh -s
Jan 18 18:04:48 sh73[2]: N: vol stat v0
Jan 18 18:04:49 sh73[2]: N: vol stat v1
Jan 19 04:47:32 LPCT[2]: W: u1pcu1: On battery
Jan 19 04:47:36 LPCT[2]: W: u1pcu1: Switch off  <--------
Jan 19 04:47:36 LPCT[2]: W: u1pcu1: Off
Jan 19 04:47:36 LPCT[2]: E: u1pcu1: Battery not present
Jan 19 04:47:39 LPCT[2]: W: u1pcu1: DC not OK
Jan 19 04:47:41 LPCT[2]: N: u1pcu1: Battery not OK
Jan 19 04:48:09 LPCT[2]: W: u2pcu1: On battery
Jan 19 04:47:41 LPCT[2]: N: u1pcu1: Battery not OK
Jan 19 04:48:09 LPCT[2]: W: u2pcu1: On battery
Jan 19 04:48:12 LPCT[2]: W: u2pcu1: Off
Jan 19 04:48:12 LPCT[2]: W: u2pcu1: Disabled
Jan 19 04:48:12 LPCT[2]: W: u2pcu1: Disabled
Jan 19 08:51:14 BATD[2]: W: u1pcu1: Replace battery, hold time low. <--------
Jan 19 08:51:31 BATD[2]: W: u2pcu1: Replace battery, hold time low. <--------
Jan 19 08:51:31 BATD[2]: N: u1pcu1 Battery took too long to recharge.
Jan 19 08:51:31 BATD[2]: N: u2pcu1 Battery took too long to recharge.
Jan 19 08:51:31 BATD[2]: N: u2pcu2:skips battery refresh because the other PCU
				u2pcu1 : Power Supply not running
Jan 19 08:51:32 BATD[2]: N: Battery Refreshing cycle ends at this point.


IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        |   |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---


CORRECTIVE ACTION:    

An Authorized Enterprise Field Service Representative may avoid the
above-mentioned problems by following the recommendations below.

If your customer sees Warning (W) or Error (E) messages in the syslog
file indicating that the batteries and power supply should be replaced,
do so as soon as possible.  Replacing the power supply when the error
messages call for it will protect the customer from an outage.
However, do not replace a battery (PCU) when the hold time reported
during a refresh cycle is greater than 360 seconds.  Also be aware that
if the refresh cycle is interrupted by loss of power or by a tzset
command, syslog may report false error messages.  In this case, do not
replace the battery.
   
In the example below, u1pcu1 should be replaced because the hold time
was less than 360 seconds.  
  
Mar 20 08:00:56 BATD[1]: N: u1pcu1: hold time was 20 seconds.
Mar 20 08:00:58 BATD[1]: W: u1pcu1: Replace battery, hold time low. <---
Mar 20 08:01:01 BATD[1]: N: u1pcu2: hold time was 695 seconds.
Mar 20 08:11:46 BATD[1]: N: u1pcu2: skips battery refresh because the
				PCU u1pcu1 : PCU1 hold time low
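
The distinction between a genuine and a false 'Replace battery' warning
can be sketched as a scan over the syslog.  This is a minimal sketch,
not a supported tool.  It assumes the message wording shown in this
FIN's excerpts, the function name is hypothetical, and it only looks for
the two interruptions named in this FIN (a tzset command or a 'Switch
off' event inside a refresh cycle).

```python
import re

START = "Battery Refreshing cycle starts"
END = "Battery Refreshing cycle ends"

# Events this FIN names as causes of false warnings during a refresh cycle.
INTERRUPT_RE = re.compile(r"N: tzset\b|W: u\d+pcu\d+: Switch off")
# Matches only the start of the warning, so wrapped log lines still match.
REPLACE_RE = re.compile(r"W: (u\d+pcu\d+): Replace battery")


def review_replace_warnings(lines):
    """Return a list of (pcu, suspect) for each 'Replace battery' warning.

    suspect is True when a tzset command or a 'Switch off' event was logged
    earlier in the same refresh cycle, so the warning may be false.
    """
    findings = []
    in_cycle = False
    interrupted = False
    for line in lines:
        if START in line:
            in_cycle, interrupted = True, False
        elif END in line:
            in_cycle, interrupted = False, False
        elif in_cycle and INTERRUPT_RE.search(line):
            interrupted = True
        else:
            m = REPLACE_RE.search(line)
            if m:
                findings.append((m.group(1), in_cycle and interrupted))
    return findings
```

A warning flagged as suspect still needs human review; the sketch only
narrows down which warnings coincide with an interrupted refresh cycle.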

You may also use the 'id read' command from the T3 prompt to check 
battery status:
  
   newton62:/:<17>.id read u1pcu1
   Revision             : 0000
   Manufacture Week     : 00421999
   Battery Install Week : 00122001
   Battery Life Used    :   0 days, 15 hours
   Battery Life Span    : 730 days, 12 hours <-------
   Serial Number        : 005447
   Battery Warranty Date: 20010322172349
   Battery Internal Flag: 0x00000000 <------
   Vendor ID            : TECTROL-CAN
   Model ID             : 300-1454-01(50)
   
In the example above, 'Battery Life Span' is greater than 45 days, 
and the 'Battery Internal Flag' is not showing any 1's.  Assuming
no Warnings or Errors have been issued, this would indicate a
good battery/PCU.   
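
The two 'id read' checks described above can be sketched as follows.
This is a minimal sketch, assuming the 'Field : value' layout shown in
the example output.  The function names are hypothetical, and "not
showing any 1's" is interpreted here as a flag value of zero, which is
an assumption and is stricter than the literal wording.

```python
import re

MIN_LIFE_SPAN_DAYS = 45  # figure quoted in this FIN


def parse_id_read(output):
    """Split 'Field : value' lines from 'id read' output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Drop any "<----" annotation arrows such as those used in this FIN.
            fields[key.strip()] = value.split("<")[0].strip()
    return fields


def battery_looks_good(fields):
    """Apply the two checks from this FIN to parsed 'id read' fields.

    Assumption: a good 'Battery Internal Flag' is one whose value is zero.
    This does not replace checking syslog for Warnings or Errors.
    """
    span = re.match(r"(\d+) days", fields["Battery Life Span"])
    if span is None:
        return False
    days = int(span.group(1))
    flag = int(fields["Battery Internal Flag"], 16)
    return days > MIN_LIFE_SPAN_DAYS and flag == 0
```

As the text above notes, a passing result only indicates a good
battery/PCU if no Warnings or Errors have been issued in syslog.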

The next scheduled release of T3 firmware will reword the
'Battery not OK' syslog messages.

			
COMMENTS:  

----------------------------------------------------------------------------

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
                                                        





Copyright (c) 1997-2003 Sun Microsystems, Inc.