

FIN #: I0691-1

SYNOPSIS: New disk drive initialization problem after replacing a failed disk drive

DATE: Jun/27/01

KEYWORDS: New disk drive initialization problem after replacing a failed disk drive


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: When a failed disk drive is replaced with a new drive in a
          StorEdge T3 Array, the Loop 2 (APATH or Path 1) path on the 
          new drive fails to properly initialize.


Sun Alert:          No

TOP FIN/FCO REPORT: No  
 
PRODUCT_REFERENCE:  Disk Drives on StorEdge T3 Array     
 
PRODUCT CATEGORY:   Storage / Service   


PRODUCTS AFFECTED:  
 
Mkt_ID   Platform   Model   Description                 Serial Number
------   --------   -----   -----------                 -------------
Systems Affected
----------------
   -     Anysys      ALL    System Platform Independent       -

X-Options Affected
------------------
   -       T3        ALL    StorEdge T3 Array                 -
X6713A      -         -     FC-AL 18.2GB 10KRPM 1" Disk       -    
X6714A      -         -     FC-AL 36.4GB 10KRPM 1.6" Disk     -
X6716A      -         -     FC-AL 18.2GB 10KRPM 1.6" Disk     -
X6717A      -         -     FC-AL 72.8GB 10KRPM 1.6" Disk     -


PART NUMBERS AFFECTED: 

Part Number   Description                Model
-----------   -----------                -----
540-4440-01   18GB Assembly/FRU          -     
540-4367-01   36GB Assembly/FRU          -
540-4519-01   73GB Assembly/FRU          -
390-0053-01   Seagate ST318304FC 18GB    -     Disk   Seagate    A726
390-0056-01   Seagate ST336704FC 36GB    -     Disk   Seagate    A726
390-0036-01   Seagate ST173404FC 73GB    -     Disk   Seagate    A727


REFERENCES:

BugId: 4407776 - T3 does not properly initialize disk paths after disk
                 hot swap.

URL:   http://hes.west/nws/products/T3/tools/t3path_chk

      
PROBLEM DESCRIPTION: 

Sun StorEdge T3 arrays contain dual-ported FC-AL disk drives. The drive
ports are connected to the T3 back-end loops via the loop cards (Loop 1
and Loop 2) and are capable of receiving I/O from either path (Path 0
or Path 1). These paths are displayed by T3 CLI commands and monitoring
software using different conventions.  The paths are mapped as follows:
Loop 1 = Path 0 = PPATH and Loop 2 = Path 1 = APATH.

A single drive failure and subsequent disk replacement that encounters
bug 4407776 does not have a major impact on the operation of the Sun
StorEdge T3 array.  Functionally, the new disk works as a drive with a
failed APATH, and will process all I/O via its PPATH connection.  If
the remaining port fails or there are problems that cause Loop 1 to
fail (prior to a reset), the disk will be disabled and it will appear
as a failed drive. 

If two failed disk drives are replaced within the same LUN over time,
and subsequently a path failure occurs on Loop 1 (PPATH or Path 0), the
only remaining path to those two drives, the LUN will unmount and the
data on that LUN will be unavailable. The LUN will be off-line until
the path problem is fixed and the T3 is manually reset. It may also be
necessary to remove and recreate the LUN or perform other complex
recovery actions to ensure data integrity.

In the previous scenario, if the cache mode on the T3 is in writebehind
when the path failure occurs, there is the possibility that write data
staged in cache will not be written to disk before the LUN is taken
off-line.  This will result in data loss.
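
The exposure described above can be screened for once the affected
disks are known. The following Python fragment is a minimal sketch and
not part of any Sun tool: the function name, the volume names (v0, v1)
and the LUN-to-disk layout are hypothetical, and the threshold of two
affected drives per LUN is taken from the scenario in this FIN.

```python
def luns_at_risk(lun_members, affected_disks, threshold=2):
    """Return {lun: [affected member disks]} for every LUN that has at
    least `threshold` member disks with an uninitialized APATH."""
    affected = set(affected_disks)
    at_risk = {}
    for lun, disks in lun_members.items():
        hit = sorted(d for d in disks if d in affected)
        if len(hit) >= threshold:
            at_risk[lun] = hit
    return at_risk

# Hypothetical layout: one nine-disk volume per unit of a partner group.
lun_members = {
    "v0": [f"u1d{n}" for n in range(1, 10)],
    "v1": [f"u2d{n}" for n in range(1, 10)],
}

# Hypothetical list of disks known to have an uninitialized APATH.
affected = ["u1d3", "u1d7", "u2d8"]

print(luns_at_risk(lun_members, affected))
```

In this example only v0 is reported, because it contains two drives
that depend solely on Loop 1. A LUN with a single affected drive is
still degraded and should be scheduled for a reset.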
    
There are no obvious symptoms that indicate this problem has been
encountered.  Monitoring software will show no faults on the array, and
most diagnostic commands will report that the array is healthy.
All Sun StorEdge T3 Array configurations are affected by this bug.

The problem can be discovered by examining the output of the following 
command executed from the T3 CLI:

     T3:/:<1>.disk pathstat u[1|2]d1-9

This command executed on a T3 partner group which exhibited the problem 
shows the following: 

     T3:/:<1>.disk pathstat u1d1-9

       DISK PPATH  APATH  CPATH  PATH_POLICY  FAIL_POLICY
       --------------------------------------------------
       u1d1 [0 U]  [1 U]  APATH  APATH        PATH
       u1d2 [0 U]  [1 U]  APATH  APATH        PATH
       u1d3 [0 U]  [1 U]  APATH  APATH        PATH
       u1d4 [0 U]  [1 U]  PPATH  PPATH        PATH
       u1d5 [0 U]  [1 U]  PPATH  PPATH        PATH
       u1d6 [0 U]  [1 U]  PPATH  PPATH        PATH
       u1d7 [0 U]  [1 U]  PPATH  PPATH        PATH
       u1d8 [0 U]  [1 U]  PPATH  PPATH        PATH
       u1d9 [0 U]  [1 U]  PPATH  PPATH        PATH

     pass

     T3:/:<1>.disk pathstat u2d1-9

       DISK PPATH  APATH   CPATH   PATH_POLICY  FAIL_POLICY
       ----------------------------------------------------
       u2d1 [0 U]  [1 U]   APATH   APATH        PATH
       u2d2 [0 U]  [1 U]   APATH   APATH        PATH
       u2d3 [0 U]  [1 U]   APATH   APATH        PATH
       u2d4 [0 U]  [1 U]   PPATH   PPATH        PATH
       u2d5 [0 U]  [1 U]   PPATH   PPATH        PATH
       u2d6 [0 U]  [1 U]   PPATH   PPATH        PATH
       u2d7 [0 U]  [1 U]   PPATH   PPATH        PATH
       u2d8 [0 U]  [-1 U]  PPATH   PPATH        PATH
       u2d9 [0 U]  [1 U]   PPATH   PPATH        PATH

     pass

Note the '-1' in the APATH column for disk u2d8. If any disk shows a
'-1' in the ".disk pathstat" output, it can be assumed that the disk
has encountered this bug. Any path exhibiting this behavior is
unavailable as a failover path for the affected drive.
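
The check for a '-1' path ID can be automated once the ".disk pathstat"
output has been captured as text. The following Python fragment is a
minimal sketch, not the t3path_chk tool: the function name and the
regular expression are assumptions based only on the output format
shown in this FIN, and how the output is collected from the array is
left to the reader.

```python
import re

# Matches one data row of ".disk pathstat" output as shown in this FIN,
# for example:  u2d8 [0 U]  [-1 U]  PPATH   PPATH        PATH
ROW = re.compile(
    r"^\s*(u\d+d\d+)\s+"               # disk name, e.g. u2d8
    r"\[\s*(-?\d+)\s+(\w+)\s*\]\s+"    # PPATH column: [id flag]
    r"\[\s*(-?\d+)\s+(\w+)\s*\]"       # APATH column: [id flag]
)

def find_uninitialized_paths(pathstat_text):
    """Return a list of (disk, column) pairs whose path ID is -1."""
    affected = []
    for line in pathstat_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, separator, prompt, or "pass" line
        disk, ppath_id, _, apath_id, _ = m.groups()
        if int(ppath_id) == -1:
            affected.append((disk, "PPATH"))
        if int(apath_id) == -1:
            affected.append((disk, "APATH"))
    return affected

sample = """\
       DISK PPATH  APATH   CPATH   PATH_POLICY  FAIL_POLICY
       ----------------------------------------------------
       u2d7 [0 U]  [1 U]   PPATH   PPATH        PATH
       u2d8 [0 U]  [-1 U]  PPATH   PPATH        PATH
       u2d9 [0 U]  [1 U]   PPATH   PPATH        PATH

     pass
"""

print(find_uninitialized_paths(sample))
```

For the sample above, the function reports u2d8 with an affected APATH.
An empty list means no '-1' path ID was found in the supplied text.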

A tool called "t3path_chk" has been developed to aid in the
identification of this problem.  It can be run automatically from a
cron job against multiple T3 arrays. To obtain the tool and
instructions for its use, see:

     http://hes.west/nws/products/T3/tools/t3path_chk
 
Root cause: Following a disk hot swap, the Sun StorEdge T3 Array
controller firmware does not initialize both paths to the new disk
with the new disk's WWN (World Wide Name). During a controller boot
cycle, the T3 firmware initializes both paths to all existing disks by
their WWN.  The firmware functions that handle disk initialization do
not operate properly on a hot-swapped disk whose WWN differs from the
one found at boot time.  As a result, the new disk will not have its
APATH initialized.

The fix for this bug will be included in version 1.18 of the Sun
StorEdge T3 Array controller firmware, scheduled for release in
August 2001.
  

IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        |   |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION: 

An Authorized Enterprise Field Service Representative may avoid the
problems described above by following the recommendations below.

Following a disk drive replacement, the system should be allowed to
complete its reconstruction to the new drive and return to a fully
redundant FRU state.  During the next maintenance window, the T3 
should be reset to reinitialize the disk paths to the new drive.


COMMENTS:  

----------------------------------------------------------------------------

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
                                                        


Copyright (c) 1997-2003 Sun Microsystems, Inc.