

FIN #: I0741-1

SYNOPSIS: Replacement of a Disk on StorEdge A5200 may disconnect the Array

DATE: Nov/19/01

KEYWORDS: Replacement of a Disk on StorEdge A5200 may disconnect the Array


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: Replacement of a Disk on StorEdge A5200 may disconnect the 
          Array.


SunAlert:           Yes 

TOP FIN/FCO REPORT: Yes 
 
PRODUCT_REFERENCE:  StorEdge A5200 Array  
 
PRODUCT CATEGORY:   Storage / SW Admin 


PRODUCTS AFFECTED:  
 
Systems Affected
----------------
Mkt_ID   Platform   Model   Description                 Serial Number
------   --------   -----   -----------                 -------------
  -       ANYSYS      -     System Platform Independent       -


X-Options Affected
------------------
Mkt_ID   Platform   Model   Description             Serial Number
------   --------   -----   -----------             -------------
  -       A5200      ALL    A5200 StorEdge Array          -


PART NUMBERS AFFECTED: 

Part Number     Description                             Model
-----------     -----------                             -----
501-4158-04     11-Slot FC-AL Disk Backplane              -


REFERENCES:

BugId:     4499964 - Removing disk in front slot 10 on A5200 causes losing 
                     the connection with array. 
           4509059 - A5200, change, disks, photon.
          
PatchId:   107469: SunOS 5.7: sf & socal drivers patch.

ESC:       531882 
           531991 
           532320           

Sun Alert: 40765

      
PROBLEM DESCRIPTION:

When replacing either disk R10 or F10 (that is, the disk in slot 10 of the 
rear or front half) of a StorEdge A5200 array, a large number of 
OFFLINE/ONLINE conditions occur and all disks of the A5200 become 
inaccessible.  When the disk is re-inserted, connectivity is restored, but 
time-consuming maintenance procedures are necessary.

The failure (or actually loop degradation leading to a failure) occurs
only on drive 10 of either half of the A5200 (i.e., r10 or f10).  This
happens when drive 10 is physically removed (after completion of the
luxadm remove_device command).  When that happens, activity on the LEDs
of the drives being issued I/Os drops drastically.

When a disk is powered off/offlined/bypassed but not removed from its
slot, its associated LRC (Loop Resiliency Circuit), which is located on
the backplane, remains active. This implies that the termination between
the disk and the LRC remains active.  When the disk is removed, the 
termination is also removed and the operation of the LRC is affected.
This effect is more drastic when drive 10 is removed because of an issue
that exists on specific pins of the LRC component used on the current 
revisions of the A5200 backplanes.  The problem shows up only on loop A, 
NOT on loop B.  

The following error messages will be seen in /var/adm/messages: 

        unix: sf74:     target 0x57 al_pa 0x4c offlined
        unix: sf74:     target 0x48 al_pa 0x67 offlined
        unix: ID[SUNWssa.socal.link.5010] socal37: port 0: Fibre 
              Channel is OFFLINE                  
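
One quick way to gauge how frequently these events are occurring is to scan
the messages file directly.  The sf/socal instance numbers above are only
those of the example host; this is an optional check, not part of the
formal procedure: 

        # grep -i "offlined" /var/adm/messages | tail -20 
        # grep "Fibre Channel is OFFLINE" /var/adm/messages | tail -20 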

Additional error indications, as reported by the "luxadm display 
<enclosure_name>" command, will be: 

                                   SENA
                               DISK STATUS
SLOT   FRONT DISKS        (Node WWN)         REAR DISKS         (Node WWN)
0      On (No path found) 2000002037f0dcbd   On (No path found) 2000002037f0e3c0
1      On (No path found) 2000002037f0dd03   On (No path found) 2000002037f0cffa
2      On (No path found) 2000002037f0dbc9   On (No path found) 2000002037f0e3a9
3      On (No path found) 2000002037f0db40   On (No path found) 2000002037f0e31c
4      On (No path found) 2000002037f0dcac   On (No path found) 2000002037f0e522
5      On (No path found) 2000002037f01f6a   On (No path found) 2000002037f0e48b
6      On (No path found) 2000002037f0db83   On (No path found) 2000002037f0e330
7      On (No path found) 2000002037f0db23   On (No path found) 2000002037f0cfe5
8      On (No path found) 2000002037f0db79   On (No path found) 2000002037f0d638
9      On (No path found) 2000002037f0db77   On (No path found) 2000002037f0d53d
10     On (No path found) 2000002037f0dbb7   On (No path found) 2000002037f0d5a6

This is a hardware-related problem with the photon backplane. It shows
up when the disk in slot 10 (r10 or f10) is removed, which leaves the
differential signal between the disk in slot 10 and the LRC (Loop 
Resiliency Circuit) chip un-terminated.  This causes noise coupling on 
this signal, which in turn affects other signals of the Fibre Channel
loop. The issue exists only on pins 24 and 25 of the HP (HDMP0451) LRC
chip, due to their sensitivity to noise coupling when left un-terminated. 
This condition will cause the Fibre Channel loop to run in a degraded 
mode.  As a result, CRC errors will occur and loop timeouts will force 
frequent Loop Initialization Processes (LIPs) to take place.  
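
If loop degradation is suspected, the link error counters (including CRC 
error counts) on the affected loop can be sampled before and after the 
drive removal and compared.  The path below is only a placeholder for a 
device on loop A; this check is offered as an optional aid, not as part 
of the formal procedure: 

        # luxadm -e rdls <loop_A_device_path> 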

NOTE: This degraded condition on the loop does NOT affect data integrity, 
      even if the loop was active at the time R10 or F10 was pulled, 
      because all I/O is acknowledged between the host and the A5200.

NOTE: This only occurs when the array is in a single loop operation and 
      loop A is the loop in use.
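
      Whether the array is actually running on a single loop, and which 
      loop is in use, can be confirmed before any disk is pulled.  The 
      enclosure name below is a placeholder; this is an optional check: 

        # luxadm display <enclosure_name>    (reports loop A/B status 
                                              for the enclosure) 
        # luxadm -e port                     (lists host FC-AL ports and 
                                              their connection state) 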
      

IMPLEMENTATION:  
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        | X |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---
         

CORRECTIVE ACTION:

The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives to avoid the above mentioned
problem.

Please adhere to the following step-by-step maintenance procedures
to replace either disk, R10 or F10, in slot 10 of a StorEdge A5200 
array. 
 
The following steps replace the previous luxadm remove_device, 
luxadm insert_device commands, and the physical removal and 
replacement of the disk. The drive spin-up operation, normally
executed by the luxadm insert_device command, has been replaced 
with FPM (Front Panel Module) operations as outlined in the 
following steps: 
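
If the enclosure (box) name or the current device paths are not already 
known, they can be confirmed up front.  The commands below are an optional 
pre-check and are not part of the numbered steps: 

        # luxadm probe                       (lists SENA enclosures and 
                                              their logical paths) 
        # luxadm display <box_name>          (shows front/rear slot status 
                                              for the enclosure) 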
 
----------------------------------------------------------------------------
NOTE: Have the replacement disk drive ready before starting this procedure.
----------------------------------------------------------------------------  

1. Use the  luxadm remove_device -F <box name>,<disk position>  command to
   remove the device entry for the disk drive being replaced. Specify the
   correct box name and disk position for the drive. See luxadm(1M) man page
   for more information on luxadm commands. 
 
   NOTE: ALLOW FOR THE  luxadm remove_device  COMMAND TO COMPLETE BEFORE 
         EXECUTING STEP 2. For example: 
  
         # luxadm remove_device -F box2,r10 
  
   WARNING!!! Please ensure that no filesystems are mounted on these device(s).
   
              All data on these devices should have been backed up. 
  
   The list of devices which will be removed is:
    
     1: Box Name: "box2" rear slot 10 
        Node WWN: 2000002037e4a458 
        Device Type:Disk device 
        Device Paths: 
        /dev/rdsk/c11t122d0s2 
     
   Please verify the above list of devices and then enter  c  or <CR> 
   to Continue or  q  to Quit. [Default: c]: HIT <Return> HERE 
   stopping: Drive in "v a5200 400b fl" rear slot 10....Done 
   offlining: Drive in "v a5200 400b fl" rear slot 10....Done 
     
   Hit <Return> after removing the device(s). 
   *** HIT <Return> BUT DON'T REMOVE THE DRIVE AT THIS POINT 
     
   Drive in Box Name "box2" rear slot 10. 
   Notice: Device has not been removed from the enclosure.
   It has been removed from the loop and is ready to be removed from 
   the enclosure, and the LED is blinking. 
     
   Logical Nodes being removed under /dev/dsk/ and /dev/rdsk: 
    
     c11t122d0s0 
     c11t122d0s1 
     c11t122d0s2 
     c11t122d0s3 
     c11t122d0s4 
     c11t122d0s5 
     c11t122d0s6 
     c11t122d0s7
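
   As an optional sanity check before opening the array door, it can be 
   confirmed that the logical nodes listed above are really gone (the 
   controller/target numbers are those of this example): 

        # ls /dev/dsk/c11t122d0* /dev/rdsk/c11t122d0* 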
 
2. Open the array door to access the disks. 
 
3. Remove and Replace the disk drive.
 
------------------------------------------------------------------------
NOTE: Follow the recommended Electrostatic Discharge (ESD) Precautions 
      when removing and replacing the disk drive. 
------------------------------------------------------------------------
 
   During the removal and replacement procedure, the disk should be 
   replaced as quickly as possible. This is to ensure that the disk is 
   out of the slot for a minimum amount of time. 
 
   Removal: Push down on the latch to release the bracket handle. 
   Use the handle to pull the disk drive out of the slot. 
 
   Replacement: Quickly insert the replacement disk in its place. 
   Slide the disk drive into the slot with the handle released. 
   Once you have inserted the disk drive as far as it will go into 
   the slot, push down on the handle to secure it. 
 
4. FPM operation to spin up the replacement disk. 
 
   A.) From the FPM menu select "disks menu" 
   B.) Then select Front or Rear disk 
   C.) Then select the desired disk in slot "x" 
   D.) Then select "on" and "Continue" to spin up the disk. 
       Wait 30 seconds for the disk to spin up before continuing. 
       You will see a slight I/O interruption which shouldn't exceed 
       15 seconds, then the I/O to the other drives should start again. 
     
5. Use the luxadm insert_device command to install the replacement
   disk drive using  luxadm insert_device <box name>,<disk position>, 
   specifying the correct box name and disk position for the drive
   you have inserted. This step is executed only to add a new entry 
   for the new disk in the Device Tree.  See luxadm(1M) man page for
   more information on luxadm commands. For example: 
    
      # luxadm insert_device box2,r10 
    
      The list of devices which will be inserted is: 
      
         1: Box name "box2" rear slot 10 
         
         Please enter  q  to quit or <Return> to continue: 
         
         Hit <Return> after inserting the device(s). 

6. Hit the Return key to complete the luxadm insert_device command. 
         
7. For a disk array that is accessed by multiple hosts, repeat Steps 5 
   and 6 on the other hosts to install the device entries for the 
   new disk.
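
   On each additional host, the invocation is the same as in Step 5; the 
   box name and slot shown are those of the running example and must be 
   replaced with the enclosure name and slot as seen from that host: 

        other_host# luxadm insert_device box2,r10 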
 
8. For arrays under a volume manager or other RAID manager control, 
   the new disk must be at least the same formatted capacity as the 
   disk it is replacing. Close the array door. 
   -------------------------------------------------------------------------
   NOTE: The doors are tight to ensure an adequate seal. 
         To close, place your thumbs on either side of the latch and press 
         firmly. 
   -------------------------------------------------------------------------
    
9. For disks under volume manager control, notify volume manager of 
   all the new disks. 
     
     For example: # vxdctl enable 
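
     To confirm that the volume manager now sees the replacement disk, 
     the disk list can be reviewed (optional verification): 
     
     For example: # vxdisk list 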
     
10. Use the vxdiskadm command to bring the new disk into volume manager 
    control. 
 
       # vxdiskadm 
     
    ---------------------------------------------------------------------
    NOTE: To replace a disk drive, choose option 5, "Replace a failed or 
          removed disk".  To add a disk drive, choose option 1, "Add or 
          initialize one or more disks". 
    ---------------------------------------------------------------------
     
11. The volume can be restored if needed.
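
    For volumes under VxVM control, one common way to resynchronize plexes 
    onto the replaced disk is vxrecover; the disk group name below is a 
    placeholder, and the exact recovery method depends on the volume layout 
    and the RAID manager in use: 

       # vxrecover -b -g <disk_group> 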


COMMENTS: 

None

----------------------------------------------------------------------------- 

Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services Documentation"
  and click on "FIN & FCO attachments", then choose the appropriate folder, 
  FIN or FCO.  This will display supporting directories/files for FINs or 
  FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------


Copyright (c) 1997-2003 Sun Microsystems, Inc.