Document fins/I0741-1
FIN #: I0741-1
SYNOPSIS: Replacement of a Disk on StorEdge A5200 may disconnect the Array
DATE: Nov/19/01
KEYWORDS: Replacement of a Disk on StorEdge A5200 may disconnect the Array
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: Replacement of a Disk on StorEdge A5200 may disconnect the
Array.
SunAlert: Yes
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: StorEdge A5200 Array
PRODUCT CATEGORY: Storage / SW Admin
PRODUCTS AFFECTED:
Systems Affected
----------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- ANYSYS - System Platform Independent -
X-Options Affected
------------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- A5200 ALL A5200 StorEdge Array -
PART NUMBERS AFFECTED:
Part Number Description Model
----------- ----------- -----
501-4158-04 11-Slot FC-AL Disk Backplane -
REFERENCES:
BugId: 4499964 - Removing disk in front slot 10 on A5200 causes losing
the connection with array.
4509059 - A5200, change, disks, photon.
PatchId: 107469: SunOS 5.7: sf & socal drivers patch.
ESC: 531882
531991
532320
Sun Alert: 40765
PROBLEM DESCRIPTION:
When replacing either disk, R10 or F10, in slot 10 of a StorEdge A5200
array, a large number of OFFLINE/ONLINE conditions occur and all disks
of the A5200 become inaccessible. When the disk is re-inserted,
connectivity is restored, but time-consuming maintenance procedures are
necessary.
The failure (or actually loop degradation leading to a failure) occurs
only on drive 10 of both halves of the A5200 (i.e., r10 and f10). This
happens when drive 10 is physically removed (after completion of the
luxadm remove_device command). When that happens, activity on the LEDs
of the drives receiving I/O is drastically reduced.
When a disk is powered off/offlined/bypassed but not removed from its
slot, its associated LRC (Loop Resiliency Circuit), which is located on
the backplane, remains active. This implies that the termination between
the disk and the LRC remains active. When the disk is removed, the
termination is also removed and the operation of the LRC is affected.
This effect is more drastic when drive 10 is removed because of an issue
that exists on specific pins of the LRC component that's used on the
current revisions of the A5200 backplanes. The problem shows up only on
loop A, NOT on loop B.
The following error messages will be seen in /var/adm/messages:
unix: sf74: target 0x57 al_pa 0x4c offlined
unix: sf74: target 0x48 al_pa 0x67 offlined
unix: ID[SUNWssa.socal.link.5010] socal37: port 0: Fibre
Channel is OFFLINE
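A minimal sketch of how these messages could be tallied when reviewing a
saved copy of /var/adm/messages. The patterns are modeled only on the
sample lines above and are an assumption, not an authoritative description
of the sf or socal driver message formats. The function name
summarize_loop_events is hypothetical.

```python
import re
from collections import Counter

# Patterns modeled on the sample messages in this FIN; real logs may differ.
SF_OFFLINED = re.compile(
    r"\b(sf\d+): target (0x[0-9a-fA-F]+) al_pa (0x[0-9a-fA-F]+) offlined"
)
# \s+ between "Fibre" and "Channel" tolerates a wrapped line.
SOCAL_OFFLINE = re.compile(r"\b(socal\d+): port (\d+): Fibre\s+Channel is OFFLINE")


def summarize_loop_events(text):
    """Count target-offlined and loop-OFFLINE messages in a block of log text."""
    targets = Counter(m.group(2) for m in SF_OFFLINED.finditer(text))
    ports = Counter(
        (m.group(1), int(m.group(2))) for m in SOCAL_OFFLINE.finditer(text)
    )
    return {"targets_offlined": dict(targets), "ports_offline": dict(ports)}


sample = """\
unix: sf74: target 0x57 al_pa 0x4c offlined
unix: sf74: target 0x48 al_pa 0x67 offlined
unix: ID[SUNWssa.socal.link.5010] socal37: port 0: Fibre Channel is OFFLINE
"""
summary = summarize_loop_events(sample)
print(summary)
```

A high count of repeated events for the same port within a short period
would be consistent with the frequent OFFLINE/ONLINE conditions described
in this FIN, but the sketch itself does not judge what count is abnormal.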
In addition, the output of the "luxadm display <system_name>" command
will show the following:
SENA
DISK STATUS
SLOT  FRONT DISKS (Node WWN)              REAR DISKS (Node WWN)
0     On (No path found)2000002037f0dcbd  On (No path found)2000002037f0e3c0
1     On (No path found)2000002037f0dd03  On (No path found)2000002037f0cffa
2     On (No path found)2000002037f0dbc9  On (No path found)2000002037f0e3a9
3     On (No path found)2000002037f0db40  On (No path found)2000002037f0e31c
4     On (No path found)2000002037f0dcac  On (No path found)2000002037f0e522
5     On (No path found)2000002037f01f6a  On (No path found)2000002037f0e48b
6     On (No path found)2000002037f0db83  On (No path found)2000002037f0e330
7     On (No path found)2000002037f0db23  On (No path found)2000002037f0cfe5
8     On (No path found)2000002037f0db79  On (No path found)2000002037f0d638
9     On (No path found)2000002037f0db77  On (No path found)2000002037f0d53d
10    On (No path found)2000002037f0dbb7  On (No path found)2000002037f0d5a6
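A minimal sketch of how a captured copy of this display output could be
checked for the all-paths-lost condition. The row layout (slot number,
status word, optional parenthesized note, 16-digit Node WWN) is assumed
from the sample above only. The names parse_disk_status and all_paths_lost
are hypothetical.

```python
import re

# One disk entry as it appears in the sample: a status word, an optional
# parenthesized note, then a 16-hex-digit Node WWN.
ENTRY = re.compile(r"(\w+)\s*(?:\(([^)]*)\))?\s*([0-9a-f]{16})")
# A data row starts with the slot number; header lines do not match.
ROW = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_disk_status(text):
    """Return {slot: [(status, note, wwn), ...]} for rows that begin with a slot number."""
    slots = {}
    for line in text.splitlines():
        row = ROW.match(line)
        if not row:
            continue
        entries = ENTRY.findall(row.group(2))
        if entries:
            slots[int(row.group(1))] = entries
    return slots


def all_paths_lost(slots):
    """True when every parsed disk entry carries a 'No path found' note."""
    notes = [
        note.replace(" ", "").lower()
        for entries in slots.values()
        for (_status, note, _wwn) in entries
    ]
    return bool(notes) and all(n == "nopathfound" for n in notes)


sample = """\
SLOT FRONT DISKS (Node WWN) REAR DISKS (Node WWN)
0 On (No path found)2000002037f0dcbd On (No path found)2000002037f0e3c0
10 On (No path found)2000002037f0dbb7 On (No path found)2000002037f0d5a6
"""
slots = parse_disk_status(sample)
print(sorted(slots), all_paths_lost(slots))
```

The note comparison ignores spaces so that variations in spacing inside
the parenthesized note do not change the result.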
This is a hardware related problem with the photon backplane. It shows
up when the disk in slot 10 (r10 or f10) is removed, which leaves the
differential signal between the disk in slot 10 and the LRC (Loop
Resiliency Circuit) chip un-terminated. This causes noise coupling on
this signal, which in turn affects other signals of the Fibre Channel
loop. This issue exists only on pins 24 and 25 of the HP (HDMP0451) LRC
chip, which are sensitive to noise coupling when left un-terminated.
This condition causes the Fibre Channel loop to run in a degraded mode.
As a result, CRC errors occur and loop timeouts force frequent Loop
Initialization Processes (LIPs) to take place.
NOTE: This degraded condition on the loop does NOT affect data integrity,
even if the loop was active at the time R10 or F10 was pulled, because
all I/O is acknowledged between the host and the A5200.
NOTE: This only occurs when the array is in a single loop operation and
loop A is the loop in use.
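The conditions stated above (slot 10, front or rear, with the array in
single loop operation on loop A) can be sketched as a small pre-check.
This is only an illustration of the stated conditions. The function name
loop_disruption_expected and the way loops are passed in are assumptions,
and the sketch does not query the array.

```python
import re

# Disk position as used with luxadm in this FIN: f or r, then the slot number.
POSITION = re.compile(r"^([fr])(\d+)$", re.IGNORECASE)


def loop_disruption_expected(position, loops_in_use):
    """Apply the conditions stated in this FIN to one planned disk removal.

    position      -- disk position such as "f10" or "r3" (front/rear + slot)
    loops_in_use  -- set of loop names currently carrying I/O, e.g. {"A"}
    """
    m = POSITION.match(position.strip())
    if not m:
        raise ValueError(f"unrecognized disk position: {position!r}")
    slot = int(m.group(2))
    # Single loop operation with loop A as the only loop in use.
    single_loop_a = {name.upper() for name in loops_in_use} == {"A"}
    return slot == 10 and single_loop_a


print(loop_disruption_expected("r10", {"A"}))
print(loop_disruption_expected("r10", {"A", "B"}))
```

The corrective action in this FIN is written for any R10 or F10
replacement, so a False result here should not be read as a reason to
skip that procedure.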
IMPLEMENTATION:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
CORRECTIVE ACTION:
The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives to avoid the above mentioned
problem.
Please adhere to the following step-by-step maintenance procedures
to replace either disk, R10 or F10, in slot 10 of a StorEdge A5200
array.
The following steps replace the previous luxadm remove_device,
luxadm insert_device commands, and the physical removal and
replacement of the disk. The drive spin-up operation, normally
executed by the luxadm insert_device command, has been replaced
with FPM (Front Panel Module) operations as outlined in the
following steps:
----------------------------------------------------------------------------
NOTE: Have the replacement disk drive ready before starting this procedure.
----------------------------------------------------------------------------
1. Use the luxadm remove_device -F <box name>,<disk position> command to
remove the device entry for the disk drive being replaced. Specify the
correct box name and disk position for the drive. See luxadm(1M) man page
for more information on luxadm commands.
NOTE: ALLOW FOR THE luxadm remove_device COMMAND TO COMPLETE BEFORE
EXECUTING STEP 2. For example:
# luxadm remove_device -F box2,r10
WARNING!!! Please ensure that no filesystems are mounted on these device(s).
All data on these devices should have been backed up.
The list of devices which will be removed is:
1: Box Name: "box2" rear slot 10
Node WWN: 2000002037e4a458
Device Type:Disk device
Device Paths:
/dev/rdsk/c11t122d0s2
Please verify the above list of devices and then enter c or <CR>
to Continue or q to Quit. [Default: c]: HIT <Return> HERE
stopping: Drive in "v a5200 400b fl" rear slot 10....Done
offlining: Drive in "v a5200 400b fl" rear slot 10....Done
Hit <Return> after removing the device(s).
*** HIT <Return> BUT DON'T REMOVE THE DRIVE AT THIS POINT
Drive in Box Name "box2" rear slot 10.
Notice: Device has not been removed from the enclosure.
It has been removed from the loop and is ready to be removed from
the enclosure, and the LED is blinking.
Logical Nodes being removed under /dev/dsk/ and /dev/rdsk:
c11t122d0s0
c11t122d0s1
c11t122d0s2
c11t122d0s3
c11t122d0s4
c11t122d0s5
c11t122d0s6
c11t122d0s7
2. Open the array door to access the disks.
3. Remove and Replace the disk drive.
------------------------------------------------------------------------
NOTE: Follow the recommended Electrostatic Discharge (ESD) Precautions
when removing and replacing the disk drive.
-----------------------------------------------------------------------
During the removal and replacement procedure, the disk should be
replaced as quickly as possible. This is to ensure that the disk is
out of the slot for the minimum amount of time.
Removal: Push down on the latch to release the bracket handle.
Use the handle to pull the disk drive out of the slot.
Replacement: Quickly insert the replacement disk in its place.
Slide the disk drive into the slot with the handle released.
Once you have inserted the disk drive as far as it will go into
the slot, push down on the handle to secure it.
4. FPM operation to spin up the replacement disk.
A.) From the FPM menu select "disks menu"
B.) Then select Front or Rear disk
C.) Then select the desired disk in slot "x"
D.) Then select "on" and "Continue" to spin up the disk.
Wait 30 seconds for the disk to spin up before continuing.
You will see a slight I/O interruption which shouldn't exceed
15 seconds, then the I/O to the other drives should start again.
5. Use the luxadm insert_device <box name>,<disk position> command to
install the replacement disk drive, specifying the correct box name
and disk position for the drive you have inserted. This step is
executed only to add a new entry for the new disk in the Device
Tree. See luxadm(1M) man page for more information on luxadm
commands. For example:
# luxadm insert_device box2,r0
The list of devices which will be inserted is:
1: Box name "box2" rear slot 0
Please enter q to quit or, <Return> to continue:
Hit <Return> after inserting the device(s).
6. Hit the Return key to complete the luxadm insert_device command.
7. For a disk array that is accessed by multiple hosts, repeat Steps 5
and 6 on the other hosts to install the device entries for the
new disk.
8. For arrays under a volume manager or other RAID manager control,
the new disk must have at least the same formatted capacity as the
disk it is replacing. Close the array door.
-------------------------------------------------------------------------
NOTE: The doors are tight to ensure an adequate seal.
To close, place your thumbs on either side of the latch and press
firmly.
-------------------------------------------------------------------------
9. For disks under volume manager control, notify volume manager of
all the new disks.
For example: # vxdctl enable
10. Use the vxdiskadm command to bring the new disk into volume manager
control.
# vxdiskadm
---------------------------------------------------------------------
NOTE: To replace a disk drive, choose option 5, "Replace a failed or
removed disk". To add a disk drive, choose option 1, "Add or initialize
one or more disks".
---------------------------------------------------------------------
11. The volume can be restored if needed.
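The procedure above can be sketched as an ordered checklist for a given
box name and disk position. This is a planning aid under stated
assumptions: it only builds and prints text and does not run luxadm or
any volume manager command. The function name slot10_replacement_plan,
its parameters, and the host name "hostB" are hypothetical. The command
strings are the ones quoted in the steps above.

```python
def slot10_replacement_plan(box_name, position, other_hosts=(), volume_manager=True):
    """Return the ordered steps of the procedure above as (actor, action) tuples."""
    target = f"{box_name},{position}"
    steps = [
        ("host", f"luxadm remove_device -F {target}"),
        ("manual", "wait for remove_device to complete; hit Return but leave the drive in place"),
        ("manual", "open the array door"),
        ("manual", "remove the disk and insert the replacement as quickly as possible"),
        ("fpm", "disks menu -> Front or Rear -> slot -> on -> Continue; wait 30 seconds"),
        ("host", f"luxadm insert_device {target}"),
    ]
    # Step 7: repeat the insert on every other host that accesses the array.
    steps += [(f"host:{h}", f"luxadm insert_device {target}") for h in other_hosts]
    steps.append(("manual", "close the array door"))
    if volume_manager:
        # Steps 9 and 10 apply only to disks under volume manager control.
        steps += [("host", "vxdctl enable"), ("host", "vxdiskadm")]
    return steps


plan = slot10_replacement_plan("box2", "r10", other_hosts=["hostB"])
for number, (actor, action) in enumerate(plan, start=1):
    print(f"{number:2d}. [{actor}] {action}")
```

Keeping the physical removal as a separate entry after the remove_device
entry mirrors the requirement that the luxadm remove_device command
complete before the drive is pulled.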
COMMENTS:
None
-----------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.