Document fins/I0573-2
FIN #: I0573-2
SYNOPSIS: StorEdge A1000, A3000, A3500, and A3500FC require the existence of
LUN 0 for proper operation.
DATE: Sep/28/00
KEYWORDS: StorEdge A1000, A3000, A3500, A3500FC, LUN 0, resolution daemon,
raidutil
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: StorEdge A1000, A3000, A3500, and A3500FC require the existence
of LUN 0 for proper operation.
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: Storage Array A1000, A3X00, A3500FC
PRODUCT CATEGORY: Storage / SW Admin
PRODUCTS AFFECTED:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
Systems Affected
----------------
- A11 ALL Ultra Enterprise 1 -
- A12 ALL Ultra Enterprise 1E -
- A14 ALL Ultra Enterprise 2 -
- A25 ALL Ultra Enterprise 450 -
- E3000 ALL Ultra Enterprise 3000 -
- E3500 ALL Ultra Enterprise 3500 -
- E4000 ALL Ultra Enterprise 4000 -
- E4500 ALL Ultra Enterprise 4500 -
- E5000 ALL Ultra Enterprise 5000 -
- E5500 ALL Ultra Enterprise 5500 -
- E6000 ALL Ultra Enterprise 6000 -
- E6500 ALL Ultra Enterprise 6500 -
- E10000 ALL Ultra Enterprise 10000 -
X-Options Affected
------------------
- A1000 ALL StorEdge A1000 -
- A3000 ALL StorEdge A3000 -
- A3500 ALL StorEdge A3500 -
- A3500FC ALL StorEdge A3500FC -
6530A - - Sun RSM Array 63GB 15X4GB -
6531A - - Sun RSM Array 147GB 7X4GB -
6532A - - A3000 15*4.2GB/7200 FWSCSI -
6533A - - RSM2000 35*4.2GB/7200 FWSCSI -
6534A - - A3000 15*9.1GB/7200 FWSCSI -
6535A - - A3000 35*9.1GB/7200 FWSCSI -
SG-XARY122A-16G - - 16GB StorEdge A1000 -
SG-XARY122A-50G - - 50GB StorEdge A1000 -
SG-XARY124A-36G - - 36GB StorEdge A1000 -
SG-XARY124A-109G - - 109GB StorEdge A1000 -
SG-XARY126A-72G - - 72GB StorEdge A1000 -
SG-XARY126A-144G - - 144GB StorEdge A1000 -
SG-XARY135A-72G - - 72GB StorEdge A1000 For Rack -
SG-XARY131A-16G - - 16GB StorEdge A1000 For Rack -
SG-XARY133A-36G - - 36GB StorEdge A1000 For Rack -
SG-XARY351A-180G - - A3500 1 Cont Mod/5 Trays/18GB -
SG-XARY351A-360G - - A3500 1 Cont Mod/5 Trays/36GB -
PART NUMBERS AFFECTED:
Part Number Description Model
----------- ----------- -----
825-3869-02 MNL Set, SUN RSM ARRAY 2000 -
798-0188-01 SS, CD ASSY, RAID Manager 6.1 -
798-0522-01 CD ASSY, RAID Manager6.1.1 (2+) -
798-0522-02 CD ASSY, RAID Manager6.1.1 (2A) -
798-0522-03 CD ASSY, RAID Manager6.1.1 UPDATE 2 -
704-6708-10 CD, SUN STOREDGE RAID Manager6.22 -
REFERENCES:
BugId: 4313266
ESC: 524844
MANUAL: 805-7758-11 - Sun StorEdge RAID Manager 6.22 Release Notes for
A1000, A3x00, and A3500FC
805-7756-10 - Installation and Support Guide for Solaris
806-0478-10 - Sun StorEdge RAID Manager 6.22 User's Guide
806-3721-10 - Sun StorEdge RAID Manager 6.22 Release Notes
Addendum
PROBLEM DESCRIPTION:
This FIN has critical impact on all A3x00/A3500FC/A1000 configurations on
all Sun Ultra Enterprise platforms running any version of Solaris/SunOS.
Ultra systems experiencing the faults documented in this FIN can remain
down for extended periods, or until LUN 0 is installed.
The rmlog.log and system messages files should be checked for errors, as
there have been numerous instances of hosts being shut down while the
resolution daemon was recovering failed I/Os. In these cases, the
indications are that under heavy I/O, recovery of a failed block may not
happen for an hour and twenty minutes. The customer will likely have
rebooted the host before then, starting the problem over again.
The customer reboots the host believing that the resolution daemon is
hung, hoping that after the reboot the daemon will reinitiate and
complete the recovery process. Unfortunately, a host reboot is no
substitute for an optimal LUN 0. After the reboot, if there is heavy
I/O, recovery will take even longer, and the customer is likely to
reboot the host again hoping this will fix the problem; the symptoms
indicate that it does not.
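The log check described above can be sketched as follows. The location of
rmlog.log varies by RAID Manager installation, so a relative path is shown
as a placeholder; /var/adm/messages is the standard Solaris system log.

```shell
# Scan the RAID Manager log and the system messages file for I/O
# recovery errors.  Substitute the actual rmlog.log path for your
# installation.
grep -i "error" rmlog.log
grep -i "error" /var/adm/messages
```

Repeated recovery errors under heavy I/O, combined with a missing LUN 0,
are the signature of the problem this FIN describes.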
The problems associated with the deletion of LUN 0 include the inability
of the resolution daemon to perform properly during I/O failures, or
substantial delays in its doing so. A test conducted during a customer
escalation verified that the addition of an optimal LUN 0 allowed the
controllers, the resolution daemon, and the associated components
providing fail-over capability to function properly.
LSI/Symbios has confirmed that the removal of LUN 0 is not a valid or
supported configuration. While RM6 does allow a user to remove LUN 0,
doing so will cause unpredictable behavior, including communication
problems with the array (through both the GUI and CLI) and data loss due
to random LUN failures.
The GUI command to delete a LUN can be applied to LUN 0. The CLI command
raidutil can delete LUN 0 with either "raidutil -D all" or
"raidutil -D 0". At that point the system is vulnerable to losing
communication with the host if a SCSI bus reset is generated for any
reason, or if a LIP is generated on a fibre-channel connection.
Users often want to resize LUN 0, since the factory default is only 10MB.
Resizing involves deleting and recreating the LUN, which opens a
"no LUN 0" problem window.
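For reference, the operations that open this window can be sketched as
shell commands. The controller path is a placeholder; the -c path usage
is the CLI convention described in the CORRECTIVE ACTION section below.

```shell
# DANGEROUS: both of these remove LUN 0 and open the "no LUN 0"
# window described above.  The device path c1t5d0 is a placeholder
# for the controller path on your system.
raidutil -c c1t5d0 -D 0      # deletes LUN 0 only
raidutil -c c1t5d0 -D all    # deletes every LUN, including LUN 0
```

These commands are shown so they can be recognized and avoided, not run.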
The Release Notes Addendum for 6.22 is incorrect where it states that
LUN 0 can be deleted after new LUNs are created and are optimal. This has
been clearly documented in BugId 4296354, which also documents the rules
of the SCSI-2 and SCSI-3 specifications that such a deletion violates:
neither specification allows for the absence of LUN 0.
Update for FIN I0573-2:
-----------------------
In this -2 revision, the following has been updated from FIN I0573-1:
1) The sixth paragraph, shown below, has been added to the PROBLEM
DESCRIPTION.
The GUI command to delete a LUN can be applied to LUN 0. The CLI
command raidutil can delete LUN 0 either with "raidutil -D all" or
"raidutil -D 0". At this point the system is vulnerable to losing
communication with the host if either a SCSI bus reset is generated
for any reason or a LIP is generated when using a fibre-channel
connection.
Users often want to resize LUN 0 since the factory default is only
10MB. This involves deleting it and recreating it which opens up a
"no LUN 0" problem window.
2) The 4th through 9th paragraphs have been added to the CORRECTIVE
ACTION section, describing the commands to be avoided, how to remake
LUN 0, and how to recover and reset the entire array.
IMPLEMENTATION:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
CORRECTIVE ACTION:
Enterprise Customers and authorized Field Service Representatives may
avoid the above-mentioned problem by following the recommendations
below:
If a host exhibits delays or an inability to recover from I/O faults or
to re-balance LUNs, check for the presence (or absence) of an optimal
LUN 0.
On systems having no LUN 0, run RM6 to add an optimal LUN 0 to the
configuration. On systems without disk space available, consultation
with the customer will be required to architect a workaround allowing
for the addition of LUN 0, on a time and materials basis.
The problem can be avoided by not deleting LUN 0. LUN 0 comes from the
factory on all arrays as a 10MB RAID 0 device, which is not a useful
size. Historically, LUN 0 had to be resized to be used, and resizing is
only accomplished by deleting and recreating it. However, all Solaris
drivers support multiple LUNs per array, so LUN 0 can be left alone.
Any LUN, including LUN 0, can be deleted and recreated in a single
command line, "raidutil -D 0 -n 0", but this is not truly an atomic
operation: internally there is one operation to delete LUN 0 and then
another to recreate it.
In order to remake LUN 0, make sure that another optimal LUN exists on
the controller, A or B, which owns LUN 0. The default LUN 0 will be on
controller A unless it has been explicitly moved. "lad" or "rdacutil -i
cXtXd0" will show where the controllers are in the system and which
controller owns the LUNs for an array. It is much safer to do the LUN 0
deletion and recreation when there is no other activity on the array and
its SCSI bus or FC loop.
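The checks and the delete-and-recreate step can be sketched as the
following sequence. The device path c1t5d0 is a placeholder; only the
commands and flags documented in this FIN are used.

```shell
# 1. Identify the controllers and which one owns LUN 0.
lad
rdacutil -i c1t5d0

# 2. Confirm another optimal LUN exists on the controller that owns
#    LUN 0, and that the array and its SCSI bus or FC loop are idle.

# 3. Delete and recreate LUN 0 in a single command line.  This is NOT
#    atomic: internally it is a delete followed by a create, so a
#    brief "no LUN 0" window still exists.
raidutil -c c1t5d0 -D 0 -n 0
```

Running the sequence during a quiet period minimizes the chance of a bus
reset or LIP occurring inside the window in step 3.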
If the entire array is to be re-organized, use the GUI Reset
Configuration command under Configuration->File->Reset Configuration;
it leaves a default LUN 0 on controller A. When using the CLI version,
"raidutil -c path -X", always use the path to a controller with at
least one LUN on it (see BugId 4281850). "raidutil -D all" should never
be used. In extreme cases, there is a serial port command (Syswipe)
that will completely clean up the configuration, losing all data.
If the array gets into a state where there is no LUN 0, powering the
array off and back on will cause it to go through Start of Day (SOD)
processing, which creates a default LUN 0. In this case, only the
controller modules need to be power-cycled, not all the trays. A host
reboot will not accomplish the same thing unless one is running
RM 6.1.1 or earlier.
The Release Notes Addendum to RM 6.22, 806-3721-10, describes safe
procedures for creating a new LUN 0. It also contains a procedure for
restoring communication to a FC array in the rare case communication is
lost during the above operations. The addendum is available internally
at http://thedance.ebay.sun.com/software/manage/raidmgr/rm_6.22.html.
While this FIN initially targets the RM6-managed arrays A1000/A3x00/
A3500FC, there is a high probability that this problem affects some or
all of Sun's LUN-based storage arrays. Sun is testing its other LUN-based
arrays to determine whether any violate the SCSI-2 or SCSI-3
specification rule requiring the presence of an optimal LUN 0 even after
additional LUNs are created.
COMMENTS:
--------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.