Document fins/I0619-1
FIN #: I0619-1
SYNOPSIS: Proper procedures for booting from StorEdge A1000 or A3x00 hardware
RAID device, including known issues and problems
DATE: May/22/00
KEYWORDS: Proper procedures for booting from StorEdge A1000 or A3x00 hardware
RAID device, including known issues and problems
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: Proper procedures for booting from StorEdge A1000 or A3x00
hardware RAID device, including known issues and problems.
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: StorEdge A1000 and A3X00 Arrays
PRODUCT CATEGORY: Storage / SW Admin
PRODUCTS AFFECTED:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
Systems Affected
----------------
- ANYSYS - System Platform Independent -
X-Options Affected
------------------
SG-XARY122A-16G - - 16GB STOREDGE A1000 -
SG-XARY122A-50G - - 50GB STOREDGE A1000 -
SG-XARY124A-109G - - 109GB STOREDGE A1000 -
SG-XARY124A-36G - - 36GB STOREDGE A1000 -
SG-XARY126A-144G - - 144GB STOREDGE A1000 -
SG-XARY126A-72G - - 72GB STOREDGE A1000 -
SG-XARY131A-16G - - 16GB STOREDGE A1000 FOR RACK -
SG-XARY133A-36G - - 36GB STOREDGE A1000 FOR RACK -
SG-XARY135A-72G - - 72GB STOREDGE A1000 FOR RACK -
SG-XARY351A-180G - - A3500 1 CONT MOD/5 TRAYS/18GB -
SG-XARY353A-1008G - - A3500 2 CONT/7 TRAYS/18GB -
SG-XARY353A-360G - - A3500 2 CONT/7 TRAYS/18GB -
SG-XARY355A-2160G - - A3500 3 CONT/15 TRAYS/18GB -
SG-XARY360A-545G - - 545-GB A3500 (1X5X9-GB) -
SG-XARY360A-90G - - A3500 1 CONT/5 TRAYS/9GB(10K) -
SG-XARY362A-180G - - A3500 2 CONT/7 TRAYS/9GB(10K) -
SG-XARY362A-763G - - A3500 2 CONT/7 TRAYS/9GB(10K) -
SG-XARY364A-1635G - - A3500 3 CONT/15 TRAYS/9GB(10K) -
SG-XARY366A-72G - - A3500 1 CONT/2 TRAYS/9GB(10K) -
SG-XARY380A-1092G - - 1092-GB A3500 (1x5x18-GB) -
SG-XARY360B-90G - - ASSY,TOP OPT,1X5X9,MIN,9GB,10K -
SG-XARY360B-545G - - ASSY,TOP OPT,1X5X9,MAX,9GB,10K -
SG-XARY362B-180G - - X-OPT,2X7X9,MIN,FCAL,9G10K -
SG-XARY374B-273G - - ASSY,TOP OPT,3X15X9,MIN,9GB,10K -
SG-XARY380B-182G - - X-OPT,FC-SN,1X5X18MIN,18GB10K -
SG-XARY380B-1092G - - ASSY,FC-SNL,1X5X18MAX,18G10K -
SG-XARY382B-364G - - ASSY,FC-SN,2X7X18,MIN,18GB,10K -
SG-XARY384B-546G - - ASSY,FC,3X15X18,MIN,18GB -
SG-XARY381B-364G - - ASSY,FC-SN,1X5X36MIN,36G10K -
SG-XARY381B-1456G - - ASSY,FC-SN,1X5X36MAX,36B10K -
SG-XARY383B-728G - - ASSY,FC-SN,2X7X36MIN,36B10K -
SG-XARY385B-1092G - - ASSY,FC-SN,3X15X36MIN,36B10K -
UG-A3500-FC-545G - - ASSY,TOP OPT,1X5X9,MAX,9GB,10K -
CU-A3500-FC-545G - - ASSY,TOP OPT,1X5X9,MAX,9GB,10K -
UG-A3500FC-182-10K - - FCTY,A3500FC/SCSI,1X5X18MIN,18/10K -
CU-A3500FC-182-10K - - FCTY,A3500FC/SCSI,1X5X18MIN,18/10K -
UG-A3500FC-364-10K - - FCTY,A3500FC/SCSI,2X7X18MIN,18/10K -
CU-A3500FC-364-10K - - FCTY,A3500FC/SCSI,2X7X18MIN,18/10K -
UG-A3500FC-546-10K - - FCTY,A3500FC/SCSI,3X15X18MIN 18G10K -
CU-A3500FC-546-10K - - FCTY,A3500FC/SCSI,3X15X18MIN 18G10K -
UG-A3K-A3500FC - - ASSY,UPGRADE,A3500FC/TABASCO -
UG-A3500-A3500FC - - ASSY,UPGRADE,A3500FC/DILBERT -
X6538A - - X-OPT,A3500FC CONTROLLER -
6538A - - FCTY, CONTROLLER, A3500FC -
PART NUMBERS AFFECTED:
Part Number Description Model
----------- ----------- -----
798-0522-01 RAID Manager 6.1.1 -
798-0522-02 RAID Manager 6.1.1 Update 1 -
798-0522-03 RAID Manager 6.1.1 Update 2 -
704-6708-10 CD, SUN STOREDGE RAID Manager 6.22 -
704-7937-05 CD, SUN STOREDGE RAID Manager 6.22.1 -
REFERENCES:
BugId: 4235026 - probe-scsi-all doesn't see LUNs at OBP level.
4240583 - probe-fcal-all does not "see" any FC A3x00 luns.
4291868 - Doing a reconfig reboot off of FC A3x00 will cause
you to lose boot disk.
4233846 - Limited bootability for SCSI on RM 6.22 is not
working as documented.
4354225 - RM6.22 patches 108834 and 108553 causes
inability to boot from A3x00 LUN 0.
4234427 - Cannot Boot A3500FC devices due to drivers not in
OS releases.
4328575 - A1000 as a boot device for Ex000: need support matrix.
4191694 - PCI E450 reports - Fatal SCSI error when booting off
of RAID device RM 6.1.
4338808 - When booting from an A1000 the A3500FC luns do not
show up.
4166678 - Initial boot from A1000 (Dilbert) connected to US2D
PCI card fails.
1251360 - obp: 875 boot code does not respond to target initiated
WDTR.
4382104 - Can not force S2.6 kernel core dump if OS is on A3x00's
Lun 0.
4388578 - firmware 03010300.bwd/03010354.apd and later breaks
bootability.
4289429 - Sonoma results in bad dump device during dump.
4472109 - A1000 running 03xxxxxx firmware/ rm6.22 will not boot
from E450.
4486082 - RM6.22x installation on A3x00 boot device fails on
Solaris 8, Update 4 and 5.
PatchId: 108553 for Solaris 8
108834 for Solaris 2.6, and 7
112125 RM 6.22.1 for Solaris 2.6 and 7
112126 RM 6.22.1 for Solaris 8
FIN: I0551-1
ESC: 526130 - When booting from an A1000 the A3500FC luns do not show
up.
520025 - Initial boot from A1000 (Dilbert) connected to US2D PCI
card fails.
525911 - probe-fcal-all does not see any FC A3500 luns.
526549 - System hangs frequently with A3500 as boot-device.
528158 - boot off A1000 under S2.6 no dump/core device is
available.
DOC: Early Notifier 20029.
Manual: 805-7758-12: Sun StorEdge RAID Manager 6.22.1 Release Notes.
806-6419-12: Sun StorEdge A3x00/A3500FC Best Practices Guide.
806-7792-14: Sun StorEdge RAID Manager 6.22 and 6.22.1 Upgrade
Guide.
PROBLEM DESCRIPTION:
Customers installing the Solaris Operating Environment to their
A1000, or A3x00 hardware RAID device may encounter numerous problems
ranging from harmless error messages to an inability to access their
boot device or mount their /(root) filesystem.
Approximately 10,538 A3x00 and A3500FC units in 1x5, 2x7, and 3x15
configurations, and approximately 31,600 A1000 units, have been shipped
since January of 1998. The number of units in use as boot devices is
unknown.
RM 6.22.1 is the recommended and supported version of RAID Manager.
Booting from an A1000, or A3x00 hardware RAID device can be problematic
and may result in different types of errors. Below are some common
error messages that will appear if these procedures are not closely
adhered to. Booting from an A3500FC is not supported.
Booting from an A1000 or A3x00 array with RM6.22 installed requires patches
108553 for Solaris 8 or 108834 for Solaris 2.6 and 7. See bug numbers
4354225 and 4289429. One of the following errors may occur if the incorrect
patch levels are used:
SunOS Release 5.6 Version Generic_105181-21 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.
Cannot assemble drivers for root /pseudo/rdnexus@0/rdriver@5,0:a
Cannot mount root on /pseudo/rdnexus@0/rdriver@5,0:a fstype ufs
panic[cpu0]/thread=0x10404000: vfs_mountroot: cannot mount root
rebooting...
Resetting...
Or:
SunOS Release 5.8 Version Generic_108528-01 64-bit
Copyright 1983-2000 Sun Microsystems, Inc. All rights reserved.
Cannot assemble drivers for root /pseudo/rdnexus@0/rdriver@5,0:a
Cannot mount root on /pseudo/rdnexus@0/rdriver@5,0:a fstype ufs
panic[cpu0]/thread=10408000: vfs_mountroot: cannot mount root
0000000010407970 genunix:vfs_mountroot+70 (10431000, 0, 0, 10410800, 10, 14)
%l0-3: 0000000010431000 0000000010434708 000000003fc00000 0000000010431448
%l4-7: 0000000000000000 0000000010413468 00000000000b6322 0000000000000322
0000000010407a20 genunix:main+94 (10410048, 2000, 10407ec0, 10408030, fff2,
100509ac)
%l0-3: 0000000000000001 0000000000000001 0000000000000015 0000000000000ea1
%l4-7: 0000000010424de0 000000001045cab8 00000000000c9610 0000000000000540
skipping system dump - no dump device configured
rebooting...
Resetting...
When booting from a hardware RAID device attached to a PCI based host,
the boot will fail with the error "trap 3e". Refer to bugIds 4166678
and 1251360. The workaround is to simply issue the boot command again.
The error will not occur the second time.
When booting from a hardware RAID device attached to an Enterprise 450
host, the following message may appear several times:
Fatal SCSI error at script address 258 Unexpected disconnect
Drive not ready
This message is harmless and will not cause any problems with the
configuration.
Due to bugID 4472109, booting from an A3x00 or A1000 with 3.x
controller firmware attached to the dual differential PCI SCSI host bus
adaptor will not work. Work-arounds include using RM6.1.1 with firmware
2.x or using an SBus host adaptor. Refer to the bug report for more
details.
Placing your boot device under Veritas VM or Solstice DiskSuite control
may corrupt the Solaris Operating Environment and require it to be
re-installed.
Deleting LUN 0 or resetting the configuration of the module containing
the boot device will destroy the boot device.
When booting from an array that has a single controller (A1000), or is
in the independent controller configuration, there is no RDAC failover
protection, and a controller firmware upgrade can only be performed by
booting from an alternate boot device such as an independent disk
drive.
A Sonoma LUN can now be used as a dump device with Solaris 2.6. Bug
number 4289429 was fixed with patches 105356 (Solaris 2.6) and
107458 (Solaris 7). The fix is included in base Solaris 8 as well.
Refer to the following bug reports for the detailed root cause of the
problem.
4388578(sonoma) firmware 03010300.bwd/03010354.apd and later breaks
bootability
4289429(sonoma) Sonoma results in bad dump device during dump
4235026(sonoma) probe-scsi-all doesn't see LUNs at OBP level.
4240583(fusion) probe-fcal-all does not "see" any FC A3x00 LUNs.
4291868(kernel) Doing a reconfig reboot off of FC A3x00 will cause you
to lose boot disk.
4233846(sonoma) Limited bootability for SCSI on RM 6.22 is not working
as documented.
4354225(sonoma) RM6.22 patches 108834 and 108553 causes inability
to boot from A3x00 LUN 0.
4234427(sonoma) Cannot Boot A3500FC devices due to drivers not in OS
releases.
4328575(sonoma) A1000 as a boot device for Ex000: need support matrix.
4191694(tazmo) PCI E450 reports - Fatal SCSI error when booting off of
RAID device RM6.1
4338808(sonoma) When booting from an A1000 the A3500FC luns do not
show up.
4166678(fusion) Initial boot from A1000 (Dilbert) connected to US2D PCI
card fails.
1251360(tazmo) OBP: 875 boot code does not respond to target initiated
WDTR.
4472109(tazmo) A1000 running 03xxxxxx firmware/ rm6.22 will not boot
from E450
Only reboot when specified in the bootability procedures; otherwise you
risk losing access to your boot device. The following error may occur
if the system is rebooted prematurely:
Rebooting with command: boot -r
Boot device: /sbus@2,0/QLGC,isp@1,10000/sd@5,0:a File and args: -r
SunOS Release 5.7 Version Generic_106541-10 64-bit [UNIX(R) System V
Release 4.0]
Copyright (c) 1983-1999, Sun Microsystems, Inc.
configuring network interfaces: hme0.
Hostname: sonoma40
mount: /dev/dsk/c0t5d0s0 is not this fstype.
failed to open /etc/coreadm.confopen(/dev/.devfseventd_daemon.lock) -
Read-only file system
Configuring /dev and /devices
devfsadmd: mkdir failed for /dev 0x1ed: Read-only file system
devfsadmd: open failed for /dev/.devfsadm_dev.lock: Read-only file system
Configuring the /dev directory (compatibility devices)
The Solaris Operating Environment requires the boot device to be LUN 0.
A RAID level 0 boot device is not supported. Use RAID level 1, 3, or 5
to enable data protection in the drive group containing the boot device.
In the independent controller configuration, only one host system can
boot because only one controller owns LUN 0.
The four versions of RM6 that are currently supported when installed
on an A1000, or A3x00 boot device are RM6.1.1 Update 1, RM6.1.1 Update
2, RM6.22 and RM6.22.1.
Solaris Operating Environment versions that are supported when
installed on an A1000, or A3x00 boot device are Solaris 2.6, Solaris 7,
and Solaris 8.
When using Solaris 8, note that Update 4 and Update 5 do not support
booting from the A1000 and A3x00. Earlier and later update releases do
support booting. See bug 4486082 for details and error messages.
4486082(kernel) RM6.22x installation on A3x00 boot device fails on Solaris 8
Update 5.
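As a hedged illustration of the Update 4/5 restriction above, the unsupported releases could be flagged from an /etc/release-style string. The sample string and the u4wos/u5wos build tags are assumptions based on typical Solaris 8 release strings; check /etc/release on the actual host.

```shell
#!/bin/sh
# Hedged sketch: flag the Solaris 8 Update 4/5 releases that cannot
# boot from these arrays (bug 4486082). The release string below is a
# made-up example; on a real host read it from /etc/release.
release="Solaris 8 4/01 s28s_u4wos_08 SPARC"

# u4wos / u5wos are the assumed build tags for Update 4 and Update 5
case "$release" in
  *u4wos*|*u5wos*) bootable=no ;;
  *)               bootable=yes ;;
esac

echo "array boot supported: $bootable"
```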
Upgrading from one version of RM6 to another when booting from an A1000
or A3x00 has not been tested and is currently not supported. Refer to
the RM6 upgrade guide:
806-7792-14: Sun StorEdge RAID Manager 6.22 and 6.22.1 Upgrade
Guide.
For more details on upgrading RM6, refer to Early Notifier 20029 for a
list of patch levels. Refer to the Best Practices Guide for a list of
supported platforms for booting. For the full list of supported
platforms, see: http://acts.ebay/storage/A3x00/HOSTS.html.
IMPLEMENTATION:
---
| | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| X | REACTIVE (As Required)
---
CORRECTIVE ACTION:
The following procedures are provided as a guideline for authorized
Enterprise Services Representatives who may be encountering the above
mentioned problems.
There are two procedures: one for RM6.1.1 versions and one for
RM6.22/6.22.1. You must know which version of RM6 you plan to use
before installing the Solaris Operating Environment. For the latest
bug fixes and product features, RM6.22.1 is recommended.
A] The procedure for booting Solaris from a StorEdge A1000 or A3x00 Array
with RM6.1.1 u1 or RM6.1.1 u2 only:
1) Install Solaris 2.6, 7, or 8 onto LUN 0 on your hardware RAID
device and let the Solaris Installation program set your eeprom
to boot off your RAID module. After the OS installation, let it
reboot off your RAID module. The default LUN 0 has a capacity of
only 10MB. Please refer to the document "Sun StorEdge RAID
Manager 6.22.1 Release Notes" for instructions on resizing the
default LUN 0. This document also applies to RM6.1.1
configurations.
2) Install the recommended patch cluster for your OS.
3) Install RM6.1.1 Update 1 or RM6.1.1 Update 2.
4) Install RM6.1.1 patches (Use patchpro.ebay or EN20029 to determine
necessary patches).
5) Edit the /usr/lib/osa/rmparams file and make Rdac_SupportDisabled=TRUE
6) Perform a reconfiguration reboot (reboot -r).
7) Edit the rmparams file again and make Rdac_SupportDisabled=FALSE
8) Run the command '/etc/init.d/rdacctrl config'
9) Issue the command "df" and take note of what device is mounted
under /
E.g.
# df
/ (/dev/dsk/c0t5d0s0 ):17117372 blocks 1077143 files
/proc (/proc ): 0 blocks 15593 files
/dev/fd (fd ): 0 blocks 0 files
/tmp (swap ): 3724816 blocks 164812 files
10) ls -l /dev/dsk/cAtBdCsD (where cAtBdCsD is the device mounted under /)
E.g.
# ls -l /dev/dsk/c0t5d0s0
lrwxrwxrwx 1 root other /dev/dsk/c0t5d0s0 -> ../../devices/pseudo/rdnexus@0/rdriver@5,0:a
11) Edit the /etc/system file and add the following entries:
rootfs:ufs
rootdev:/pseudo/rdnexus@0/rdriver@5,0:a
NOTE: The rootdev: entry should be the pseudo device path of the device
mounted under /.
12) Perform a reconfiguration reboot (reboot -r).
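Steps 9 through 11 above can be sketched as a small shell fragment. The symlink target below is the example from step 10; on a real host, substitute the actual target of the /dev/dsk link for the device mounted under /.

```shell
#!/bin/sh
# Sketch of steps 9-11: derive the rootdev pseudo device path from the
# /dev/dsk symlink target and emit the two /etc/system entries. The
# link target is the example from step 10 of this FIN; on a real host
# use the output of: ls -l /dev/dsk/<root device>
link_target="../../devices/pseudo/rdnexus@0/rdriver@5,0:a"

# Strip the ../../devices prefix to obtain the pseudo device path
rootdev="${link_target#../../devices}"

# These two lines are what gets appended to /etc/system
printf 'rootfs:ufs\n'
printf 'rootdev:%s\n' "$rootdev"
```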
--- OR ---
B] The procedure for booting Solaris from a StorEdge A1000 or A3x00
Array for RM6.22/6.22.1:
1) Install Solaris 2.6, 7, or 8 to LUN 0 (let suninstall set the default
boot device to be your LUN 0). The default LUN 0 has a capacity of only
10MB. Please refer to the document "Sun StorEdge RAID Manager 6.22.1
Release Notes" for instructions on resizing the default LUN 0.
2) Install the recommended patch cluster for your OS version.
3) Install RM6.22/6.22.1.
4) Install RM6.22/6.22.1 patches (Use patchpro.ebay or EN20029 to
determine necessary patches).
(Warning: some revisions of patches 108553 and 108834 break
bootability. See bug numbers 4388578 and 4354225 before
installing them.)
5) Edit the rmparams file for 16 or 32 LUN support if needed.
See FIN I0551.
6) Issue the command "/usr/lib/osa/bin/genscsiconf"
7) Edit sd.conf if needed (see FIN I0551). This is to help speed up
reboots. Otherwise on reboots, the host will timeout for every
non-existent LUN.
8) Run "/etc/init.d/rdacctrl config"
9) Edit the /etc/system file and add the rootfs and rootdev entries.
Refer to steps 9, 10 and 11 of the above procedure "RM6.1.1 u1 &
RM6.1.1 u2 Only"
10) Perform a reconfiguration reboot (reboot -- -r).
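As a hypothetical illustration of step 7 above, sd.conf entries for the configured LUNs on one target could be generated as follows. The target number and entry format are assumptions for illustration; genscsiconf and FIN I0551 cover the real procedure.

```shell
#!/bin/sh
# Hypothetical sketch of step 7: emit sd.conf entries for LUNs 0-7 on
# one target so the host does not time out probing absent LUNs on
# reboot. Target 4 is an example value.
target=4
max_lun=7

entries=""
lun=0
while [ "$lun" -le "$max_lun" ]; do
  # Append one sd.conf line per LUN (note the embedded newline)
  entries="${entries}name=\"sd\" class=\"scsi\" target=${target} lun=${lun};
"
  lun=$((lun + 1))
done

printf '%s' "$entries"
```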
--- THEN ---
C] Setting Up and Verifying Alternate Boot Paths (A3X00 only):
1) From the Open Boot Prom, issue the probe-scsi-all command. There
should be a SCSI ID from each RAID controller on the host. Record
this information including the full device path.
2) Boot from the LUN that the OS is installed to and start RM6.
3) Open the Recovery application and select the RAID module that is the
boot device and verify that its state is Optimal.
4) Select Options -> Manual Recovery -> Controller Pairs.
5) Highlight the controller that owns the LUN that the OS is installed
on and select "Place Offline".
6) When the controller is offline, run a healthcheck or recovery guru.
RM6 should report a data path failure or offline controller. Do *not*
fix the problem at this time.
7) Select Module Profile and confirm all LUNs are now owned by the
alternate controller.
8) Bring the host down to the Open Boot Prom.
9) Using the information from the probe-scsi-all from step 1, use the
nvalias command to create an alias to boot from.
10) Boot using the new alias: boot <alias> -r
11) When the host is up, start RM6. Select the Recovery application,
and select the module that owns your boot device. You should get a
data path failure after running healthcheck or recovery guru. Select
"fix" or Place Online. After this is done, run healthcheck or
recovery guru again and verify that the module is once again Optimal.
12) Open the Maintenance and Tuning application. Select the RAID module
that owns the boot device and select "LUN balancing". Verify that
the default boot path owns LUN 0.
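Steps 9 and 10 above might look like the following OBP fragment. The alias name and device path are illustrative only; use the full device path recorded in step 1.

```
ok nvalias altboot /sbus@2,0/QLGC,isp@1,10000/sd@4,0:a
ok boot altboot -r
```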
COMMENTS:
None
----------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.