InfoDoc ID | Synopsis | Date
40430 | Rebuilding a T3 sysarea | 5 Oct 2001
Status | Issued
Description
When the T3 is able to boot but unable to mount the / file system, it is sometimes necessary to wipe out and rebuild the sysarea. The process to wipe out and rebuild the T3 sysarea is completed in 3 stages.
Stage 1 wipe out the corrupt sysarea
Stage 2 rebuild a new sysarea
Stage 3 LUN recovery
Each of the 9 disks in the T3 has a 200MB region reserved for the boot code and configuration data. The configuration data includes the bootcode, volume configuration and logging data. The system area has 2 partitions. The bootcode is located on the first; psos+ and a filesystem make up the second. In this case we assume that the bootcode is good but the filesystem is damaged. The only way to repair the filesystem in this case is to wipe out and rebuild the system area.
Stage 1 wipe out the corrupt sysarea
To wipe out the sysarea on a disk, interrupt the boot process by pressing Ctrl-T as soon as the following messages appear:
initializing QLCF component...
initializing loop 1 ISP2100 ... firmware status = 3
Detected 10 FC-AL ports on loop 1
Initializing loop 2 ISP2100 ... firmware status = 3
The boot will continue in a normal fashion until it finally enters offline diagnostics.
Cache Mem Addr Toggle Test begin...
Cache Mem Addr Toggle Test complete... Passed
256 MBytes Cache Memory Detected
Testing CPU DRAM... Cancelled
Once the Diagnostics menu appears, select the QC option to quit and go into the Label Control Menu.
DIAGNOSTICS MENU
CO: Configure options for Diagnosis
MM: Memory Diagnostic Menu
DM: Data Path Diagnostic Menu
XM: Xor Diagnostic Menu
IM: QLOGIC ISP2100 Chip Diagnostic Menu
RS: Reset System
QC: Quit but go into Label Control Menu
QT: Quit Diagnostic Menu
Enter command [HE]: qc
Once again, select the QC option to quit and enter the Label Control Menu.
DIAGNOSTICS MENU
CO: Configure options for Diagnosis
MM: Memory Diagnostic Menu
DM: Data Path Diagnostic Menu
XM: Xor Diagnostic Menu
IM: QLOGIC ISP2100 Chip Diagnostic Menu
RS: Reset System
QC: Quit but go into Label Control Menu
QT: Quit Diagnostic Menu
Enter command [HE]: qc
Select the W1 option. This wipes out the unit 1 sysarea and LFS.
LABEL CONTROL MENU
W1: Wipe out unit 1 Sysarea and LFS
QQ: Quit Label Menu for this UNIT
QA: Quit All
Enter command [HE]: w1
Select QA to quit and resume the boot process.
Enter command [HE]: qa
Stage 2 rebuild a new sysarea
The rebuild of the sysarea requires a tftp boot server. The process is documented in Sun InfoDoc 19272. Once the tftpboot server is ready, tftpboot the unit:
1. Interrupt the boot process (hit any key before the timeout expires)
2. set bootmode tftp
3. boot
Once the boot completes the system is ready to be rebuilt.
Now use the t3.sh script (available in the 109115 patch) to reload psos.
Follow the patch readme included with 109115.
Stage 3 LUN recovery
Use output from a saved copy of extractor or explorer to identify the LUN configuration, RAID level and block size. If any of these is not rebuilt exactly as it was, the data is lost.
The block size can be verified with output from:
sys_list
blocksize : 64k
The LUN makeup can be verified with output from:
vol_list
brm04-storage-lab4:/:<6>vol list
volume capacity raid data standby
vol01 35.8 GB 1 u2d1-4 u2d9
After the layout is determined, reset the block size to the correct value.
Notice that the LUNs are now missing after the rebuild of the boot and sysareas:
brm04-storage-lab4:/:<39>vol list
volume capacity raid data standby
1. sys blocksize 64k (sets the block size to 64k, as indicated by the extractor output)
2. vol add <name> data <drives> raid <0 | 1 | 5> [standby <drive>]
In this specific case, add back vol01 with:
vol add vol01 data u2d1-4 raid 1 standby u2d9
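Because the block size and volume layout must match the saved output exactly, it can help to derive the commands from the saved text rather than retype them. The following is a minimal Python sketch, not part of the InfoDoc or the 109115 patch. The helper names (parse_blocksize, parse_vol_list, rebuild_commands) are hypothetical, and it assumes the saved output has the same column layout as the examples above, with the capacity unit shown as GB or MB. It only prints the commands; they still have to be reviewed and entered on the T3 by hand.

```python
import re

def parse_blocksize(sys_list_text):
    """Return the blocksize value (for example '64k') from saved sys list output."""
    match = re.search(r"^\s*blocksize\s*:\s*(\S+)", sys_list_text, re.MULTILINE)
    if match is None:
        raise ValueError("no blocksize line found in saved output")
    return match.group(1)

def parse_vol_list(vol_list_text):
    """Return one dict per volume row found in saved vol list output."""
    volumes = []
    for line in vol_list_text.splitlines():
        fields = line.split()
        # Assumed row layout: name, capacity, unit, raid, data drives[, standby].
        # Prompt lines and the column header do not match and are skipped.
        if len(fields) < 5 or fields[2] not in ("GB", "MB"):
            continue
        standby = fields[5] if len(fields) > 5 else None
        # Assumption: a volume without a standby may be shown as "none".
        if standby is not None and standby.lower() == "none":
            standby = None
        volumes.append({"name": fields[0], "raid": fields[3],
                        "data": fields[4], "standby": standby})
    return volumes

def rebuild_commands(sys_list_text, vol_list_text):
    """Build the 'sys blocksize' and 'vol add' command lines as strings."""
    commands = ["sys blocksize " + parse_blocksize(sys_list_text)]
    for vol in parse_vol_list(vol_list_text):
        cmd = "vol add {name} data {data} raid {raid}".format(**vol)
        if vol["standby"]:
            cmd += " standby " + vol["standby"]
        commands.append(cmd)
    return commands

# Saved output copied from the examples in this document.
saved_sys = "blocksize : 64k"
saved_vol = """brm04-storage-lab4:/:<6>vol list
volume capacity raid data standby
vol01 35.8 GB 1 u2d1-4 u2d9"""

for command in rebuild_commands(saved_sys, saved_vol):
    print(command)
```

For the saved output shown in this document, the sketch prints the same two commands used in this stage.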
Check the LUN against the saved output:
brm04-storage-lab4:/:<45>vol list
volume capacity raid data standby
vol01 35.8 GB 1 u2d1-4 u2d9
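The comparison against the saved output can also be done mechanically before moving on to the init step. The sketch below is an illustration only, under the same assumptions about the vol list column layout as the listings in this document. The function names (volume_rows, layout_differences) are hypothetical. It reports every volume whose row differs between the saved listing and the current one.

```python
def volume_rows(vol_list_text):
    """Map volume name -> remaining columns of its row in vol list output."""
    rows = {}
    for line in vol_list_text.splitlines():
        fields = line.split()
        # Assumed row layout: name, capacity, unit (GB or MB), raid, data[, standby].
        if len(fields) >= 5 and fields[2] in ("GB", "MB"):
            rows[fields[0]] = tuple(fields[1:])
    return rows

def layout_differences(saved_text, current_text):
    """Return (name, saved_row, current_row) for every volume that differs."""
    saved = volume_rows(saved_text)
    current = volume_rows(current_text)
    problems = []
    for name in sorted(set(saved) | set(current)):
        if saved.get(name) != current.get(name):
            problems.append((name, saved.get(name), current.get(name)))
    return problems

# Listings copied from this document: saved, after the wipe, after vol add.
saved = """brm04-storage-lab4:/:<6>vol list
volume capacity raid data standby
vol01 35.8 GB 1 u2d1-4 u2d9"""
after_wipe = """brm04-storage-lab4:/:<39>vol list
volume capacity raid data standby"""
after_add = """brm04-storage-lab4:/:<45>vol list
volume capacity raid data standby
vol01 35.8 GB 1 u2d1-4 u2d9"""

print(layout_differences(saved, after_wipe))  # vol01 is reported as missing
print(layout_differences(saved, after_add))   # empty list: layouts match
```

An empty result only shows that the listed columns agree. It does not replace checking the block size, which is set separately with sys blocksize.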
Now we can re-initialize the LUN. This is done with the .vol command so that the volume data is not actually initialized.
In our case:
brm04-storage-lab4:/:<53>.vol init vol01 fast
WARNING - Existing volume data won't be changed.
Continue ? [N]: y
Now the volume(s) can be remounted and the data can be brought back online. Stage 3 is particularly critical: all data will be lost if any incorrect information is used in the LUN rebuild stage.
INTERNAL SUMMARY:
SUBMITTER: Mike Monahan
APPLIES TO: Hardware/Disk Storage Subsystem/StorEdge Disk Array/StorEdge T3
ATTACHMENTS: