InfoDoc ID   Synopsis   Date
40430   Rebuilding a T3 sysarea   5 Oct 2001

Status Issued

Description

When a T3 is able to boot but unable to mount the / file system, it is sometimes necessary to wipe out and rebuild the sysarea. The process to wipe out and rebuild the T3 sysarea is completed in 3 stages.

Stage 1 wipe out the corrupt sysarea

Stage 2 rebuild a new sysarea

Stage 3 Lun recovery

Each of the 9 disks in the T3 has a 200MB region, the system area (sysarea), reserved for the boot code and configuration data. This region holds the bootcode, volume configuration and logging data. The system area has 2 partitions: the bootcode is located on the first, and pSOS+ and a filesystem make up the second. In this case we assume that the bootcode is good but the filesystem is damaged. The only way to repair the filesystem is then to wipe out and rebuild the system area.

Stage 1 wipe out the corrupt sysarea

To wipe out the sysarea on a disk, interrupt the boot process by pressing Ctrl-T as soon as the following messages appear:

initializing QLCF component...

initializing loop 1 ISP2100 ... firmware status = 3

Detected 10 FC-AL ports on loop 1

Initializing loop 2 ISP2100 ... firmware status = 3

The boot will continue in a normal fashion until it finally enters offline diagnostics.

Cache Mem Addr Toggle Test begin...

Cache Mem Addr Toggle Test complete... Passed

256 MBytes Cache Memory Detected

Testing CPU DRAM... Cancelled

Once the Diagnostics Menu appears, select the QC option to quit and go into the Label Control Menu.

DIAGNOSTICS MENU

CO: Configure options for Diagnosis

MM: Memory Diagnostic Menu

DM: Data Path Diagnostic Menu

XM: Xor Diagnostic Menu

IM: QLOGIC ISP2100 Chip Diagnostic Menu

RS: Reset System

QC: Quit but go into Label Control Menu

QT: Quit Diagnostic Menu

Enter command [HE]: qc

Once again, select the QC option to quit and enter the Label Control Menu.

DIAGNOSTICS MENU

CO: Configure options for Diagnosis

MM: Memory Diagnostic Menu

DM: Data Path Diagnostic Menu

XM: Xor Diagnostic Menu

IM: QLOGIC ISP2100 Chip Diagnostic Menu

RS: Reset System

QC: Quit but go into Label Control Menu

QT: Quit Diagnostic Menu

Enter command [HE]: qc

Select the W1 option. This wipes out the unit 1 sysarea and LFS.

LABEL CONTROL MENU

W1: Wipe out unit 1 Sysarea and LFS

QQ: Quit Label Menu for this UNIT

QA: Quit All

Enter command [HE]: w1

Select QA to quit and resume the boot process.

Enter command [HE]: qa

Stage 2 rebuild a new sysarea

Rebuilding the sysarea requires a tftp boot server. The process is documented in Sun InfoDoc 19272. Once the tftp boot server is ready, tftpboot the unit:

1. Interrupt the boot process (hit any key before the timeout expires)

2. set bootmode tftp

3. boot

Once the boot completes the system is ready to be rebuilt.

Now use the t3.sh script (available in the 109115 patch) to reload psos.

Follow the patch readme included with 109115.

Stage 3 Lun recovery

Use output from a saved copy of extractor or explorer to identify the LUN configuration, RAID level and block size. If any of these are not rebuilt exactly as they were, the data is lost.

The block size can be verified with output from:

sys_list

blocksize : 64k
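
The block size check can also be scripted. The following is a minimal sketch in Python, not part of the T3 tooling. It assumes the saved output is available as text and that the block size appears on a line of the form "blocksize : 64k", as shown above. The function name parse_blocksize is hypothetical.

```python
import re


def parse_blocksize(sys_list_text):
    """Return the blocksize value (for example '64k') from saved output, or None."""
    for line in sys_list_text.splitlines():
        # Assumed line format, taken from the saved output: "blocksize : 64k"
        match = re.match(r"\s*blocksize\s*:\s*(\S+)", line)
        if match:
            return match.group(1)
    return None


saved = "blocksize : 64k\n"
blocksize = parse_blocksize(saved)
# Print the command that would restore this value on the array.
print(f"sys blocksize {blocksize}")  # prints: sys blocksize 64k
```

A result of None means the saved output did not contain a recognizable blocksize line, in which case the value should be confirmed by other means before any volume is re-created.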

The LUN makeup can be verified with output from:

vol_list

brm04-storage-lab4:/:<6>vol list

volume capacity raid data standby

vol01 35.8 GB 1 u2d1-4 u2d9
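
If the saved listing is kept as a text file, the volume layout can be pulled out of it programmatically. This is a rough sketch in Python under stated assumptions: the columns are those shown above (volume, capacity with its unit, raid, data, standby), and a missing standby is either absent or shown as "none" (an assumption, since the sample only shows a volume with a standby). The function name parse_vol_list is hypothetical.

```python
SAVED = """\
brm04-storage-lab4:/:<6>vol list
volume capacity raid data standby
vol01 35.8 GB 1 u2d1-4 u2d9
"""


def parse_vol_list(text):
    """Parse saved volume listing text into a list of dicts, one per volume."""
    volumes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank lines and the prompt line
        try:
            capacity = float(fields[1])
        except ValueError:
            continue  # the column header has no numeric capacity
        standby = fields[5] if len(fields) > 5 else None
        if standby is not None and standby.lower() == "none":
            standby = None
        volumes.append({
            "name": fields[0],
            "capacity": capacity,
            "unit": fields[2],
            "raid": fields[3],
            "data": fields[4],
            "standby": standby,
        })
    return volumes


for vol in parse_vol_list(SAVED):
    print(vol["name"], vol["raid"], vol["data"], vol["standby"])
# prints: vol01 1 u2d1-4 u2d9
```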

After the layout is determined, reset the block size to the correct value.

Notice that the LUNs are now missing after the rebuild of the boot and sysareas:

brm04-storage-lab4:/:<39>vol list

volume capacity raid data standby

1. sys blocksize 64k (sets the block size to 64k, as indicated by the extractor output)

2. vol add <name> data <drives> raid <0 | 1 | 5> [standby <drive>]

In this specific case, add back vol01 with:

vol add vol01 data u2d1-4 raid 1 standby u2d9
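
Because a mistyped drive range or RAID level here costs the data, it can help to generate the command from the saved values rather than type it by hand. The sketch below, in Python, only builds the command string following the syntax given above. It does not talk to the array, and the function name build_vol_add is hypothetical.

```python
def build_vol_add(name, data, raid, standby=None):
    """Build a 'vol add' command string from saved volume parameters."""
    # The syntax above allows only RAID levels 0, 1 and 5.
    if str(raid) not in {"0", "1", "5"}:
        raise ValueError(f"unsupported raid level: {raid}")
    cmd = f"vol add {name} data {data} raid {raid}"
    if standby:
        cmd += f" standby {standby}"
    return cmd


print(build_vol_add("vol01", "u2d1-4", 1, "u2d9"))
# prints: vol add vol01 data u2d1-4 raid 1 standby u2d9
```

The printed string should still be compared by eye against the saved extractor or explorer output before it is entered on the T3.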

Check the LUN against the saved output:

brm04-storage-lab4:/:<45>vol list

volume capacity raid data standby

vol01 35.8 GB 1 u2d1-4 u2d9
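
The comparison against the saved output can be done mechanically as well. The following Python sketch assumes both listings have been captured as text in the format shown above. It ignores the prompt and header lines and compares the remaining volume rows field by field. The names volume_rows and layouts_match are hypothetical.

```python
SAVED = """\
brm04-storage-lab4:/:<6>vol list
volume capacity raid data standby
vol01 35.8 GB 1 u2d1-4 u2d9
"""

REBUILT = """\
brm04-storage-lab4:/:<45>vol list
volume capacity raid data standby
vol01 35.8 GB 1 u2d1-4 u2d9
"""


def volume_rows(text):
    """Return the set of volume rows, each as a tuple of whitespace-split fields."""
    rows = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the prompt line (2 fields) and the column header.
        if len(fields) >= 5 and fields[0] != "volume":
            rows.add(tuple(fields))
    return rows


def layouts_match(saved, rebuilt):
    """True when both listings describe exactly the same volumes."""
    return volume_rows(saved) == volume_rows(rebuilt)


print(layouts_match(SAVED, REBUILT))  # prints: True
```

Any difference in name, capacity, RAID level, data drives or standby makes the check fail, which is the point at which to stop and correct the volume definition before initializing it.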

Now we can re-initialize the LUN. This is done with the .vol command so that the volume data is not actually initialized and the existing data is left in place.

In our case:

brm04-storage-lab4:/:<53>.vol init vol01 fast

WARNING - Existing volume data won't be changed.

Continue ? [N]: y

Now the volume(s) can be remounted and data can be brought back online. Stage 3 is particularly critical. All data will be lost if any incorrect information is used in the lun rebuild stage.

INTERNAL SUMMARY:

SUBMITTER: Mike Monahan

APPLIES TO: Hardware/Disk Storage Subsystem/StorEdge Disk Array/StorEdge T3

ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.