SRDB ID   Synopsis   Date
48138   Sun Fire[TM] 15K: Identifying and recovering from a domain hang   29 Oct 2002

Status Issued

Description
How to identify and recover from a hung domain.                  

SOLUTION SUMMARY:
Several tools can help determine whether a domain is
hung.  If the domain does not respond to any of the
following commands from the System Controller, that
is a good indication that the domain is hung.

sc0:sms-svc:1> ping domain-a 
sc0:sms-svc:2> telnet domain-a
sc0:sms-svc:3> console -d a
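
As a rough sketch, the non-interactive part of this check can be scripted so that it is repeatable.  The helper below, check_domain, is hypothetical and not part of SMS.  It runs whatever probe commands it is given and reports "likely hung" only when every one of them fails.  Only probes that exit on their own fit here, such as a ping with a timeout; telnet and console still have to be tried by hand.

```shell
#!/bin/sh
# check_domain: hypothetical helper, not an SMS command.
# Each argument is a probe command line. A probe "fails" when it
# exits with a non-zero status.
check_domain() {
    failed=0
    total=0
    for probe in "$@"; do
        total=$((total + 1))
        # Run the probe quietly; only its exit status matters.
        if ! sh -c "$probe" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
    done

    if [ "$total" -eq 0 ]; then
        echo "no probes given"
    elif [ "$failed" -eq "$total" ]; then
        # Every probe failed: the domain is probably hung.
        echo "likely hung"
    else
        echo "responding"
    fi
}
```

A call might look like check_domain "ping domain-a 5", assuming the Solaris form of ping that takes a timeout in seconds after the host name.  A "likely hung" result is only a hint; confirm it at the console before taking any recovery step.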

Now that you have established that the domain is likely hung,
use the following steps, in order, to return the domain to a
'Running Solaris' state.

1. From the System Controller, connect to the domain
through the console command.

sc0:sms-svc:1> console -d a

Even though the console shows no response or activity,
you can still send a break sequence (~#), which drops
the OS to the OK> prompt, effectively a Stop-A.  Once
at the OK> prompt, the sync command will try to
generate a system dump file and reboot the domain.

2. If a break sequence at the console is insufficient
to regain control of the domain, try the reset command
from the System Controller.  This is hard on the OS and
will more than likely require an fsck before the domain
can boot into multiuser mode.

sc0:sms-svc:1> reset -d a 

If that still doesn't work, try:

sc0:sms-svc:2> reset -d a -x

It may take several seconds for the OK> prompt to
appear after issuing this command.  Once at the OBP
prompt, be sure to bring the domain back up with the
sync command so that a core file might be generated.

3. Finally, try the keyswitch.

sc0:sms-svc:1> setkeyswitch -d a off
sc0:sms-svc:2> setkeyswitch -d a on

However, this will prevent a core file from being generated.
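
For quick reference, the escalation order described above can be printed as a checklist for a given domain.  The function name recovery_plan is hypothetical; it only echoes the commands already shown in this document and runs nothing against the domain.

```shell
#!/bin/sh
# recovery_plan: hypothetical helper that prints the recovery steps
# from this document, in order, for the domain ID given as $1.
# It does not execute any of the commands it lists.
recovery_plan() {
    d=$1
    cat <<EOF
1. console -d $d   (send ~# break, then sync at the OK> prompt)
2. reset -d $d
3. reset -d $d -x   (then sync at the OK> prompt)
4. setkeyswitch -d $d off; setkeyswitch -d $d on   (no core file)
EOF
}
```

Running recovery_plan a lists the four steps for domain a.  Stop at the first step that returns control of the domain, since the later steps reduce the chance of capturing a core file.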

SUBMITTER: Ryan Crapo
APPLIES TO: Hardware/Sun Fire /15000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.