InfoDoc ID:  46780
Synopsis:    Instructions on how to gather data from a hung Sun Fire[TM] domain.
Date:        13 Dec 2002
1. Ensure that the domain is actually hung:
- Can you ping the domain?
- Can you telnet to the domain?
2. Ensure that the SC (System Controller) is not hung: log in to the SC and
obtain a platform shell.
A. If you reach the platform shell, run the following commands:
SCname:SC> showlogs
SCname:SC> showplatform
B. If the SC is hung, try the secondary SC (if available).
Because only the primary SC has access to the domains,
this leaves few debugging options:
- obtain the output of showlogs
- if possible (this depends on the SC firmware), issue a reset
to the primary SC.
C. If the primary SC is still hung, press the reset button on the
primary SC, then go back to step 2A.
3. Once in the platform shell, attempt to get a domain shell:
console -d <domainID>
- If the command appears to hang, send a break signal to
the domain:
- If you are using telnet:
Press CTRL-]
At the telnet> prompt, type: send break
- If you are connected to the SC via tip, type:
~#
At this point you should have a domain shell prompt; if you do not,
continue to step 4.
- From the domain shell, run the following commands:
SCname:A> showdomain -p status
SCname:A> showlogs
Then type break to get to the OBP. If this takes you to the ok prompt,
type sync to force a core file.
4. If you were not able to get to the ok prompt, the domain is truly
hung and you need to send an XIR (externally initiated reset) to
the domain.
From the domain shell, type: reset
This command behaves differently depending on the value of the OBP
variable error-reset-recovery:
- sync: the domain attempts to take a core file.
- boot: the domain simply reboots, as if the boot command had been
issued at the ok prompt.
- none: the domain should drop to the ok prompt, where you can run the
following commands. The '#' in the prompt is the number of the CPU
that took the XIR; use that number in the cbuf command and, if
possible, run this command on each CPU (some of these commands depend
on the firmware level of the SC):
{#} ok dump-sigblock
{#} ok # cbuf
{#} ok .xir-state-all
- If you were not able to reach the ok prompt but do have a domain
shell prompt, type the following command:
SCname:A> showresetstate
5. If none of these tactics work, you may have to power off the
domain. In that case, run setkeyswitch off for the
domain.
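The telnet check in step 1 can be scripted from an administration host. The sketch below is a minimal example and not part of the original procedure: it assumes the domain offers telnet on the standard TCP port 23, and the helper name telnet_port_open is hypothetical. It only tests whether a TCP connection is accepted; it does not send ICMP echo requests, so use the system ping command for the ping check.

```python
import socket


def telnet_port_open(host: str, port: int = 23, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for step 1: it only checks that something accepts
    a TCP connection on the (assumed) telnet port; it does not log in.
    """
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is a subclass
        # of OSError) and name-resolution failures (socket.gaierror).
        return False
```

For example, telnet_port_open("domainA") (a hypothetical host name) returning False while ping also fails is consistent with a hung domain, although a firewall or a stopped telnet service would produce the same result.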
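For reference on step 3: the telnet client's send break transmits the two-byte Telnet command IAC BRK (bytes 255 and 243, as defined in RFC 854). The Python sketch below, whose helper names are hypothetical, shows that sequence being written to an already-connected socket; it assumes Telnet option negotiation is handled elsewhere. tip's ~# works differently: it sends a BREAK condition on the serial line rather than a Telnet command.

```python
import socket

# Telnet command codes from RFC 854.
IAC = 255  # "Interpret As Command" escape byte
BRK = 243  # Break


def telnet_break_sequence() -> bytes:
    """Return the two-byte Telnet BREAK command (IAC BRK)."""
    return bytes([IAC, BRK])


def send_telnet_break(sock: socket.socket) -> None:
    """Send IAC BRK on an already-connected Telnet session socket.

    Hypothetical helper: assumes `sock` is a raw TCP connection to the SC
    whose Telnet option negotiation is handled elsewhere.
    """
    sock.sendall(telnet_break_sequence())
```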
Keywords: StarCat, Star, Cat, SC, SunFire, kernel, XIR
INTERNAL SUMMARY:
Author: Christine Perrigo
Kernel Technical Support Engineer
Sun Enterprise Services
(MS) UBRM04-125
500 Eldorado Blvd.
Broomfield, CO. 80021
E-mail: christine.perrigo@sun.com
Phone: 303.464.4521
SUBMITTER: Chris Wagner
APPLIES TO: AFO Vertical Team Docs/Kernel
Copyright (c) 1997-2003 Sun Microsystems, Inc.