Sun Microsystems, Inc.

Saving System Crash Dumps

When the system panics, it writes an image of kernel memory to the dump device. By default, the dump device is the most suitable swap device. This image is a system crash dump, analogous to the core dumps that applications generate. On rebooting after a panic, savecore(1M) checks the dump device for a crash dump. If it finds one, it saves a copy of the kernel's symbol table (called unix.n) and a core file (called vmcore.n) in the core image directory, which by default is /var/crash/machine_name. /var/crash must have enough free space to hold the core dump; otherwise the dump is truncated. mdb(1) can then be run on the core dump and the saved kernel.

In the Solaris 9 operating system, crash dumps are enabled by default. Use the dumpadm(1M) command to configure system crash dumps, to verify that crash dumps are enabled, and to determine the directory where core files are saved. See the dumpadm(1M) man page for more information.


Note - savecore(1M) can be prevented from filling the file system if there is a file called minfree in the directory in which the dump will be saved. This file contains a number of kilobytes to remain free after savecore(1M) has run. However, if not enough space is available, the core file is not saved.
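
As a minimal sketch of how such a file might be created, the shell function below writes a kilobyte count into a minfree file. The function name set_minfree and its arguments are hypothetical, chosen for illustration; only the minfree file name and its kilobyte content come from the note above. dumpadm(1M) also provides a -m option for creating this file; see the man page before relying on either approach.

```shell
# Hypothetical helper: write a minfree file into a crash dump directory.
# $1 = crash directory (for example /var/crash/machine_name)
# $2 = number of kilobytes that savecore should leave free
set_minfree() {
    dir=$1
    kb=$2
    # Accept only a string of decimal digits.
    case $kb in
        ''|*[!0-9]*)
            echo "minfree must be a whole number of kilobytes" >&2
            return 1
            ;;
    esac
    # Reject zero, which would not reserve any space.
    if [ "$kb" -le 0 ]; then
        echo "minfree must be greater than zero" >&2
        return 1
    fi
    echo "$kb" > "$dir/minfree"
}
```

For example, set_minfree /var/crash/machine_name 5000 would record that 5000 Kbytes should remain free after savecore(1M) has run.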


Disaster Recovery

If the /devices or /dev directories are damaged (most likely when a driver crashes during attach(9E)), they can be re-created by booting the system from an alternate boot disk and running fsck(1M) to repair the damaged root file system. The repaired root file system can then be mounted. Re-create /dev and /devices by running devfsadm(1M) with the -r option, specifying the directory where the damaged root file system is mounted.

On SPARC, for example, if the damaged disk is /dev/dsk/c0t3d0s0, and an alternate boot disk is /dev/dsk/c0t1d0s0, do the following:

ok boot disk1
...
Rebooting with command: boot kernel.test/unix
Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@31,0:a  File and args:
kernel/unix
SunOS Release 5.9 Version Generic 32-bit
Copyright 1983-2002 Sun Microsystems, Inc.  All rights reserved.
...
# fsck /dev/dsk/c0t3d0s0
** /dev/dsk/c0t3d0s0
** Last Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1478 files, 9922 used, 29261 free
     (141 frags, 3640 blocks, 0.4% fragmentation)
# mount /dev/dsk/c0t3d0s0 /mnt
# devfsadm -r /mnt

Caution - Fixing /devices and /dev may allow the system to boot, but other parts of the system can still be corrupted. This is only a temporary fix to allow saving information (such as system crash dumps) before reinstalling the system.


Runtime Debugging Tools

This section describes some of the mechanisms that can be used to debug drivers at runtime. Runtime debugging is typically performed during driver development; this process is substantially simplified if you have followed the coding practices described in the previous section. Although the kadb debugger is a runtime debugging tool, it is treated at length in a separate section, "The kadb Kernel Debugger".

/etc/system File

The /etc/system file serves several purposes, but for driver development, the most important is that it allows you to set the value of kernel variables at boot time. This can be used to toggle different behaviors in a driver, or to enable certain debugging features made available by the kernel.

/etc/system is read only once, while the kernel is booting. After this file is modified, the system must be rebooted for the changes to take effect. If a change in the file causes the system not to work, boot with the ask (-a) option and specify /dev/null as the system file.

The set command is used to change the value of module or kernel variables:

  • To set module variables, specify the module name and the variable:

            set module_name:variable=value

    For example, to set the variable test_debug in the driver test, use the following set command:

            set test:test_debug=1
  • To set a variable exported by the kernel itself, omit the module name. Other assignments are also supported, such as bitwise OR'ing a value into an existing value:

            set moddebug | 0x80000000

See the system(4) man page for more information.
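
Because a bad entry can leave the system unable to boot, it may be worth keeping a copy of the file before each change. The following is a sketch only: the function name add_system_setting and the .bak suffix are assumptions made for illustration, not part of any Solaris tool.

```shell
# Hypothetical helper: back up a system file, then append one set line.
# $1 = path to the system file (normally /etc/system)
# $2 = the complete line to append, for example 'set test:test_debug=1'
add_system_setting() {
    file=$1
    line=$2
    # Keep the previous contents so they can be restored if the change misbehaves.
    cp -p "$file" "$file.bak" || return 1
    printf '%s\n' "$line" >> "$file"
}

# Example (as root):
#   add_system_setting /etc/system 'set test:test_debug=1'
```

The new setting still takes effect only after a reboot. If the system then fails to boot, the saved copy is another candidate to name as the system file when booting with the ask (-a) option.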


Note - Most kernel variables are not guaranteed to be present in subsequent releases.


Controlling Module Loading with moddebug

moddebug is a kernel variable that controls the module loading process. Its value is a set of bit flags, which can be combined with a bitwise OR. The flags are:

0x80000000

Prints messages to the console when loading or unloading modules.

0x40000000

Gives more detailed error messages.

0x20000000

Prints more detail when loading or unloading (such as including the address and size).

0x00001000

No auto-unloading drivers: the system will not attempt to unload the device driver when the system resources become low.

0x00000080

No auto-unloading streams: the system will not attempt to unload the streams module when the system resources become low.

0x00000010

No auto-unloading of kernel modules of any type.

0x00000001

If running with kadb, moddebug causes a breakpoint to be executed and a return to kadb immediately before each module's _init(9E) routine is called. Also generates additional debug messages when the module's _info and _fini routines are executed.
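
Because each of these values is a single bit, several can be enabled at once by OR'ing them together. The sketch below computes such a combined value with shell arithmetic. The helper name or_flags is hypothetical, and the shell is assumed to accept hexadecimal constants in arithmetic expansion.

```shell
# Hypothetical helper: OR together any number of flag values and
# print the result as an 8-digit hexadecimal number.
or_flags() {
    result=0
    for flag in "$@"; do
        result=$(( result | $flag ))
    done
    printf '0x%08x\n' "$result"
}

# Load/unload messages (0x80000000) plus detailed error messages (0x40000000).
or_flags 0x80000000 0x40000000    # prints 0xc0000000
```

The result could then be assigned in /etc/system, for example with set moddebug=0xc0000000.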

kmem_flags

kmem_flags is a kernel variable used to enable debugging features in the kernel's memory allocator. Setting kmem_flags to 0xf enables the allocator's debugging features. These include runtime checks to find:

  • Code that writes to a buffer after it is freed

  • Code that uses memory before it is initialized

  • Code that writes past the end of a buffer

The "Debugging With the Kernel Memory Allocator" section of the Solaris Modular Debugger Guide describes how the kernel memory allocator can be used to determine the root cause of these problems.


Note - Testing and developing with kmem_flags set to 0xf is extremely valuable because it can detect latent memory corruption bugs. Because setting kmem_flags to 0xf changes the internal behavior of the kernel memory allocator, you should thoroughly test without kmem_flags as well.


modload, modunload, and modinfo Commands

The kernel automatically loads needed modules and unloads unused ones, so modload(1M), modunload(1M), and modinfo(1M) are not very useful for system administration. However, they can be useful when debugging and stress testing driver load/unload scenarios.

modload forces a module into memory. The kernel might subsequently unload the module, but loading it this way verifies that the driver has no unresolved references. Keep in mind that loading a driver does not mean that the driver will attach: a driver that loads successfully has its _info(9E) entry point called, but does not necessarily attach.

You can use modinfo to confirm that your driver is loaded. Here is an example:

$ modinfo
 Id Loadaddr   Size Info Rev Module Name
  6 101b6000    732   -   1  obpsym (OBP symbol callbacks)
  7 101b65bd  1acd0 226   1  rpcmod (RPC syscall)
  7 101b65bd  1acd0 226   1  rpcmod (32-bit RPC syscall)
  7 101b65bd  1acd0   1   1  rpcmod (rpc interface str mod)
  8 101ce8dd  74600   0   1  ip (IP Streams module)
  8 101ce8dd  74600   3   1  ip (IP Streams device)
...

$ modinfo | grep mydriver
169 781a8d78   13fb   0   1  mydriver (Test Driver 1.5)

The number in the Info column is the major number chosen for the driver. modunload unloads a module, given its module ID (found in the leftmost column of modinfo output). A common bug is that a driver refuses to unload, even after a modunload is issued. A driver will not unload if the system considers it busy. This occurs when the driver fails detach(9E), either because the driver really is busy, or because the detach entry point is implemented incorrectly.

To remove all currently unused modules from memory, run modunload with a module ID of 0:

# modunload -i 0
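
When a driver is reloaded repeatedly during testing, looking up its module ID by hand becomes tedious. The function below is a sketch of one way to extract the ID from modinfo output. The name module_id is hypothetical, and the function assumes that the module name is the sixth whitespace-separated field, as in the listing shown earlier.

```shell
# Hypothetical helper: print the ID of the named module.
# Reads modinfo-style output on standard input and stops at the first
# matching line, because one module can appear on several lines.
module_id() {
    awk -v name="$1" '$6 == name { print $1; exit }'
}

# On a Solaris system (as root), the ID could then be passed to modunload:
#   id=$(modinfo | module_id mydriver)
#   [ -n "$id" ] && modunload -i "$id"
```

An exact match on the name field avoids the false hits that a plain grep can produce when one module name is a substring of another.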

The kadb Kernel Debugger

kadb is a kernel debugger with facilities for disassembly, breakpoints, watch points, data display, and stack tracing. This section provides a tutorial on some of the features of kadb. For further information, consult the kadb(1M) man page.

Starting kadb

To start kadb, boot the system with kadb(1M) enabled:

ok boot kadb
...
Rebooting with command: boot kadb
Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@0,0:a
File and args: kadb
kadb: kernel/sparcv9/unix
Size: 499808+109993+132503 Bytes
/platform/sun4u/kernel/sparcv9/unix loaded - 0x11e000 bytes used
SunOS Release 5.9 Version Generic 64-bit
Copyright 1983-2002 Sun Microsystems, Inc.  All rights reserved
....

By default, kadb(1M) boots (and debugs) kernel/unix, or kernel/sparcv9/unix on a system capable of running a 64-bit kernel. To boot kadb with an alternate kernel, pass the -D flag to boot, as follows:

ok boot kadb -D kernel.test/unix
...
Rebooting with command: boot kadb -D kernel.test/unix
Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@0,0:a  File
and args: kadb -D kernel.test/unix
kadb: kernel.test/unix
Size: 482384+67201+88883 Bytes
/platform/sun4u/kernel.test/unix loaded - 0xfe000 bytes used
SunOS Release 5.9 Version dacf-fixes:11/13/99 32-bit
Copyright 1983-2002 Sun Microsystems, Inc.  All rights reserved.
...