Sun Microsystems, Inc.
Thread Macros

Another useful macro is thread. Given a thread ID, this macro prints the corresponding thread structure. It can be used to look at a thread found with the threadlist macro, to look at the owner of a mutex, or, as shown here, to look at the current thread. In this example the thread ID is taken from register %g7, which holds the current thread pointer in the SPARC kernel:


Example 18-4 The thread Macro

kadb[0]: <g7$<thread
70e87ac0:       link            stk             startpc
                0               4026bc80        0
70e87acc:       bound_cpu       affinitycnt     bind_cpu
                0               0               -1
70e87ad4:       flag    proc_flag       schedflag
                0       4               3
70e87ada:       preempt preempt_lk      state
                0       0               4
70e87ae0:       pri     epri
                40      0
70e87ae4:
                pc              sp
                10098350        4026b618
70e87aec:       wchan0          wchan           sobj_ops
                0               0               0
70e87af8:       cid             clfuncs         cldata
                1               10470ffc        702c0488
70e87b04:       ctx             lofault         onfault
                0               0               0
...


Note - No type information is maintained by kadb, so using a macro on an inappropriate address results in garbage output.


Macros do not necessarily output all the fields of the structures, nor is the output necessarily in the order given in the structure definition. Occasionally, memory needs to be dumped for certain structures and then matched with the structure definition in the kernel header files.


Caution - Drivers should never reference system header files or structures not listed in man pages section 9S: DDI and DKI Data Structures. However, examining non-DDI-compliant structures (such as thread structures) can be useful in debugging drivers.


kadb Output Pager

Some kadb commands (such as $<threadlist) produce large amounts of output, which can scroll off the screen very rapidly. kadb provides a simple output pager to remedy this problem. The pager command is lines::more, where lines is the number of lines to print before pausing the console output. Note that the pager does not account for lines that wrap because they are wider than the terminal. Here is an example:

kadb[0]: 0t10::more
kadb[0]: $<threadlist 

                ============== thread_id        10408000
p0+0x4c0:
                process args    sched

t0+0x128:       lwp             procp           wchan
                10429ed0        104393e8        0
t0+0x38:
                pc              sp
                sched+0x4e4     104071f1
?(10408000,10414c00,2,104393e8,10439308,0)
_start(10007588,104292e0,104292e0,104292e0,1043b8b0,10429360) + 200

                ============== thread_id        2a10001fd40
p0+0x4c0:
                process args    sched
--More-- <SPACE>
         
...

Pressing the space bar at the "--More--" prompt pages the output by the number of lines specified to ::more (in this case 10; the 0t prefix tells kadb to interpret the number as decimal). Pressing Return prints only the next line of output. Typing Ctrl-C aborts the output and returns to the kadb prompt. To disable the pager, issue '0::more' at the kadb prompt.

Example: kadb on a Deadlocked Thread

This example shows how kadb can be used to debug a driver bug. It was taken from the development of the ramdisk sample driver, which exports physical memory as a virtual disk. In this case, the dd(1M) command hangs while trying to copy data onto the device and cannot be aborted. Although a crash dump could be forced, kadb(1M) is used here for illustrative purposes. After logging into the system remotely, ps was used to confirm that the system was still running and that only the dd(1M) command was hung.

At this point, the system is rebooted with kadb, which can then be entered by typing STOP-A on the system console. After the rest of the kernel has loaded, moddebug is patched to see whether module loading is the problem:

stopped at:
edd000d8:       ta      %icc,%g0 + 125
kadb[0]: moddebug/X
moddebug:
moddebug:       0
kadb[0]: moddebug/W 0x80000000
moddebug:       0x0             =       0x80000000
kadb[0]: :c

modload(1M) is used to load the driver on its own, to separate module loading from the actual device access:

# modload /home/driver/drv/ramdisk

It loads without errors, so loading is not the problem. The condition is recreated with dd(1M):

# dd if=/dev/zero of=/devices/pseudo/ramdisk@0:c,raw

dd(1M) hangs. At this point, kadb(1M) is entered and the stack examined:

stopped at:
edd000d8:       ta      %icc,%g0 + 125
kadb[0]: $c
intr_vector() + 7dcfc0d8
debug_enter(0,0,10431e50,10,1,b0) + 78
zsa_xsint(80,7044a06c,44,7044a000,ff0113,0) + 278
zs_high_intr(7044a000,1,1,1042f78c,10424680,100949d0) + 20c
sbus_intr_wrapper(704dfad4,0,702bd048,7029cec0,630,10260250) + 30
current_thread(4001fe60,1041a550,10424698,10424698,10150f08,0) + 180
idle(1040b6c0,0,0,1041a550,704d6a98,0) + 54
thread_start(0,0,0,0,0,0) + 4

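The current thread's stack shows only the console break (debug_enter(), called from the serial port interrupt handler) on top of idle. The presence of idle indicates that this thread is not the cause of the deadlock. To find the deadlocked thread, the entire thread list is checked: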

kadb[0]: $<threadlist
...
                ============== thread_id        70cef120
70c8b1c0:
                process args    dd if=/dev/zero of=/devices/pseudo/ramdisk@0:c,raw

70cef1c8:       lwp             procp           wchan
                70fa9080        70c8aec0        70691fc8
70cef144:
                pc              sp
                sema_p+0x290    40313a78
?(70691fc8,10424680,1,1042b99c,10460f8c,70691fc8)
biowait(70691f60,1041a6c4,70691f60,70c385d0,40313bcc,705c73a0) + 8c
default_physio(1042e8fc,200,129,100,70eb5b54,705c73a0) + 3bc
write(2002,70aac1d0,70f9f9ac,200,4,200) + 23c
...

Of all the threads, only one has a stack trace that references the ramdisk driver. The process running dd(1M) appears to be blocked in biowait(9F). The first argument to biowait(9F) is a pointer to a buf(9S) structure, here 70691f60. The next step is to examine this structure:

kadb[0]: 70691f60$<buf
70691f60:       flags           forw            back
                204129          0               0
70691f6c:       av_forw         av_back         bcount
                0               0               512
70691fa0:       bufsize         error           edev
                0               0               1180000
70691f7c:       un.b_addr       _b_blkno        resid
                710e8000        0                0
70691f94:       proc            iodone          vp
                70c8aec0        0               0
70691f98:       pages
                0

The resid field is 0, which indicates that the transfer is complete. However, physio(9F) is still blocked. The reference page for physio(9F) in the Solaris 9 Reference Manual Collection points out that biodone(9F) should be called to unblock biowait(9F). This is the problem: rd_strategy() did not call biodone(9F). Adding a call to biodone(9F) before returning fixes the problem.

 
 
 