Debugging With the Kernel Memory Allocator
The Solaris kernel memory (kmem) allocator provides a powerful set of debugging features that can facilitate analysis of a kernel crash dump. This chapter discusses these debugging features, and the MDB dcmds and walkers designed specifically for the allocator. Bonwick (see "Related Books and Papers") provides an overview of the principles of the allocator itself. Refer to the header file <sys/kmem_impl.h> for the definitions of allocator data structures. The kmem debugging features can be enabled on a production system to enhance problem analysis, or on development systems to aid in debugging kernel software and device drivers.
Note - This guide reflects the Solaris 9 implementation of the kernel memory allocator. Because it describes the current kernel implementation, this information might not be relevant, correct, or applicable to past or future releases. It does not define a public interface of any kind, and all of the information provided about the kernel memory allocator is subject to change in future Solaris releases.
Getting Started: Creating a Sample Crash Dump
This section shows you how to obtain a sample crash dump, and how to invoke MDB in order to examine it.
Setting kmem_flags
The kernel memory allocator contains many advanced debugging features, but these are not enabled by default because they can degrade performance. To follow the examples in this guide, turn these features on. Enable them only on a test system: in addition to the performance cost, they can expose latent problems.
The allocator's debugging functionality is controlled by the kmem_flags tunable. To get started, make sure kmem_flags is set properly:
    # mdb -k
    > kmem_flags/X
    kmem_flags:
    kmem_flags:     f
If kmem_flags is not set to 'f', you should add the line:
    set kmem_flags=0xf
to /etc/system and reboot the system. When the system reboots, confirm that kmem_flags is set to 'f'. Remember to remove your /etc/system modifications before returning this system to production use.
Forcing a Crash Dump
The next step is to make sure crash dumps are properly configured. First, confirm that dumpadm is configured to save kernel crash dumps and that savecore is enabled. See dumpadm(1M) for more information on crash dump parameters.
    # dumpadm
          Dump content: kernel pages
           Dump device: /dev/dsk/c0t0d0s1 (swap)
    Savecore directory: /var/crash/testsystem
      Savecore enabled: yes
Next, reboot the system using the '-d' flag to reboot(1M), which forces the kernel to panic and save a crash dump.
    # reboot -d
    Sep 28 17:51:18 testsystem reboot: rebooted by root
    panic[cpu0]/thread=70aacde0: forced crash dump initiated at user request
    401fbb10 genunix:uadmin+55c (1, 1, 0, 6d700000, 5, 0)
      %l0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    ...
When the system reboots, make sure the crash dump succeeded:
    $ cd /var/crash/testsystem
    $ ls
    bounds    unix.0    unix.1    vmcore.0  vmcore.1
If the dump is missing from your dump directory, the partition might be out of space. You can free up space and then run savecore(1M) manually as root to save the dump. If your dump directory contains multiple crash dumps, the one you just created is the unix.[n] and vmcore.[n] pair with the most recent modification time.
Starting MDB
Now, run mdb on the crash dump you created, and check its status:
    $ mdb unix.1 vmcore.1
    Loading modules: [ unix krtld genunix ip nfs ipc ]
    > ::status
    debugging crash dump vmcore.1 (32-bit) from testsystem
    operating system: 5.9 Generic (sun4u)
    panic message: forced crash dump initiated at user request
In the examples presented in this guide, a crash dump from a 32-bit kernel is used. All of the techniques presented here are applicable to a 64-bit kernel, and care has been taken to distinguish pointers (sized differently on 32- and 64-bit systems) from fixed-sized quantities, which are invariant with respect to the kernel data model.
A Sun Ultra-1 workstation was used to generate the example presented. Your results can vary depending on the architecture and model of system you use.
Allocator Basics
The kernel memory allocator's job is to parcel out regions of virtual memory to other kernel subsystems (these are commonly called clients). This section explains the basics of the allocator's operation and introduces some terms used later in this guide.
Buffer States
The functional domain of the kernel memory allocator is the set of buffers of virtual memory that make up the kernel heap. These buffers are grouped together into sets of uniform size and purpose, known as caches. Each cache contains a set of buffers. Some of these buffers are currently free, which means that they have not yet been allocated to any client of the allocator. The remaining buffers are allocated, which means that a pointer to that buffer has been provided to a client of the allocator. If no client of the allocator holds a pointer to an allocated buffer, this buffer is said to be leaked, because it cannot be freed. Leaked buffers indicate incorrect code that is wasting kernel resources.
Transactions
A kmem transaction is a transition on a buffer between the allocated and free states. The allocator can verify that the state of a buffer is valid as part of each transaction. Additionally, the allocator has facilities for logging transactions for post-mortem examination.
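The idea of validating and logging each transaction can be illustrated with a toy model. The sketch below is not the allocator's implementation: every name in it (toy_buffer, toy_transaction, TXN_LOG_MAX, and so on) is hypothetical, and it tracks a single buffer only. It assumes that an allocation is legal only on a free buffer and that a free is legal only on an allocated one, and it records every attempt in a small ring so the history can be inspected afterwards.

```c
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
enum buf_state { BUF_FREE, BUF_ALLOCATED };
enum txn_type  { TXN_ALLOC, TXN_FREE };

#define TXN_LOG_MAX 8   /* size of the per-buffer transaction ring */

struct txn_record {
    enum txn_type type;   /* which transition was requested */
    int           valid;  /* 1 if the buffer was in the expected state */
};

struct toy_buffer {
    enum buf_state    state;
    struct txn_record log[TXN_LOG_MAX];
    size_t            nlogged;  /* total number of transactions attempted */
};

void toy_buffer_init(struct toy_buffer *b)
{
    b->state = BUF_FREE;
    b->nlogged = 0;
}

/*
 * Attempt one transaction on the buffer. Returns 1 if the transition was
 * legal, or 0 if it was not (for example, freeing a buffer that is already
 * free). Every attempt, legal or not, is written to the ring log.
 */
int toy_transaction(struct toy_buffer *b, enum txn_type type)
{
    enum buf_state expected = (type == TXN_ALLOC) ? BUF_FREE : BUF_ALLOCATED;
    int valid = (b->state == expected);
    struct txn_record *r = &b->log[b->nlogged % TXN_LOG_MAX];

    r->type = type;
    r->valid = valid;
    b->nlogged++;

    /* Only a legal transaction changes the buffer's state. */
    if (valid)
        b->state = (type == TXN_ALLOC) ? BUF_ALLOCATED : BUF_FREE;
    return valid;
}
```

In this model, an alloc followed by two frees yields two valid records and one invalid record, which is the kind of history a post-mortem examination looks for.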
Sleeping and Non-Sleeping Allocations
Unlike the Standard C Library's malloc(3C) function, the kernel memory allocator can block (or sleep), waiting until enough virtual memory is available to satisfy the client's request. This is controlled by the 'flag' parameter to kmem_alloc(9F). A call to kmem_alloc(9F) that has the KM_SLEEP flag set never fails; instead, it blocks, potentially forever, until resources become available.
Kernel Memory Caches
The kernel memory allocator divides the memory it manages into a set of caches. All allocations are supplied from these caches, which are represented by the kmem_cache_t data structure. Each cache has a fixed buffer size, which represents the maximum allocation size satisfied by that cache. Each cache has a string name indicating the type of data it manages.
Some kernel memory caches are special purpose and are initialized to allocate only a particular kind of data structure. An example of this is the "thread_cache," which allocates only structures of type kthread_t. Memory from these caches is allocated to clients by the kmem_cache_alloc() function and freed by the kmem_cache_free() function.
Note - kmem_cache_alloc() and kmem_cache_free() are not public DDI interfaces. Do NOT write code that relies on them, because they are subject to change or removal in future releases of Solaris.
Caches whose names begin with "kmem_alloc_" implement the kernel's general memory allocation scheme. These caches provide memory to clients of kmem_alloc(9F) and kmem_zalloc(9F). Each of these caches satisfies requests whose size is greater than the buffer size of the next smaller cache and no greater than its own buffer size. For example, the kernel has kmem_alloc_8 and kmem_alloc_16 caches. In this case, the kmem_alloc_16 cache handles all client requests for 9-16 bytes of memory. Remember that the size of each buffer in the kmem_alloc_16 cache is 16 bytes, regardless of the size of the client request. For a 14-byte request, two bytes of the resulting buffer are unused, because the request is satisfied from the kmem_alloc_16 cache.
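The size-class rule above can be sketched in a few lines of C. This is an illustration only, not the allocator's lookup code: the function names pick_cache_size and wasted_bytes are hypothetical, and the table holds just the five smallest kmem_alloc_ buffer sizes that appear in this guide's sample output (8, 16, 24, 32, 40). A real system has many more caches.

```c
#include <stddef.h>

/*
 * Buffer sizes of the first few kmem_alloc_* caches, as they appear in the
 * sample output in this guide. The table is deliberately truncated.
 */
static const size_t sample_cache_sizes[] = { 8, 16, 24, 32, 40 };

#define N_SAMPLE_CACHES \
    (sizeof (sample_cache_sizes) / sizeof (sample_cache_sizes[0]))

/*
 * Return the buffer size of the smallest cache that can hold `request`
 * bytes, or 0 if the request is zero or larger than any cache in this
 * truncated table. (Hypothetical helper, not a kernel function.)
 */
size_t pick_cache_size(size_t request)
{
    size_t i;

    if (request == 0)
        return 0;
    for (i = 0; i < N_SAMPLE_CACHES; i++) {
        if (request <= sample_cache_sizes[i])
            return sample_cache_sizes[i];
    }
    return 0;
}

/* Bytes of the chosen buffer that the client did not ask for. */
size_t wasted_bytes(size_t request)
{
    size_t bufsize = pick_cache_size(request);

    return (bufsize == 0) ? 0 : bufsize - request;
}
```

With this table, pick_cache_size(14) selects the 16-byte class and wasted_bytes(14) is 2, matching the example in the text.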
The last set of caches are those used internally by the kernel memory allocator for its own bookkeeping. These include those caches whose names start with "kmem_magazine_" or "kmem_va_", the kmem_slab_cache, the kmem_bufctl_cache and others.
Kernel Memory Caches
This section explains how to find and examine kernel memory caches. You can learn about the various kmem caches on the system by issuing the ::kmastat command.
    > ::kmastat
    cache                        buf    buf    buf    memory     alloc alloc
    name                        size in use  total    in use   succeed  fail
    ------------------------- ------ ------ ------ --------- --------- -----
    kmem_magazine_1                8     24   1020      8192        24     0
    kmem_magazine_3               16    141    510      8192       141     0
    kmem_magazine_7               32     96    255      8192        96     0
    ...
    kmem_alloc_8                   8   3614   3751     90112   9834113     0
    kmem_alloc_16                 16   2781   3072     98304   8278603     0
    kmem_alloc_24                 24    517    612     24576    680537     0
    kmem_alloc_32                 32    398    510     24576    903214     0
    kmem_alloc_40                 40    482    584     32768    672089     0
    ...
    thread_cache                 368    107    126     49152    669881     0
    lwp_cache                    576    107    117     73728       182     0
    turnstile_cache               36    149    292     16384    670506     0
    cred_cache                    96      6     73      8192   2677787     0
    ...
Running ::kmastat on a healthy system gives you a feel for what a "normal" system looks like. This helps you spot excessively large caches on systems that are leaking memory. The results of ::kmastat vary depending on the system you are running on, how many processes are running, and so forth.
Another way to list the various kmem caches is with the ::kmem_cache command:
    > ::kmem_cache
    ADDR     NAME                      FLAG  CFLAG  BUFSIZE  BUFTOTL
    70036028 kmem_magazine_1           0020 0e0000        8     1020
    700362a8 kmem_magazine_3           0020 0e0000       16      510
    70036528 kmem_magazine_7           0020 0e0000       32      255
    ...
    70039428 kmem_alloc_8              020f 000000        8     3751
    700396a8 kmem_alloc_16             020f 000000       16     3072
    70039928 kmem_alloc_24             020f 000000       24      612
    70039ba8 kmem_alloc_32             020f 000000       32      510
    7003a028 kmem_alloc_40             020f 000000       40      584
    ...
This command is useful because it maps cache names to addresses, and provides the debugging flags for each cache in the FLAG column. It is important to understand that the allocator's selection of debugging features is derived on a per-cache basis from this set of flags. These are set in conjunction with the global kmem_flags variable at cache creation time. Setting kmem_flags while the system is running has no effect on the debugging behavior, except for subsequently created caches (which is rare after boot-up).
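To read the FLAG column, it helps to break each value into individual bits. The sketch below does this for the two values shown in the sample output, 020f and 0020. The bit assignments in the table are an assumption recalled from the KMF_ definitions in <sys/kmem_impl.h> and are not stated in this guide, so check them against the header on your release. The helper kmf_describe and the table kmf_names are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

struct flag_name {
    unsigned int bit;
    const char  *name;
};

/*
 * ASSUMPTION: bit values as recalled from <sys/kmem_impl.h> (KMF_AUDIT and
 * friends). Verify against the header for your Solaris release.
 */
static const struct flag_name kmf_names[] = {
    { 0x001, "AUDIT" },
    { 0x002, "DEADBEEF" },
    { 0x004, "REDZONE" },
    { 0x008, "CONTENTS" },
    { 0x020, "NOMAGAZINE" },
    { 0x200, "HASH" },
};

#define N_KMF_NAMES (sizeof (kmf_names) / sizeof (kmf_names[0]))

/*
 * Write the names of the bits set in `flags` into `out`, separated by '|'.
 * Returns any bits that are set but not named in the table above.
 * Output is silently truncated if `out` is too small.
 */
unsigned int kmf_describe(unsigned int flags, char *out, size_t outlen)
{
    unsigned int unknown = flags;
    size_t used = 0;
    size_t i;

    if (outlen > 0)
        out[0] = '\0';
    for (i = 0; i < N_KMF_NAMES; i++) {
        if ((flags & kmf_names[i].bit) == 0)
            continue;
        unknown &= ~kmf_names[i].bit;
        if (used < outlen) {
            int n = snprintf(out + used, outlen - used, "%s%s",
                used > 0 ? "|" : "", kmf_names[i].name);
            if (n > 0)
                used += (size_t)n;  /* past outlen means truncated */
        }
    }
    return unknown;
}
```

Under these assumed bit values, 0x020f decodes to the four features enabled by kmem_flags=0xf plus one additional bit, and 0x0020 decodes to a single bit, which is consistent with the per-cache flags differing from the global tunable.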
Next, walk the list of kmem caches directly using MDB's kmem_cache walker:
    > ::walk kmem_cache
    70036028
    700362a8
    70036528
    700367a8
    ...