Chapter 8

Debugging With the Kernel Memory Allocator

The Solaris kernel memory (kmem) allocator provides a powerful set of debugging features that can facilitate analysis of a kernel crash dump. This chapter discusses these debugging features, and the MDB dcmds and walkers designed specifically for the allocator. Bonwick (see "Related Books and Papers") provides an overview of the principles of the allocator itself. Refer to the header file <sys/kmem_impl.h> for the definitions of allocator data structures. The kmem debugging features can be enabled on a production system to enhance problem analysis, or on development systems to aid in debugging kernel software and device drivers.

Note - This guide reflects Solaris 9 implementation; this information might not be relevant, correct, or applicable to past or future releases, since it reflects the current kernel implementation. It does not define a public interface of any kind. All of the information provided about the kernel memory allocator is subject to change in future Solaris releases.

Getting Started: Creating a Sample Crash Dump

This section shows you how to obtain a sample crash dump, and how to invoke MDB in order to examine it.

Setting `kmem_flags`

The kernel memory allocator contains many advanced debugging features, but these are not enabled by default because they can cause performance degradation. In order to follow the examples in this guide, you should turn on these features. You should enable these features only on a test system, as they can cause performance degradation or expose latent problems.

The allocator's debugging functionality is controlled by the kmem_flags tunable. To get started, make sure kmem_flags is set properly:

# mdb -k
> kmem_flags/X
kmem_flags:
kmem_flags:     f

If kmem_flags is not set to 'f', you should add the line:

set kmem_flags=0xf

to /etc/system and reboot the system. When the system reboots, confirm that kmem_flags is set to 'f'. Remember to remove your /etc/system modifications before returning this system to production use.

Forcing a Crash Dump

The next step is to make sure crash dumps are properly configured. First, confirm that dumpadm is configured to save kernel crash dumps and that savecore is enabled. See dumpadm(1M) for more information on crash dump parameters.

# dumpadm
		      Dump content: kernel pages
		       Dump device: /dev/dsk/c0t0d0s1 (swap)
		Savecore directory: /var/crash/testsystem
		  Savecore enabled: yes

Next, reboot the system using the '-d' flag to reboot(1M), which forces the kernel to panic and save a crash dump.

# reboot -d
Sep 28 17:51:18 testsystem reboot: rebooted by root

panic[cpu0]/thread=70aacde0: forced crash dump initiated at user request

401fbb10 genunix:uadmin+55c (1, 1, 0, 6d700000, 5, 0)
  %l0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
         00000000
...

When the system reboots, make sure the crash dump succeeded:

$ cd /var/crash/testsystem
$ ls
bounds    unix.0    unix.1    vmcore.0  vmcore.1

If the dump is missing from your dump directory, it could be that the partition is out of space. You can free up space and run savecore(1M) manually as root to subsequently save the dump. If your dump directory contains multiple crash dumps, the one you just created will be the unix.[n] and vmcore.[n] pair with the most recent modification time.

Starting MDB

Now, run mdb on the crash dump you created, and check its status:

$ mdb unix.1 vmcore.1
Loading modules: [ unix krtld genunix ip nfs ipc ]
> ::status
debugging crash dump vmcore.1 (32-bit) from testsystem
operating system: 5.9 Generic (sun4u)
panic message: forced crash dump initiated at user request

In the examples presented in this guide, a crash dump from a 32-bit kernel is used. All of the techniques presented here are applicable to a 64-bit kernel, and care has been taken to distinguish pointers (sized differently on 32- and 64-bit systems) from fixed-sized quantities, which are invariant with respect to the kernel data model.

A Sun Ultra-1 workstation was used to generate the example presented. Your results can vary depending on the architecture and model of system you use.

Allocator Basics

The kernel memory allocator's job is to parcel out regions of virtual memory to other kernel subsystems (these are commonly called clients). This section explains the basics of the allocator's operation and introduces some terms used later in this guide.

Buffer States

The functional domain of the kernel memory allocator is the set of buffers of virtual memory that make up the kernel heap. These buffers are grouped together into sets of uniform size and purpose, known as caches. Each cache contains a set of buffers. Some of these buffers are currently free, which means that they have not yet been allocated to any client of the allocator. The remaining buffers are allocated, which means that a pointer to that buffer has been provided to a client of the allocator. If no client of the allocator holds a pointer to an allocated buffer, this buffer is said to be leaked, because it cannot be freed. Leaked buffers indicate incorrect code that is wasting kernel resources.

Transactions

A kmem transaction is a transition on a buffer between the allocated and free states. The allocator can verify that the state of a buffer is valid as part of each transaction. Additionally, the allocator has facilities for logging transactions for post-mortem examination.

Sleeping and Non-Sleeping Allocations

Unlike the Standard C Library's malloc(3C) function, the kernel memory allocator can block (or sleep), waiting until enough virtual memory is available to satisfy the client's request. This is controlled by the 'flag' parameter to kmem_alloc(9F). A call to kmem_alloc(9F) which has the KM_SLEEP flag set can never fail; it will block forever waiting for resources to become available.

Kernel Memory Caches

The kernel memory allocator divides the memory it manages into a set of caches. All allocations are supplied from these caches, which are represented by the kmem_cache_t data structure. Each cache has a fixed buffer size, which represents the maximum allocation size satisfied by that cache. Each cache has a string name indicating the type of data it manages.

Some kernel memory caches are special purpose and are initialized to allocate only a particular kind of data structure. An example of this is the "thread_cache," which allocates only structures of type kthread_t. Memory from these caches is allocated to clients by the kmem_cache_alloc() function and freed by the kmem_cache_free() function.

Note - kmem_cache_alloc() and kmem_cache_free() are not public DDI interfaces. Do NOT write code that relies on them, because they are subject to change or removal in future releases of Solaris.

Caches whose name begins with "kmem_alloc_" implement the kernel's general memory allocation scheme. These caches provide memory to clients of kmem_alloc(9F) and kmem_zalloc(9F). Each of these caches satisfies requests whose size is between the buffer size of that cache and the buffer size of the next smallest cache. For example, the kernel has kmem_alloc_8 and kmem_alloc_16 caches. In this case, the kmem_alloc_16 cache handles all client requests for 9-16 bytes of memory. Remember that the size of each buffer in the kmem_alloc_16 cache is 16 bytes, regardless of the size of the client request. In a 14 byte request, two bytes of the resulting buffer are unused, since the request is satisfied from the kmem_alloc_16 cache.

The last set of caches are those used internally by the kernel memory allocator for its own bookkeeping. These include those caches whose names start with "kmem_magazine_" or "kmem_va_", the kmem_slab_cache, the kmem_bufctl_cache and others.

Kernel Memory Caches

This section explains how to find and examine kernel memory caches. You can learn about the various kmem caches on the system by issuing the ::kmastat command.

> ::kmastat
cache                        buf    buf    buf    memory     alloc alloc
name                        size in use  total    in use   succeed  fail
------------------------- ------ ------ ------ --------- --------- -----
kmem_magazine_1                8     24   1020      8192        24     0
kmem_magazine_3               16    141    510      8192       141     0
kmem_magazine_7               32     96    255      8192        96     0
...
kmem_alloc_8                   8   3614   3751     90112   9834113     0
kmem_alloc_16                 16   2781   3072     98304   8278603     0
kmem_alloc_24                 24    517    612     24576    680537     0
kmem_alloc_32                 32    398    510     24576    903214     0
kmem_alloc_40                 40    482    584     32768    672089     0
...
thread_cache                 368    107    126     49152    669881     0
lwp_cache                    576    107    117     73728       182     0
turnstile_cache               36    149    292     16384    670506     0
cred_cache                    96      6     73      8192   2677787     0
...

If you run ::kmastat you get a feel for what a "normal" system looks like. This will help you to spot excessively large caches on systems that are leaking memory. The results of ::kmastat will vary depending on the system you are running on, how many processes are running, and so forth.

Another way to list the various kmem caches is with the ::kmem_cache command:

> ::kmem_cache
ADDR     NAME                      FLAG  CFLAG  BUFSIZE  BUFTOTL
70036028 kmem_magazine_1           0020 0e0000        8     1020
700362a8 kmem_magazine_3           0020 0e0000       16      510
70036528 kmem_magazine_7           0020 0e0000       32      255
...
70039428 kmem_alloc_8              020f 000000        8     3751
700396a8 kmem_alloc_16             020f 000000       16     3072
70039928 kmem_alloc_24             020f 000000       24      612
70039ba8 kmem_alloc_32             020f 000000       32      510
7003a028 kmem_alloc_40             020f 000000       40      584
...

This command is useful because it maps cache names to addresses, and provides the debugging flags for each cache in the FLAG column. It is important to understand that the allocator's selection of debugging features is derived on a per-cache basis from this set of flags. These are set in conjunction with the global kmem_flags variable at cache creation time. Setting kmem_flags while the system is running has no effect on the debugging behavior, except for subsequently created caches (which is rare after boot-up).

Next, walk the list of kmem caches directly using MDB's kmem_cache walker:

> ::walk kmem_cache
70036028
700362a8
70036528
700367a8
...

Debugging With the Kernel Memory Allocator

Getting Started: Creating a Sample Crash Dump

Setting kmem_flags

Forcing a Crash Dump

Starting MDB

Allocator Basics

Buffer States

Transactions

Sleeping and Non-Sleeping Allocations

Kernel Memory Caches

Kernel Memory Caches

Setting `kmem_flags`