Redzone: `0xfeedface`

The pattern 0xfeedface appears frequently in the buffer above. This pattern is known as the "redzone" indicator. It enables the allocator (and a programmer debugging a problem) to determine if the boundaries of a buffer have been violated by "buggy" code. Following the redzone is some additional information. The contents of that data depends upon other factors (see "Memory Allocation Logging"). The redzone and its suffix are collectively called the buftag region. Figure 8-1 summarizes this information.

Figure 8-1 The Redzone

The buftag is appended to each buffer in a cache when any of the KMF_AUDIT, KMF_DEADBEEF, or KMF_REDZONE flags are set in that buffer's cache. The contents of the buftag depend on whether KMF_AUDIT is set.

Decomposing the memory region presented above into distinct buffers is now simple:

0x70a9add8:     deadbeef        deadbeef  \
0x70a9ade0:     deadbeef        deadbeef   +- User Data (free)
0x70a9ade8:     deadbeef        deadbeef  /
0x70a9adf0:     feedface        feedface  -- REDZONE
0x70a9adf8:     70ae3260        8440c68e  -- Debugging Data

0x70a9ae00:     5               4ef83     \
0x70a9ae08:     0               0          +- User Data (allocated)
0x70a9ae10:     1               bbddcafe  /
0x70a9ae18:     feedface        139d    -- REDZONE
0x70a9ae20:     70ae3200        d1befaed  -- Debugging Data

0x70a9ae28:     deadbeef        deadbeef  \
0x70a9ae30:     deadbeef        deadbeef   +- User Data (free)
0x70a9ae38:     deadbeef        deadbeef  /
0x70a9ae40:     feedface        feedface  -- REDZONE
0x70a9ae48:     70ae31a0        8440c54e  -- Debugging Data

In the free buffers at 0x70a9add8 and 0x70a9ae28, the redzone is filled with 0xfeedfacefeedface. This a convenient way of determining that a buffer is free.

In the allocated buffer beginning at 0x70a9ae00, the situation is different. Recall from "Allocator Basics" that there are two allocation types:

1) The client requested memory using kmem_cache_alloc(), in which case the size of the requested buffer is equal to the bufsize of the cache.

2) The client requested memory using kmem_alloc(9F), in which case the size of the requested buffer is less than or equal to the bufsize of the cache. For example, a request for 20 bytes will be fulfilled from the kmem_alloc_24 cache. The allocator enforces the buffer boundary by placing a marker, the redzone byte, immediately following the client data:

0x70a9ae00:     5               4ef83     \
0x70a9ae08:     0               0          +- User Data (allocated)
0x70a9ae10:     1               bbddcafe  /
0x70a9ae18:     feedface        139d    -- REDZONE
0x70a9ae20:     70ae3200        d1befaed  -- Debugging Data

0xfeedface at 0x70a9ae18 is followed by a 32-bit word containing what seems to be a random value. This number is actually an encoded representation of the size of the buffer. To decode this number and find the size of the allocated buffer, use the formula:

size = redzone_value / 251

So, in this example,

size = 0x139d / 251 = 20 bytes.

This indicates that the buffer requested was of size 20 bytes. The allocator performs this decoding operation and finds that the redzone byte should be at offset 20. The redzone byte is the hex pattern 0xbb, which is present at 0x729084e4 (0x729084d0 + 0t20) as expected.

Figure 8-2 Sample kmem_alloc(9F) Buffer

Figure 8-3 shows the general form of this memory layout.

Figure 8-3 Redzone Byte

If the allocation size is the same as the bufsize of the cache, the redzone byte overwrites the first byte of the redzone itself, as shown in Figure 8-4.

Figure 8-4 Redzone Byte at the Beginning of the Redzone

This overwriting results in the first 32-bit word of the redzone being 0xbbedface, or 0xfeedfabb depending on the endianness of the hardware on which the system is running.

Note - Why is the allocation size encoded this way? To encode the size, the allocator uses the formula (251 * size + 1). When the size decode occurs, the integer division discards the remainder of '+1'. However, the addition of 1 is valuable because the allocator can check whether the size is valid by testing whether (size % 251 == 1). In this way, the allocator defends against corruption of the redzone byte index.

Uninitialized Data: `0xbaddcafe`

You might be wondering what the suspicious 0xbbddcafe at address 0x729084d4 was before the redzone byte got placed over the first byte in the word. It was 0xbaddcafe. When the KMF_DEADBEEF flag is set in the cache, allocated but uninitialized memory is filled with the 0xbaddcafe pattern. When the allocator performs an allocation, it loops across the words of the buffer and verifies that each word contains 0xdeadbeef, then fills that word with 0xbaddcafe.

A system can panic with a message such as:

panic[cpu1]/thread=e1979420: BAD TRAP: type=e (Page Fault)
rp=ef641e88 addr=baddcafe occurred in module "unix" due to an
illegal access to a user address

In this case, the address that caused the fault was 0xbaddcafe: the panicking thread has accessed some data that was never initialized.

Associating Panic Messages With Failures

The kernel memory allocator emits panic messages corresponding to the failure modes described earlier. For example, a system can panic with a message such as:

kernel memory allocator: buffer modified after being freed
modification occurred at offset 0x30

The allocator was able to detect this case because it tried to validate that the buffer in question was filled with 0xdeadbeef. At offset 0x30, this condition was not met. Since this condition indicates memory corruption, the allocator panicked the system.

Another example failure message is:

kernel memory allocator: redzone violation: write past end of buffer

The allocator was able to detect this case because it tried to validate that the redzone byte (0xbb) was in the location it determined from the redzone size encoding. It failed to find the signature byte in the correct location. Since this indicates memory corruption, the allocator panicked the system. Other allocator panic messages are discussed later.

Memory Allocation Logging

This section explains the logging features of the kernel memory allocator and how you can employ them to debug system crashes.

Buftag Data Integrity

As explained earlier, the second half of each buftag contains extra information about the corresponding buffer. Some of this data is debugging information, and some is data private to the allocator. While this auxiliary data can take several different forms, it is collectively known as "Buffer Control" or bufctl data.

However, the allocator needs to know whether a buffer's bufctl pointer is valid, since this pointer might also have been corrupted by malfunctioning code. The allocator confirms the integrity of its auxiliary pointer by storing the pointer and an encoded version of that pointer, and then cross-checking the two versions.

As shown in Figure 8-5, these pointers are the bcp (buffer control pointer) and bxstat (buffer control XOR status). The allocator arranges bcp and bxstat so that the expression bcp XOR bxstat equals a well-known value.

Figure 8-5 Extra Debugging Data in the Buftag

In the event that one or both of these pointers becomes corrupted, the allocator can easily detect such corruption and panic the system. When a buffer is allocated, bcp XOR bxstat = 0xa110c8ed ("allocated"). When a buffer is free, bcp XOR bxstat = 0xf4eef4ee ("freefree").

Note - You might find it helpful to re-examine the example provided in "Freed Buffer Checking: 0xdeadbeef", in order to confirm that the buftag pointers shown there are consistent.

In the event that the allocator finds a corrupt buftag, it panics the system and produces a message similar to the following:

kernel memory allocator: boundary tag corrupted
    bcp ^ bxstat = 0xffeef4ee, should be f4eef4ee

Remember, if bcp is corrupt, it is still possible to retrieve its value by taking the value of bxstat XOR 0xf4eef4ee or bxstat XOR 0xa110c8ed, depending on whether the buffer is allocated or free.

The bufctl Pointer

The buffer control (bufctl) pointer contained in the buftag region can have different meanings, depending on the cache's kmem_flags. The behavior toggled by the KMF_AUDIT flag is of particular interest: when the KMF_AUDIT flag is not set, the kernel memory allocator allocates a kmem_bufctl_t structure for each buffer. This structure contains some minimal accounting information about each buffer. When the KMF_AUDIT flag is set, the allocator instead allocates a kmem_bufctl_audit_t, an extended version of the kmem_bufctl_t.

This section presumes the KMF_AUDIT flag is set. For caches that do not have this bit set, the amount of available debugging information is reduced.

The kmem_bufctl_audit_t (bufctl_audit for short) contains additional information about the last transaction that occurred on this buffer. The following example shows how to apply the bufctl_audit macro to examine an audit record. The buffer shown is the example buffer used in "Detecting Memory Corruption":

> 0x70a9ae00,5/KKn
0x70a9ae00:     5               4ef83
                0               0
                1               bbddcafe
                feedface        139d
                70ae3200        d1befaed

Using the techniques presented above, it is easy to see that 0x70ae3200 points to the bufctl_audit record: it is the first pointer following the redzone. To examine the bufctl_audit record it points to, apply the bufctl_audit macro:

> 0x70ae3200$<bufctl_audit
0x70ae3200:     next            addr            slab
                70378000        70a9ae00        707c86a0
0x70ae320c:     cache           timestamp       thread
                70039928        e1bd0e26afe     70aac4e0
0x70ae321c:     lastlog         contents        stackdepth
                7011c7c0        7018a0b0        4
0x70ae3228:
                kmem_zalloc+0x30
                pid_assign+8
                getproc+0x68
                cfork+0x60

The 'addr' field is the address of the buffer corresponding to this bufctl_audit record. This is the original address: 0x70a9ae00. The 'cache' field points at the kmem_cache that allocated this buffer. You can use the ::kmem_cache dcmd to examine it as follows:

> 0x70039928::kmem_cache
ADDR     NAME                      FLAG  CFLAG  BUFSIZE  BUFTOTL
70039928 kmem_alloc_24             020f 000000       24      612

The 'timestamp' field represents the time this transaction occurred. This time is expressed in the same manner as gethrtime(3C).

'thread' is a pointer to the thread that performed the last transaction on this buffer. The 'lastlog' and 'contents' pointers point to locations in the allocator's transaction logs. These logs are discussed in detail in "Allocator Logging Facility".

Typically, the most useful piece of information provided by bufctl_audit is the stack trace recorded at the point at which the transaction took place. In this case, the transaction was an allocation called as part of executing fork(2).


8. Debugging With the Kernel Memory Allocator Detecting Memory Corruption Freed Buffer Checking: `0xdeadbeef`