Sun Microsystems, Inc.
Chapter 9

Programming Guidelines

This chapter gives some pointers on programming with threads. Most pointers apply to both Solaris and POSIX threads; where functionality differs, this is noted. The chapter emphasizes the change from single-threaded thinking to multithreaded thinking.

Rethinking Global Variables

Historically, most code has been designed for single-threaded programs. This is especially true of the library routines called from C programs. The following implicit assumptions were made for single-threaded code:

  • When you write into a global variable and then, a moment later, read from it, what you read is exactly what you just wrote.

  • This is also true for nonglobal, static storage.

  • You do not need synchronization because there is nothing to synchronize with.

The next few examples discuss some of the problems that arise in multithreaded programs because of these assumptions, and how you can deal with them.

Traditional, single-threaded C and UNIX have a convention for handling errors detected in system calls. A system call can return almost any value as its result (for example, write() returns the number of bytes that were transferred). However, the value -1 is reserved to indicate that something went wrong. So, when a system call returns -1, you know that it failed.


Example 9-1 Global Variables and errno

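#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

extern int errno;   /* traditional, single-threaded declaration */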
...
if (write(file_desc, buffer, size) == -1) {
    /* the system call failed */
    fprintf(stderr, "something went wrong, "
        "error code = %d\n", errno);
    exit(1);
}
...

Rather than return the actual error code (which could be confused with normal return values), the system call places the error code in the global variable errno. When the system call fails, you can look in errno to find out what went wrong.

Now consider what happens in a multithreaded environment when two threads fail at about the same time, but with different errors. Both expect to find their error codes in errno, but one copy of errno cannot hold both values. This global variable approach simply does not work for multithreaded programs.

The threads library solves this problem through a conceptually new storage class: thread-specific data. This storage is similar to global storage in that it can be accessed from any procedure in which a thread might be running. However, it is private to the thread: when two threads refer to the thread-specific data location of the same name, they are referring to two different areas of storage.

So, when you use threads, each reference to errno is thread-specific because each thread has a private copy of errno. This implementation achieves that by making errno a macro that expands to a function call.
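A short POSIX threads sketch can make the private-copy behavior visible. Two threads store different error codes (EACCES and ENOENT) in errno, wait until both have written, and then read errno back; if errno were one global variable, at least one thread would see the other's value. The function name errno_is_per_thread() and the atomic-counter handshake are assumptions made for this illustration, not part of any threads API. A spin on a C11 atomic is used instead of a mutex so that no library call can touch errno between the write and the read-back.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NWORKERS 2

/* Counts how many workers have stored their errno value. */
static atomic_int written = 0;

static void *errno_worker(void *arg)
{
    int mine = *(const int *)arg;

    errno = mine;                          /* store this thread's code */
    atomic_fetch_add(&written, 1);
    while (atomic_load(&written) < NWORKERS)
        ;                                  /* spin until every worker has written */

    /* All workers have now written errno; did ours survive? */
    return (errno == mine) ? (void *)1 : NULL;
}

/* Hypothetical helper: returns 1 if every thread read back its own
 * errno value, 0 otherwise. Error checking is omitted for brevity. */
int errno_is_per_thread(void)
{
    pthread_t tid[NWORKERS];
    int codes[NWORKERS] = { EACCES, ENOENT };
    void *ok;
    int all_ok = 1;

    atomic_store(&written, 0);
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&tid[i], NULL, errno_worker, &codes[i]);
    for (int i = 0; i < NWORKERS; i++) {
        pthread_join(tid[i], &ok);
        if (ok == NULL)
            all_ok = 0;
    }
    return all_ok;
}
```

On a system where errno is thread-specific, errno_is_per_thread() should return 1. Build with your compiler's threads option (for example, -pthread with GCC).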

Providing for Static Local Variables

Example 9-2 shows a problem similar to the errno problem, but involving static storage instead of global storage. The function gethostbyname(3NSL) is called with the computer name as its argument. The return value is a pointer to a structure containing the required information for contacting the computer through network communications.


Example 9-2 The gethostbyname() Problem

struct hostent *gethostbyname(const char *name) {
    static struct hostent result;
    /* Look up name in hosts database */
    /* Put answer in result */
    return(&result);
}

Returning a pointer to a local variable is generally not a good idea, although it works in this case because the variable is static. However, when two threads call this function at once with different computer names, both write their answers into the same static storage, so one thread's result can overwrite the other's.

Thread-specific data could be used as a replacement for static storage, as in the errno problem, but this involves dynamic allocation of storage and adds to the expense of the call.
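As a rough sketch of the thread-specific data approach, the hypothetical describe_host() below keeps the old calling convention (it returns a pointer that the caller does not free) but stores its result in a buffer allocated once per thread, using the POSIX calls pthread_once(), pthread_key_create(), pthread_getspecific(), and pthread_setspecific(). The lookup itself is faked with snprintf(). The malloc() on each thread's first call is the extra expense mentioned above.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define RESULT_LEN 64

static pthread_key_t result_key;
static pthread_once_t result_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() runs on a thread's buffer when that thread exits. */
    (void)pthread_key_create(&result_key, free);
}

/* Hypothetical lookup: each calling thread gets its own result buffer,
 * which is reused by later calls from the same thread. */
const char *describe_host(const char *name)
{
    char *result;

    pthread_once(&result_once, make_key);
    result = pthread_getspecific(result_key);
    if (result == NULL) {
        /* First call in this thread: allocate its private buffer. */
        result = malloc(RESULT_LEN);
        if (result == NULL || pthread_setspecific(result_key, result) != 0) {
            free(result);
            return NULL;
        }
    }
    /* Stand-in for a real hosts-database lookup. */
    snprintf(result, RESULT_LEN, "host:%s", name);
    return result;
}
```

As with the static version, a second call from the same thread overwrites the first result, but calls from different threads no longer interfere.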

A better way to handle this kind of problem is to make the caller of gethostbyname() supply the storage for the result of the call. This is done by having the caller supply an additional argument, an output argument, to the routine. This requires a new interface to gethostbyname().

This technique is used in threads to fix many of these problems. In most cases, the name of the new interface is the old name with "_r" appended, as in gethostbyname_r(3NSL).
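The caller-supplied-storage pattern might look like the sketch below. describe_host_r() is a hypothetical stand-in, not the real gethostbyname_r(), whose argument list is longer (see its manual page); the point is only that the result lands in memory the caller owns, so two threads passing different buffers share no state.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reentrant variant: the caller supplies the output buffer.
 * Returns buf on success, or NULL if buf is too small for the result. */
char *describe_host_r(const char *name, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "host:%s", name);

    if (n < 0 || (size_t)n >= buflen)
        return NULL;          /* did not fit; caller can retry with more space */
    return buf;
}
```

Each thread can simply declare its own buffer, for example char buf[64]; on its stack, and pass it in.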

Synchronizing Threads

The threads in an application must cooperate and synchronize when sharing the data and the resources of the process.

A problem arises when multiple threads call something that manipulates an object. In a single-threaded world, synchronizing access to such objects is not a problem, but as Example 9-3 illustrates, it is a concern in multithreaded code. (Note that the printf(3S) function is safe to call from a multithreaded program; this example illustrates what could happen if printf() were not safe.)


Example 9-3 The printf() Problem

/* thread 1: */
    printf("go to statement reached");

/* thread 2: */
    printf("hello world");

printed on display:
    go to hello
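One way to keep messages whole is to serialize output with a mutex. In the sketch below, which assumes POSIX threads, the display is simulated by a character buffer, and the hypothetical emit() writes each message in two pieces (text, then newline). Holding display_lock across both pieces prevents another thread's output from landing between them, so the two messages cannot mix into "go to hello".

```c
#include <pthread.h>
#include <string.h>

/* Shared buffer standing in for the display. */
static char display[256];
static size_t used = 0;
static pthread_mutex_t display_lock = PTHREAD_MUTEX_INITIALIZER;

/* Appends one piece; callers must hold display_lock. */
static void put_piece(const char *s)
{
    size_t len = strlen(s);

    if (used + len < sizeof display) {
        memcpy(display + used, s, len);
        used += len;
        display[used] = '\0';
    }
}

/* Holding the lock across both pieces keeps each message contiguous. */
static void emit(const char *msg)
{
    pthread_mutex_lock(&display_lock);
    put_piece(msg);
    put_piece("\n");
    pthread_mutex_unlock(&display_lock);
}

static void *thread1(void *arg) { (void)arg; emit("go to statement reached"); return NULL; }
static void *thread2(void *arg) { (void)arg; emit("hello world"); return NULL; }

/* Runs both threads and returns the final display contents. */
const char *run_both(void)
{
    pthread_t t1, t2;

    used = 0;
    display[0] = '\0';
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return display;
}
```

The order of the two lines still depends on scheduling; only their interleaving is prevented.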

Single-Threaded Strategy

One strategy is to have a single, application-wide mutex lock that a thread acquires whenever it is running and releases before it must block. Because only one thread can access shared data at a time, each thread has a consistent view of memory.

Because this is effectively a single-threaded program, very little is gained by this strategy.
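A minimal sketch of the application-wide lock, assuming POSIX threads: each worker holds app_lock for as long as it runs and drops it only around a call that might block (simulated here with nanosleep()). The names app_lock, blocking_call(), and run_workers() are hypothetical. The total always comes out right, but no two workers ever update shared_total in parallel, which is why the strategy gains so little.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define MAX_WORKERS 8

/* Hypothetical application-wide lock: held whenever a thread runs. */
static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_total = 0;             /* shared data guarded by app_lock */

/* Stand-in for any call that may block, such as read(). */
static void blocking_call(void)
{
    struct timespec ts = { 0, 10000 };    /* 10 microseconds */
    nanosleep(&ts, NULL);
}

static void *app_worker(void *arg)
{
    long n = *(const long *)arg;

    pthread_mutex_lock(&app_lock);            /* run only while holding the lock */
    for (long i = 0; i < n; i++) {
        shared_total++;
        if (i % 1000 == 0) {
            pthread_mutex_unlock(&app_lock);  /* release before blocking */
            blocking_call();
            pthread_mutex_lock(&app_lock);    /* reacquire before running again */
        }
    }
    pthread_mutex_unlock(&app_lock);
    return NULL;
}

/* Runs up to MAX_WORKERS workers and returns the final total. */
long run_workers(int nthreads, long per_thread)
{
    pthread_t tid[MAX_WORKERS];

    if (nthreads > MAX_WORKERS)
        nthreads = MAX_WORKERS;
    shared_total = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, app_worker, &per_thread);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return shared_total;
}
```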

Reentrance

A better approach is to take advantage of the principles of modularity and data encapsulation. A reentrant function is one that behaves correctly if it is called simultaneously by several threads. Writing a reentrant function is a matter of understanding what "behaves correctly" means for that particular function.

Functions that are callable by several threads must be made reentrant. This might require changes to the function interface or to the implementation.

Functions that access global state, like memory or files, have reentrance problems. These functions need to protect their use of global state with the appropriate synchronization mechanisms provided by threads.
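The sketch below, assuming POSIX threads, shows a small module whose functions stay reentrant even though they share global memory: a fixed pool of slots linked into a free list. The hypothetical pool_alloc() and pool_free() touch the free list only while holding pool_lock, a mutex private to the module, so two concurrent callers never receive the same slot. Unlike an application-wide lock, this mutex covers only the pool's own data.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_SLOTS 16
#define SLOT_SIZE  32

/* Global state: a fixed pool of slots and a free list threaded through them. */
union slot {
    union slot *next;
    char data[SLOT_SIZE];
};

static union slot pool[POOL_SLOTS];
static union slot *free_list = NULL;
static int pool_ready = 0;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Links every slot into the free list; caller must hold pool_lock. */
static void pool_init_locked(void)
{
    for (int i = 0; i < POOL_SLOTS - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[POOL_SLOTS - 1].next = NULL;
    free_list = &pool[0];
    pool_ready = 1;
}

/* Returns a free slot, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    union slot *s;

    pthread_mutex_lock(&pool_lock);
    if (!pool_ready)
        pool_init_locked();
    s = free_list;
    if (s != NULL)
        free_list = s->next;
    pthread_mutex_unlock(&pool_lock);
    return s;
}

/* Returns a slot obtained from pool_alloc() to the free list. */
void pool_free(void *p)
{
    union slot *s = p;

    if (s == NULL)
        return;
    pthread_mutex_lock(&pool_lock);
    s->next = free_list;
    free_list = s;
    pthread_mutex_unlock(&pool_lock);
}
```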

The two basic strategies for making functions in modules reentrant are code locking and data locking.

 
 
 