User-Level Threads

Threads are the primary programming interface in multithreaded programming. [User-level threads are so named to distinguish them from kernel-level threads, which are the concern of systems programmers only. Because this book is for application programmers, kernel-level threads are not discussed.] Threads are visible only from within the process, where they share all process resources such as the address space, open files, and so on. The following state is unique to each thread:

Thread ID
Register state (including PC and stack pointer)
Stack
Signal mask
Priority
Thread-private storage

Because threads share the process instructions and most of the process data, a change in shared data by one thread can be seen by the other threads in the process. When a thread needs to interact with other threads in the same process, it can do so without involving the operating environment.
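
A minimal sketch of this sharing with POSIX threads, assuming a POSIX threads library such as libpthread. Each worker writes into one element of a global array, which every thread in the process can see, while its local variable lives on its own private stack. The names run_workers, square_worker, and NUM_WORKERS are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4

/* Shared by all threads: lives in the process data segment. */
static int results[NUM_WORKERS];

/* Each worker receives its own index; the local variable `index`
 * lives on the worker's private stack. */
static void *square_worker(void *arg)
{
    int index = *(int *)arg;
    results[index] = index * index;   /* visible to every thread */
    return NULL;
}

/* Starts NUM_WORKERS threads, waits for all that started,
 * and returns 0 if every thread was created, -1 otherwise. */
int run_workers(void)
{
    pthread_t tids[NUM_WORKERS];
    int ids[NUM_WORKERS];
    int created = 0;

    for (; created < NUM_WORKERS; created++) {
        ids[created] = created;
        if (pthread_create(&tids[created], NULL, square_worker,
                           &ids[created]) != 0)
            break;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return created == NUM_WORKERS ? 0 : -1;
}
```

Because each worker writes a different element, the workers need no lock; pthread_join() also guarantees that the main thread sees the workers' writes once it returns.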

By default, threads are lightweight. To gain more control over a thread, for instance over its scheduling policy, the application can bind the thread. When an application binds threads to execution resources, the threads become kernel resources (see "System Scope (Bound Threads)" for more information).

To summarize, user-level threads are:

Inexpensive to create because they do not need to create their own address space.

Fast to synchronize because synchronization is done at the application level, not at the kernel level.

Managed by the threads library, either libpthread or libthread.

Lightweight Processes

The threads library uses underlying threads of control called lightweight processes that are supported by the kernel. You can think of an LWP as a virtual CPU that executes code or system calls.

You usually do not need to concern yourself with LWPs to program with threads. The information here about LWPs is provided as background, so you can understand the differences in scheduling scope, described in "Process Scope (Unbound Threads)".

Much as the stdio library routines such as fopen() and fread() use the open() and read() functions, the threads interface uses the LWP interface, and for many of the same reasons.

Lightweight processes (LWPs) bridge the user level and the kernel level. Each process contains one or more LWPs, each of which runs one or more user threads. (See Figure 1-1.)

Figure 1-1 User-level Threads and Lightweight Processes

Each LWP is a kernel resource in a kernel pool, and is allocated and deallocated to a thread on a per-thread basis.
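
POSIX exposes only an indirect hook into this pool: pthread_setconcurrency() tells the threads library how many threads the application wants running concurrently, which a library that multiplexes threads onto LWPs can use when sizing its pool. The sketch below assumes an XSI-conformant system; hint_concurrency is a hypothetical helper, and on implementations that give every thread its own kernel entity the value is typically stored and otherwise ignored.

```c
/* pthread_setconcurrency() is an XSI interface; request its declaration. */
#define _XOPEN_SOURCE 700
#include <pthread.h>

/* Hints that about `level` threads should run concurrently.
 * Returns the level the library now reports, or -1 if the hint was
 * rejected (EINVAL for a negative level, EAGAIN if it exceeds a limit). */
int hint_concurrency(int level)
{
    if (pthread_setconcurrency(level) != 0)
        return -1;
    return pthread_getconcurrency();
}
```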

Scheduling

POSIX specifies three scheduling policies: first-in-first-out (SCHED_FIFO), round-robin (SCHED_RR), and implementation-defined (SCHED_OTHER). SCHED_FIFO is a queue-based scheduler with a separate queue for each priority level. SCHED_RR is like SCHED_FIFO except that each thread has an execution time quota; when a thread exhausts its quota, it moves to the end of the queue for its priority level.

Both SCHED_FIFO and SCHED_RR are POSIX Realtime extensions. SCHED_OTHER is the default scheduling policy.

See "LWPs and Scheduling Classes" for information about the SCHED_OTHER policy.
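
A sketch, assuming a POSIX system with the Realtime scheduling options, that reports the calling thread's policy and counts the priority levels (one FIFO queue per level) each Realtime policy offers. POSIX requires at least 32 priority levels for SCHED_FIFO and SCHED_RR. The helper names are hypothetical.

```c
#include <pthread.h>
#include <sched.h>

/* Maps a POSIX policy constant to a printable name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    case SCHED_OTHER: return "SCHED_OTHER";
    default:          return "unknown";
    }
}

/* Returns the calling thread's current policy, or -1 on error. */
int current_policy(void)
{
    int policy;
    struct sched_param param;

    if (pthread_getschedparam(pthread_self(), &policy, &param) != 0)
        return -1;
    return policy;
}

/* Returns the number of distinct priority levels a policy offers,
 * or -1 if the policy is not supported. */
int priority_levels(int policy)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    return hi - lo + 1;
}
```

Called from an ordinary thread, current_policy() typically returns SCHED_OTHER, the default policy.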

Two scheduling scopes are available: process scope for unbound threads and system scope for bound threads. Threads with differing scope states can coexist on the same system and even in the same process. In general, the scope sets the range in which a thread's scheduling policy is in effect.

Process Scope (Unbound Threads)

PTHREAD_SCOPE_PROCESS threads are created as unbound threads. The association of these threads with LWPs is managed by the threads library.

In most cases, threads should be PTHREAD_SCOPE_PROCESS. These threads have no restriction to execute on a particular LWP, and are equivalent to Solaris threads created without the THR_BOUND flag. The threads library decides the association between individual threads and LWPs.
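
A sketch of requesting process scope through a thread attributes object. It assumes that some implementations support only system scope: Linux's NPTL, for example, returns ENOTSUP for PTHREAD_SCOPE_PROCESS, so the code falls back to the default scope and reports which one was used. create_unbound and noop are hypothetical names.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread body that does nothing. */
static void *noop(void *arg)
{
    return arg;
}

/* Creates a thread, preferring PTHREAD_SCOPE_PROCESS (unbound).
 * On success, stores the scope actually used in *scope_used and
 * returns 0; otherwise returns the error from the failing call. */
int create_unbound(pthread_t *tid, int *scope_used)
{
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);
    if (err == ENOTSUP)
        err = 0;   /* keep the implementation's default scope */
    if (err == 0)
        err = pthread_attr_getscope(&attr, scope_used);
    if (err == 0)
        err = pthread_create(tid, &attr, noop, NULL);

    pthread_attr_destroy(&attr);   /* the new thread keeps its own copy */
    return err;
}
```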

System Scope (Bound Threads)

PTHREAD_SCOPE_SYSTEM threads are created as bound threads. A bound thread is permanently attached to an LWP.

Each bound thread is bound to an LWP for the lifetime of the thread. This is equivalent to creating a Solaris thread in the THR_BOUND state. Bind a thread when it needs special scheduling attributes, such as those of Realtime scheduling.
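
A sketch of the attributes a bound Realtime thread would need, assuming POSIX thread-scheduling support: system scope, explicit (not inherited) scheduling, SCHED_RR, and the lowest SCHED_RR priority. It stops short of creating the thread, because pthread_create() typically fails with EPERM when an unprivileged process requests a Realtime policy. init_realtime_bound_attr is a hypothetical helper.

```c
#include <pthread.h>
#include <sched.h>

/* Initializes *attr for a bound SCHED_RR thread at the lowest Realtime
 * priority. Returns 0 on success; on failure, destroys *attr and
 * returns the error from the failing call. */
int init_realtime_bound_attr(pthread_attr_t *attr)
{
    struct sched_param param;
    int err = pthread_attr_init(attr);
    if (err != 0)
        return err;

    /* Bound thread: competes for CPUs with all threads in the system. */
    err = pthread_attr_setscope(attr, PTHREAD_SCOPE_SYSTEM);
    /* Use the policy set below instead of inheriting the creator's. */
    if (err == 0)
        err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (err == 0)
        err = pthread_attr_setschedpolicy(attr, SCHED_RR);
    if (err == 0) {
        param.sched_priority = sched_get_priority_min(SCHED_RR);
        err = pthread_attr_setschedparam(attr, &param);
    }
    if (err != 0)
        pthread_attr_destroy(attr);   /* leave nothing to clean up */
    return err;
}
```

A privileged caller would pass the initialized attributes to pthread_create() and then release them with pthread_attr_destroy().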

Note - In neither case, bound or unbound, can a thread be directly accessed by or moved to another process.

Cancellation

Thread cancellation allows a thread to request the termination of any other thread in the process. The target thread (the one being cancelled) can keep cancellation requests pending and can perform application-specific cleanup when it acts upon the cancellation notice.

The pthreads cancellation feature permits either asynchronous or deferred termination of a thread. Asynchronous cancellation can occur at any time; deferred cancellation can occur only at defined points. Deferred cancellation is the default type.
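
A minimal sketch of deferred cancellation with a cleanup handler, assuming POSIX threads. The worker sleeps in a loop; nanosleep() is a cancellation point, so the pending request is acted on there, the handler registered with pthread_cleanup_push() runs, and pthread_join() then reports PTHREAD_CANCELED. The function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Cleanup handler: runs when the thread acts on the cancellation. */
static void note_cleanup(void *arg)
{
    *(int *)arg = 1;
}

static void *cancellable_worker(void *arg)
{
    int oldtype;

    /* Deferred is already the default; set explicitly for clarity. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &oldtype);

    pthread_cleanup_push(note_cleanup, arg);
    for (;;) {
        struct timespec delay = { 0, 1000000 };   /* 1 ms */
        /* nanosleep() is a cancellation point, so a pending request
         * takes effect here rather than at an arbitrary instruction. */
        nanosleep(&delay, NULL);
    }
    pthread_cleanup_pop(0);   /* never reached; pairs with the push */
    return NULL;
}

/* Cancels a worker and records in *cleanup_ran whether its cleanup
 * handler ran. Returns 1 if the thread ended by cancellation. */
int cancel_worker(int *cleanup_ran)
{
    pthread_t tid;
    void *status;

    *cleanup_ran = 0;
    if (pthread_create(&tid, NULL, cancellable_worker, cleanup_ran) != 0)
        return 0;
    pthread_cancel(tid);          /* only a request; does not wait */
    pthread_join(tid, &status);   /* wait for the target to finish */
    return status == PTHREAD_CANCELED;
}
```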

Synchronization

Synchronization allows you to control program flow and access to shared data for concurrently executing threads.

The four synchronization models are mutex locks, read/write locks, condition variables, and semaphores.

Mutex locks allow only one thread at a time to execute a specific section of code, or to access specific data.
Read/write locks permit concurrent reads and exclusive writes to a protected shared resource. To modify a resource, a thread must first acquire the exclusive write lock. An exclusive write lock is not permitted until all read locks have been released.
Condition variables block threads until a particular condition is true.
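Counting semaphores typically coordinate access to resources. The count is the limit on how many threads can access the protected resource at the same time. Each thread that acquires the semaphore decrements the count; when the count reaches zero, further attempts to acquire the semaphore block until another thread releases it.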
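
A sketch combining two of these models, assuming unnamed POSIX semaphores (sem_init()) are available: a counting semaphore initialized to a hypothetical SLOTS value limits how many threads use a resource at once, and a mutex protects the statistics that record the peak. All names are illustrative.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <time.h>

#define SLOTS   2   /* hypothetical: at most 2 threads use the resource */
#define THREADS 6

static sem_t slots;                     /* counting semaphore */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_use;                      /* protected by stats_lock */
static int peak_in_use;                 /* protected by stats_lock */

static void *use_resource(void *arg)
{
    (void)arg;
    while (sem_wait(&slots) == -1 && errno == EINTR)
        ;   /* retry if a signal interrupted the wait */

    pthread_mutex_lock(&stats_lock);    /* one thread updates stats */
    if (++in_use > peak_in_use)
        peak_in_use = in_use;
    pthread_mutex_unlock(&stats_lock);

    struct timespec work = { 0, 2000000 };   /* pretend to work for 2 ms */
    nanosleep(&work, NULL);

    pthread_mutex_lock(&stats_lock);
    in_use--;
    pthread_mutex_unlock(&stats_lock);

    sem_post(&slots);                   /* release the slot */
    return NULL;
}

/* Runs THREADS threads through SLOTS slots and returns the peak number
 * of threads that held a slot at once, or -1 if setup failed. */
int run_pool(void)
{
    pthread_t tids[THREADS];
    int created = 0;

    in_use = 0;
    peak_in_use = 0;
    if (sem_init(&slots, 0, SLOTS) != 0)   /* 0: shared by threads only */
        return -1;
    for (; created < THREADS; created++)
        if (pthread_create(&tids[created], NULL, use_resource, NULL) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    sem_destroy(&slots);
    return peak_in_use;
}
```

run_pool() returns a value between 1 and SLOTS, because the semaphore never lets more than SLOTS threads past sem_wait() at the same time.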

Using the 64-bit Architecture

For application developers, the major difference between the Solaris 64-bit and 32-bit operating environments is the C-language data type model used. The 64-bit environment uses the LP64 data model, in which longs and pointers are 64 bits wide. All other fundamental data types remain the same as in the 32-bit implementation. The 32-bit environment uses the ILP32 data model, in which ints, longs, and pointers are 32 bits wide.
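
A small sketch that classifies the compilation environment by the type sizes described above. The enum and function names are hypothetical; building the same file as a 32-bit object and as a 64-bit object should yield MODEL_ILP32 and MODEL_LP64 respectively, though the compiler options that select each mode vary by compiler.

```c
#include <limits.h>

enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_OTHER };

/* ILP32: int, long, and pointers are 32 bits.
 * LP64:  long and pointers are 64 bits; int stays 32 bits. */
enum data_model current_data_model(void)
{
    int int_bits  = (int)(sizeof(int) * CHAR_BIT);
    int long_bits = (int)(sizeof(long) * CHAR_BIT);
    int ptr_bits  = (int)(sizeof(void *) * CHAR_BIT);

    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32)
        return MODEL_ILP32;
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64)
        return MODEL_LP64;
    return MODEL_OTHER;   /* some other model, e.g. one with 32-bit long */
}
```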

The following summary briefly describes the major features and considerations for using the 64-bit environment:

Large Virtual Address Space

In the 64-bit environment, a process can have up to 64 bits of virtual address space, or 18 exabytes. This is 4 billion times the current 4 Gbyte maximum of a 32-bit process. Because of hardware restrictions, however, some platforms might not support the full 64 bits of address space.

Large address space increases the number of threads that can be created with the default stack size (1 megabyte for 32-bit processes, 2 megabytes for 64-bit processes). With the default stack size, approximately 2000 threads can be created on a 32-bit system and 8000 billion on a 64-bit system.
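
The figures above follow from powers of two. A worked check, assuming 1 megabyte means 2^20 bytes for the stack sizes:

$$
2^{64}\ \text{bytes} \approx 1.8 \times 10^{19}\ \text{bytes} \approx 18\ \text{exabytes},
\qquad
\frac{2^{64}}{2^{32}} = 2^{32} \approx 4.3 \times 10^{9}
$$

$$
\text{32-bit: } \frac{2^{32}\ \text{bytes}}{2^{20}\ \text{bytes per stack}} = 2^{12} = 4096,
\qquad
\text{64-bit: } \frac{2^{64}\ \text{bytes}}{2^{21}\ \text{bytes per stack}} = 2^{43} \approx 8.8 \times 10^{12}
$$

These are upper bounds that assume the whole address space holds thread stacks. The approximate counts of 2000 and 8000 billion are lower, presumably because code, heap, and reserved regions also occupy part of the address space.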

Kernel Memory Readers

Because the kernel is an LP64 object that uses 64-bit data structures internally, existing 32-bit applications that use libkvm, /dev/mem, or /dev/kmem do not work properly and must be converted to 64-bit programs.

/proc Restrictions

A 32-bit program that uses /proc is able to look at 32-bit processes, but is unable to understand a 64-bit process; the existing interfaces and data structures that describe the process are not large enough to contain the 64-bit quantities involved. Such programs must be recompiled as 64-bit programs to work for both 32-bit and 64-bit processes.

64-bit Libraries

32-bit applications must link with 32-bit libraries, and 64-bit applications must link with 64-bit libraries. With the exception of those libraries that have become obsolete, all of the system libraries are provided in both 32-bit and 64-bit versions. However, no 64-bit libraries are provided in static form.

64-bit Arithmetic

Though 64-bit arithmetic has long been available in previous 32-bit Solaris releases, the 64-bit implementation now provides full 64-bit machine registers for integer operations and parameter passing.

Large Files

If an application requires only large file support, then it can remain 32-bit and use the Large Files interface. It is, however, recommended that the application be converted to 64-bit to take full advantage of 64-bit capabilities.
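
A sketch of the large-file route for a 32-bit program, assuming that the Large Files interface here refers to the large-file compilation environment selected by defining _FILE_OFFSET_BITS as 64 before any header is included. In that environment off_t is 64 bits wide, and the off_t-based fseeko() and ftello() can address offsets beyond 2 Gbytes. The helper names are hypothetical.

```c
/* Must come before any #include so that the headers select the
 * 64-bit off_t interfaces in a 32-bit build. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Returns the width of off_t in bits; 64 means offsets beyond
 * 2 Gbytes are representable even in a 32-bit (ILP32) program. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Returns the size of an open file, or -1 on error. Unlike fseek() and
 * ftell(), whose long offsets are 32 bits in a 32-bit build, fseeko()
 * and ftello() use off_t. */
off_t file_size(FILE *fp)
{
    if (fseeko(fp, 0, SEEK_END) != 0)
        return -1;
    return ftello(fp);
}
```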

