Initialization and termination sections can be created directly with an assembler, or some compilers offer special primitives to simplify their declaration. For example, the following #pragma definitions result in a call to the function foo being placed in an .init section, and a call to the function bar being placed in a .fini section.
    $ cat main.c
    #include <stdio.h>

    #pragma init (foo)
    #pragma fini (bar)
    .......
    $ cc -o main main.c
    $ main
    initializing: foo()
    main()
    finalizing: bar()
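As a rough analog for compilers that do not accept #pragma init and #pragma fini, the sketch below uses the constructor and destructor function attributes provided by GCC and Clang. This is a swapped-in technique, not the mechanism shown above, and the counters init_calls and fini_calls are hypothetical names added only to make the call order observable.

```c
#include <stdio.h>

/* Hypothetical counters, present only so the call order can be checked. */
static int init_calls = 0;
static int fini_calls = 0;

/*
 * Assumption: GCC/Clang attribute syntax. The function runs before main(),
 * playing the role that "#pragma init (foo)" plays in the listing above.
 */
__attribute__((constructor))
static void foo(void)
{
    init_calls++;
    puts("initializing: foo()");
}

/*
 * Runs after main() returns or exit() is called, playing the role of
 * "#pragma fini (bar)".
 */
__attribute__((destructor))
static void bar(void)
{
    fini_calls++;
    puts("finalizing: bar()");
}
```

Linking this file into a program with an ordinary main() should cause foo() to run before main() is entered and bar() to run after it returns. Both functions reference only data within the same object, in keeping with the guidance on self-contained initialization code.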
Be careful when designing initialization and termination code that can be included in both a shared object and archive library. If this code is spread throughout several relocatable objects within an archive library, then the link-edit of an application using this archive might extract only a portion of the objects. Therefore, the link-edit might extract only a portion of the initialization and termination code. At runtime, only this portion of code is executed. The same application built against the shared object will have all the accumulated initialization and termination code executed at runtime when the shared object is loaded as one of the application's dependencies.
Determining the sequence of executing initialization and termination code within a process at runtime is a complex issue involving dependency analysis. Initialization and termination code that references external global symbols makes this process more difficult and can result in cyclic dependencies. The most flexible initialization and termination code references elements only within the resident object.
Data initialization should be independent if the initialization code is involved with a dynamic object whose memory can be dumped using dldump(3DL).
Symbol Processing
During input file processing, all local symbols from the input relocatable objects are passed through to the output file image. All global symbols are accumulated internally within the link-editor. This internal symbol table is searched for each new global symbol entry processed to determine if a symbol with the same name has already been encountered from a previous input file. If so, a symbol resolution process is called to determine which of the two entries is to be kept.
On completion of input file processing, and providing no fatal error conditions have been encountered during symbol resolution, the link-editor determines if any unbound symbol references (undefined symbols) remain that will cause the link-edit to fail.
Finally, the link-editor's internal symbol table is added to the symbol tables of the image being created.
The following sections expand upon symbol resolution and undefined symbol processing.
Symbol Resolution
Symbol resolution runs the entire spectrum, from simple and intuitive to complex and perplexing. Resolutions can be carried out silently by the link-editor, can be accompanied by warning diagnostics, or can result in a fatal error condition.
The resolution of two symbols depends on their attributes, the type of file providing the symbol, and the type of file being generated. For a complete description of symbol attributes, see "Symbol Table". For the following discussions, however, it is worth identifying three basic symbol types:
Undefined - Symbols that have been referenced in a file but have not been assigned a storage address.
Tentative - Symbols that have been created within a file but have not yet been sized or allocated in storage. They appear as uninitialized C symbols, or FORTRAN COMMON blocks within the file.
Defined - Symbols that have been created and assigned storage addresses and space within the file.
In its simplest form, symbol resolution involves the use of a precedence relationship that has defined symbols dominating tentative symbols, which in turn dominate undefined symbols.
The following C code example shows how these symbol types can be generated. Undefined symbols are prefixed with u_, tentative symbols are prefixed with t_, and defined symbols are prefixed with d_.
    $ cat main.c
    extern int      u_bar;
    extern int      u_foo();

    int             t_bar;
    int             d_bar = 1;

    d_foo()
    {
            return (u_foo(u_bar, t_bar, d_bar));
    }
    $ cc -o main.o -c main.c
    $ nm -x main.o

    [Index]   Value      Size      Type  Bind  Other Shndx   Name
    ...............
    [8]     |0x00000000|0x00000000|NOTY |GLOB |0x0  |UNDEF  |u_foo
    [9]     |0x00000000|0x00000040|FUNC |GLOB |0x0  |2      |d_foo
    [10]    |0x00000004|0x00000004|OBJT |GLOB |0x0  |COMMON |t_bar
    [11]    |0x00000000|0x00000000|NOTY |GLOB |0x0  |UNDEF  |u_bar
    [12]    |0x00000000|0x00000004|OBJT |GLOB |0x0  |3      |d_bar
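The precedence relationship among the three basic symbol types can be modeled in a few lines of C. This is only an illustrative sketch: the names sym_kind and resolve_kind are hypothetical and do not correspond to any link-editor interface, and the model covers only symbols of equal binding, ignoring size, type, and multiple-definition conflicts.

```c
/* Hypothetical model of the three basic symbol types, ordered by precedence. */
enum sym_kind {
    SYM_UNDEFINED = 0,   /* referenced, no storage address assigned */
    SYM_TENTATIVE = 1,   /* created, not yet sized or allocated     */
    SYM_DEFINED   = 2    /* created and assigned storage            */
};

/*
 * Return the kind that survives when two same-named symbols of equal
 * binding meet: defined dominates tentative, tentative dominates undefined.
 */
enum sym_kind resolve_kind(enum sym_kind a, enum sym_kind b)
{
    return (a >= b) ? a : b;
}
```

Under this model, resolve_kind(SYM_UNDEFINED, SYM_TENTATIVE) yields SYM_TENTATIVE, which corresponds to a reference such as u_bar being satisfied by a tentative definition such as t_bar from another file.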
Simple Resolutions
Simple symbol resolutions are by far the most common, and result when two symbols with similar characteristics are detected and one symbol takes precedence over the other. This symbol resolution is carried out silently by the link-editor. For example, for symbols with the same binding, a reference to an undefined symbol from one file is bound to, or satisfied by, a defined or tentative symbol definition from another file. Or, a tentative symbol definition from one file is bound to a defined symbol definition from another file.
Symbols that undergo resolution can have either a global or weak binding. Weak bindings have lower precedence than global binding, so symbols with different bindings are resolved according to a slight alteration of the basic rules.
Weak symbols can usually be defined via the compiler, either individually or as aliases to global symbols. One mechanism uses a #pragma definition:
    $ cat main.c
    #pragma weak    bar
    #pragma weak    foo = _foo

    int             bar = 1;

    _foo()
    {
            return (bar);
    }
    $ cc -o main.o -c main.c
    $ nm -x main.o
    [Index]   Value      Size      Type  Bind  Other Shndx   Name
    ...............
    [7]     |0x00000000|0x00000004|OBJT |WEAK |0x0  |3      |bar
    [8]     |0x00000000|0x00000028|FUNC |WEAK |0x0  |2      |foo
    [9]     |0x00000000|0x00000028|FUNC |GLOB |0x0  |2      |_foo
Notice that the weak alias foo is assigned the same attributes as the global symbol _foo. This relationship is maintained by the link-editor and results in the symbols being assigned the same value in the output image. In symbol resolution, weak defined symbols are silently overridden by any global definition of the same name.
Another form of simple symbol resolution, interposition, occurs between relocatable objects and shared objects, or between multiple shared objects. In these cases, when a symbol is multiply-defined, the relocatable object, or the first definition between multiple shared objects, is silently taken by the link-editor. The relocatable object's definition, or the first shared object's definition, is said to interpose on all other definitions. This interposition allows a dynamic executable, or another shared object, to override the functionality provided by a shared object.
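The interposition rule described above can be sketched as a small selection function. The types and names here (file_type, definition, pick_interposer) are hypothetical and exist only for illustration. The model assumes the definitions are listed in the order the link-editor encounters them, and it does not address the case of several relocatable objects defining the same symbol.

```c
#include <stddef.h>

/* Hypothetical classification of the file that supplies a definition. */
enum file_type { FILE_RELOCATABLE, FILE_SHARED };

struct definition {
    const char    *file;   /* hypothetical input file name */
    enum file_type type;
};

/*
 * Given the definitions of one symbol in the order they are encountered,
 * return the index of the definition that interposes on the others:
 * a relocatable object's definition if one exists, otherwise the first
 * shared object's definition. Returns -1 when there are no definitions.
 */
int pick_interposer(const struct definition *defs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (defs[i].type == FILE_RELOCATABLE)
            return (int)i;
    }
    return (n > 0) ? 0 : -1;
}
```

For an input order of libA.so, main.o, libB.so, the function selects main.o. With only libA.so and libB.so, it selects libA.so, the first shared object.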
The combination of weak symbols and interposition provides a useful programming technique. For example, the standard C library provides several services that you are allowed to redefine. However, ANSI C defines a set of standard services that must be present on the system and cannot be replaced in a strictly conforming program.
The function fread(3C), for example, is an ANSI C library function, whereas the system function read(2) is not. A conforming ANSI C program must be able to redefine read(2) and still use fread(3C) in a predictable way.
The problem here is that read(2) underlies the fread(3C) implementation in the standard C library. Therefore a program that redefines read(2) might confuse the fread(3C) implementation. To guard against this occurrence, ANSI C states that an implementation cannot use a name that is not reserved for it. Using the following #pragma directive you can define just such a reserved name, and from it generate an alias for the function read(2).
    #pragma weak read = _read
Thus, you can quite freely define your own read() function without compromising the fread(3C) implementation, which in turn is implemented to use the _read() function.
The link-editor will not have difficulty with your redefinition of read(), either when linking against the shared object or archive version of the standard C library. In the former case, interposition takes its course. In the latter case, the fact that the C library's definition of read(2) is weak allows that definition to be quietly overridden.
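The library-side pattern described above might look like the following sketch. It assumes a GCC or Clang toolchain targeting ELF, where the weak and alias function attributes are available as an alternative to #pragma weak. The names lib_read_impl, lib_read, and lib_fread are hypothetical stand-ins for _read(), read(), and fread(), and the body of lib_read_impl is a placeholder rather than real I/O.

```c
#include <stddef.h>

/*
 * Strong, implementation-private entry point (stands in for _read).
 * Placeholder behavior: fill the buffer and report the byte count.
 */
size_t lib_read_impl(char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 'x';
    return n;
}

/*
 * Weak public alias (stands in for read). An application can supply its
 * own global lib_read without disturbing lib_read_impl.
 * Assumption: GCC/Clang attribute syntax on an ELF target.
 */
size_t lib_read(char *buf, size_t n)
    __attribute__((weak, alias("lib_read_impl")));

/*
 * Higher-level service (stands in for fread). It calls only the private
 * name, so a user redefinition of lib_read cannot confuse it.
 */
size_t lib_fread(char *buf, size_t n)
{
    return lib_read_impl(buf, n);
}
```

If an application defines its own global lib_read(), that definition takes the place of the weak alias, while lib_fread() continues to reach the original implementation through lib_read_impl().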
You can use the link-editor's -m option to write a list of all interposed symbol references, along with section load address information, to the standard output.