When the dynamic executable prog is executed, the function foo() is obtained from the filtee libbar.so.1, not from the filter libfoo.so.1. However, the data item bar is obtained from the filter libfoo.so.1, as this symbol has no alternative definition in the filtee libbar.so.1.
Auxiliary filters provide a mechanism for defining an alternative interface of an existing shared object. This mechanism is used in the Solaris operating environment to provide optimized functionality within platform specific shared objects. See "Instruction Set Specific Shared Objects" and "System Specific Shared Objects" for examples.
Filtee Processing
The runtime linker's processing of a filter defers the loading of a filtee until a reference to a symbol within the filter has occurred. This implementation is analogous to the filter performing a dlopen(3DL) on each of its filtees as they are required. This implementation accounts for differences in dependency reporting that can be produced by tools such as ldd(1).
The link-editor's -z loadfltr option can be used when creating a filter to cause the immediate processing of its filtees at runtime. In addition, the immediate processing of any filtees within a process can be triggered by setting the LD_LOADFLTR environment variable to any value.
Performance Considerations
A shared object can be used by multiple applications within the same system. The performance of a shared object affects the applications that use it and the system as a whole.
Although the actual code within a shared object will directly affect the performance of a running process, the performance issues focused upon here target the runtime processing of the shared object itself. The following sections investigate this processing in more detail by looking at aspects such as text size and purity, together with relocation overhead.
Analyzing Files
Various tools are available to analyze the contents of an ELF file. To display the size of a file use the size(1) command. For example:
$ size -x libfoo.so.1 59c + 10c + 20 = 0x6c8 $ size -xf libfoo.so.1 ..... + 1c(.init) + ac(.text) + c(.fini) + 4(.rodata) + \ ..... + 18(.data) + 20(.bss) ..... |
The first example indicates the size of the shared objects text, data, and bss, a categorization used in previous releases of the SunOS operating system.
The ELF format provides a finer granularity for expressing data within a file by organizing the data into sections. The second example displays the size of each of the file's loadable sections.
Sections are allocated to units known as segments, some of which describe how portions of a file will be mapped into memory (see the mmap(2) man page). These loadable segments can be displayed by using the dump(1) command and examining the LOAD entries. For example:
$ dump -ov libfoo.so.1 libfoo.so.1: ***** PROGRAM EXECUTION HEADER ***** Type Offset Vaddr Paddr Filesz Memsz Flags Align LOAD 0x94 0x94 0x0 0x59c 0x59c r-x 0x10000 LOAD 0x630 0x10630 0x0 0x10c 0x12c rwx 0x10000 |
There are two loadable segments in the shared object libfoo.so.1, commonly referred to as the text and data segments. The text segment is mapped to allow reading and execution of its contents (r-x), whereas the data segment is mapped to also allow its contents to be modified (rwx). The memory size (Memsz) of the data segment differs from the file size (Filesz). This difference accounts for the .bss section, which is part of the data segment, and is dynamically created when the segment is loaded.
Programmers usually think of a file in terms of the symbols that define the functions and data elements within their code. These symbols can be displayed using nm(1). For example:
$ nm -x libfoo.so.1 [Index] Value Size Type Bind Other Shndx Name ......... [39] |0x00000538|0x00000000|FUNC |GLOB |0x0 |7 |_init [40] |0x00000588|0x00000034|FUNC |GLOB |0x0 |8 |foo [41] |0x00000600|0x00000000|FUNC |GLOB |0x0 |9 |_fini [42] |0x00010688|0x00000010|OBJT |GLOB |0x0 |13 |data [43] |0x0001073c|0x00000020|OBJT |GLOB |0x0 |16 |bss ......... |
The section that contains a symbol can be determined by referencing the section index (Shndx) field from the symbol table and by using dump(1) to display the sections within the file. For example:
$ dump -hv libfoo.so.1 libfoo.so.1: **** SECTION HEADER TABLE **** [No] Type Flags Addr Offset Size Name ......... [7] PBIT -AI 0x538 0x538 0x1c .init [8] PBIT -AI 0x554 0x554 0xac .text [9] PBIT -AI 0x600 0x600 0xc .fini ......... [13] PBIT WA- 0x10688 0x688 0x18 .data [16] NOBI WA- 0x1073c 0x73c 0x20 .bss ......... |
The output from both the previous nm(1) and dump(1) examples shows the association of the functions _init, foo, and _fini to the sections .init, .text and .fini. These sections, because of their read-only nature, are part of the text segment.
Similarly, the data arrays data, and bss are associated with the sections .data and .bss respectively. These sections, because of their writable nature, are part of the data segment.
Note - The previous dump(1) display has been simplified for this example.
Underlying System
When an application is built using a shared object, the entire loadable contents of the object are mapped into the virtual address space of that process at runtime. Each process that uses a shared object starts by referencing a single copy of the shared object in memory.
Relocations within the shared object are processed to bind symbolic references to their appropriate definitions. This results in the calculation of true virtual addresses that could not be derived at the time the shared object was generated by the link-editor. These relocations usually result in updates to entries within the process's data segments.
The memory management scheme underlying the dynamic linking of shared objects shares memory among processes at the granularity of a page. Memory pages can be shared as long as they are not modified at runtime. If a process writes to a page of a shared object when writing a data item, or relocating a reference to a shared object, it generates a private copy of that page. This private copy will have no effect on other users of the shared object. However, this page will have lost any benefit of sharing between other processes. Text pages that become modified in this manner are referred to as impure.
The segments of a shared object that are mapped into memory fall into two basic categories; the text segment, which is read-only, and the data segment, which is read-write. See "Analyzing Files" on how to obtain this information from an ELF file. An overriding goal when developing a shared object is to maximize the text segment and minimize the data segment. This optimizes the amount of code sharing while reducing the amount of processing needed to initialize and use a shared object. The following sections present mechanisms that can help achieve this goal.
Lazy Loading of Dynamic Dependencies
You can defer the loading of a shared object dependency until the dependency is first referenced by establishing the object as lazy loadable. See "Lazy Loading of Dynamic Dependencies".
For applications that require a small number of dependencies, running the application may load all the dependencies whether they are defined lazy loadable or not. However, under lazy loading, dependency processing may be deferred from process startup and spread throughout the process's execution.
For applications with many dependencies, employing lazy loading will often result in some dependencies not being loaded at all, as they may not be referenced for a particular thread of execution.
Position-Independent Code
The compiler can generate position-independent code under the -K pic option. Whereas the code within a dynamic executable is usually tied to a fixed address in memory, position-independent code can be loaded anywhere in the address space of a process. Because the code is not tied to a specific address, it will execute correctly without page modification at a different address in each process that uses it. This code creates programs that require the smallest amount of page modification at runtime.
When you use position-independent code, relocatable references are generated as an indirection that will use data in the shared object's data segment. The text segment code will remain read-only, and all relocation updates will be applied to corresponding entries within the data segment. See "Global Offset Table (Processor-Specific)" and "Procedure Linkage Table (Processor-Specific)" for more details on the use of these two sections.
If a shared object is built from code that is not position-independent, the text segment will usually require a large number of relocations to be performed at runtime. Although the runtime linker is equipped to handle this, the system overhead this creates can cause serious performance degradation.
You can identify a shared object that requires relocations against its text segment. Use dump(1) and inspect the output for any TEXTREL entry. For example:
$ cc -o libfoo.so.1 -G -R. foo.c $ dump -Lv libfoo.so.1 | grep TEXTREL [9] TEXTREL 0 |
Note - The value of the TEXTREL entry is irrelevant. Its presence in a shared object indicates that text relocations exist.
To prevent the creation of a shared object that contains text relocations use the link-editor's -z text flag. This flag causes the link-editor to generate diagnostics indicating the source of any non-position-independent code used as input. Such code results in a failure to generate the intended shared object. For example:
$ cc -o libfoo.so.1 -z text -G -R. foo.c Text relocation remains referenced against symbol offset in file foo 0x0 foo.o bar 0x8 foo.o ld: fatal: relocations remain against allocatable but \ non-writable sections |
Two relocations are generated against the text segment because of the non-position-independent code generated from the file foo.o. Where possible, these diagnostics will indicate any symbolic references that are required to carry out the relocations. In this case, the relocations are against the symbols foo and bar.
Another common cause of creating text relocations when generating a shared object is by including hand-written assembler code that has not been coded with the appropriate position-independent prototypes.
Note - You may want to experiment with some simple source files to determine coding sequences that enable position-independence. Use the compilers ability to generate intermediate assembler output.