NAME
gprof - display call-graph profile data
SYNOPSIS
gprof [-abcCDlsz] [-e function-name] [-E function-name] [-f function-name] [-F function-name] [image-file [profile-file ...]] [-n number of functions]
DESCRIPTION
The gprof utility produces an execution profile
of a program. The effect of called routines is incorporated in the profile
of each caller. The profile data is taken from the call graph profile file
that is created by programs compiled with the -xpg option
of cc(1), or by
the -pg option with other compilers, or by setting the
LD_PROFILE environment variable for shared objects. See ld.so.1(1). These compiler options also
link in versions of the library routines which are compiled for profiling.
The symbol table in the executable image file image-file
(a.out by default) is read and correlated with the
call graph profile file profile-file (gmon.out by default).
First, execution times for each routine are propagated along the edges
of the call graph. Cycles are discovered, and calls into a cycle are made
to share the time of the cycle. The first listing shows the functions sorted
according to the time they represent, including the time of their call graph
descendants. Below each function entry is shown its (direct) call-graph
children and how their times are propagated to this function. A similar
display above the function shows how this function's time and the time of
its descendants are propagated to its (direct) call-graph parents.
Cycles are also shown, with an entry for the cycle as a whole and
a listing of the members of the cycle and their contributions to the time
and call counts of the cycle.
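The cycle discovery described above can be sketched as a search for strongly connected components in the call graph. This is a minimal illustration using Tarjan's algorithm, not necessarily the code gprof itself runs; the function name find_cycles and the sample graph are hypothetical.

```python
def find_cycles(call_graph):
    """Return the cycles of a call graph as sorted lists of function names.

    call_graph maps each caller to a list of its callees. A cycle is a
    strongly connected component with more than one member, or a single
    function that calls itself.
    """
    index = {}       # discovery order of each visited function
    lowlink = {}     # smallest discovery index reachable from the function
    stack = []
    on_stack = set()
    cycles = []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for callee in call_graph.get(node, ()):
            if callee not in index:
                visit(callee)
                lowlink[node] = min(lowlink[node], lowlink[callee])
            elif callee in on_stack:
                lowlink[node] = min(lowlink[node], index[callee])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop all of its members.
            members = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                members.append(member)
                if member == node:
                    break
            if len(members) > 1 or node in call_graph.get(node, ()):
                cycles.append(sorted(members))

    for node in list(call_graph):
        if node not in index:
            visit(node)
    return cycles


# Hypothetical call graph: expr and term call each other.
graph = {
    "main": ["parse", "emit"],
    "parse": ["expr"],
    "expr": ["term"],
    "term": ["expr"],
    "emit": [],
}
cycles = find_cycles(graph)
print(cycles)  # [['expr', 'term']]
```

Treating each cycle found this way as a single node leaves an acyclic graph, which is what lets time be propagated from children to parents without looping.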
Next, a flat profile is given, similar to that provided by prof(1).
This listing gives the total execution times and call counts for each of
the functions in the program, sorted by decreasing time. Finally, an index
is given, which shows the correspondence between function names and call-graph
profile index numbers.
A single function may be split into subfunctions for profiling by
means of the MARK macro. See prof(5).
Beware of quantization errors. The granularity of the sampling is
shown, but remains statistical at best. It is assumed that the time for
each execution of a function can be expressed by the total time for the
function divided by the number of times the function is called. Thus the
time propagated along the call-graph arcs to parents of that function is
directly proportional to the number of times that arc is traversed.
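The averaging assumption above amounts to a simple proportional split. The following sketch shows the arithmetic only; the function name propagate_to_parents and the timing numbers are made up for illustration.

```python
def propagate_to_parents(total_time, calls_from_parent):
    """Split a function's total time among its callers by call count.

    total_time is the function's own time plus the time of its descendants.
    calls_from_parent maps each caller to the number of calls on that arc.
    """
    total_calls = sum(calls_from_parent.values())
    if total_calls == 0:
        # No recorded calls: nothing can be propagated.
        return {parent: 0.0 for parent in calls_from_parent}
    # Every call is assumed to cost the same average amount of time.
    time_per_call = total_time / total_calls
    return {
        parent: time_per_call * calls
        for parent, calls in calls_from_parent.items()
    }


# Hypothetical numbers: a function accounts for 6.0 seconds and is called
# once from main and twice from reload.
shares = propagate_to_parents(6.0, {"main": 1, "reload": 2})
print(shares)  # {'main': 2.0, 'reload': 4.0}
```

If the call from main was in fact the expensive one, the split still charges reload twice as much, which is the kind of error the paragraph above warns about.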
The profiled program must call exit(2)
or return normally for the profiling information to be saved in the gmon.out file.
OPTIONS
The following options are supported:
- -a
- Suppress printing statically declared functions.
If this option is given, all relevant information about the static function
(for instance, time samples, calls to other functions, calls from other
functions) belongs to the function loaded just before the static function
in the a.out file.
- -b
- Brief. Suppress
descriptions of each field in the profile.
- -c
- Discover the
static call-graph of the program by a heuristic which examines the text
space of the object file. Static-only parents or children are indicated
with call counts of 0. Note that for dynamically linked executables, the
linked shared objects' text segments are not examined.
- -C
- Demangle C++
symbol names before printing them out.
- -D
- Produce a profile
file gmon.sum that represents the difference of the
profile information in all specified profile files. This summary profile
file may be given to subsequent executions of gprof
(also with -D) to summarize profile data across several
runs of an a.out file. See also the -s
option.
As an example, suppose function A calls function B n
times in profile file gmon.sum, and m
times in profile file gmon.out. With -D,
a new gmon.sum file will be created showing the number
of calls from A to B as n-m.
- -e function-name
- Suppress printing the graph profile entry for routine function-name and all its descendants (unless they have other
ancestors that are not suppressed). More than one -e option
may be given. Only one function-name may be
given with each -e option.
- -E function-name
- Suppress printing the graph profile entry for routine function-name (and its descendants) as -e,
above, and also exclude the time spent in function-name
(and its descendants) from the total and percentage time computations. More
than one -E option may be given. For example:
`-E mcount -E mcleanup'
is the default.
- -f function-name
- Print the graph profile entry only for routine function-name and its descendants. More than one -f option may be given. Only one function-name
may be given with each -f option.
- -F function-name
- Print the graph profile entry only for routine function-name and its descendants (as -f,
above) and also use only the times of the printed routines in total time
and percentage computations. More than one -F option may
be given. Only one function-name may be given
with each -F option. The -F option overrides
the -E option.
- -l
- Suppress the
reporting of graph profile entries for all local symbols. This option would
be the equivalent of placing all of the local symbols for the specified
executable image on the -E exclusion list.
- -n
- Limit the
size of flat and graph profile listings to the top n
offending functions, where n is the number given with the option.
- -s
- Produce a profile
file gmon.sum which represents the sum of the profile
information in all of the specified profile files. This summary profile
file may be given to subsequent executions of gprof (also
with -s) to accumulate profile data across several runs
of an a.out file. See also the -D
option.
- -z
- Display routines
which have zero usage (as indicated by call counts and accumulated time).
This is useful in conjunction with the -c option for discovering
which routines were never called. Note that this has restricted use for
dynamically linked executables, since shared object text space will not
be examined by the -c option.
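The effect of the -s and -D options on call counts can be sketched as follows. This is a simplified model that tracks only arc call counts keyed by (caller, callee); real profile files also carry time samples, and the function names and the values of n and m here are hypothetical.

```python
from collections import Counter


def sum_profiles(*profiles):
    """Accumulate arc call counts across runs, in the spirit of -s."""
    total = Counter()
    for profile in profiles:
        total.update(profile)  # Counter.update adds counts together
    return dict(total)


def diff_profiles(summary, other):
    """Subtract the arc counts of other from summary, in the spirit of -D."""
    arcs = set(summary) | set(other)
    return {arc: summary.get(arc, 0) - other.get(arc, 0) for arc in arcs}


# A calls B n = 7 times in gmon.sum and m = 3 times in gmon.out.
gmon_sum = {("A", "B"): 7}
gmon_out = {("A", "B"): 3}

summed = sum_profiles(gmon_sum, gmon_out)
difference = diff_profiles(gmon_sum, gmon_out)
print(summed)      # {('A', 'B'): 10}
print(difference)  # {('A', 'B'): 4}
```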
ENVIRONMENT VARIABLES
-
PROFDIR
- If this environment variable contains
a value, place profiling output within that directory, in a file named pid.programname, where pid is the process ID and programname is the name of the program being profiled, as
determined by removing any path prefix from the argv[0]
with which the program was called. If the variable contains a null value,
no profiling output is produced. If the variable is not set, profiling output is placed
in the file gmon.out.
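The three PROFDIR cases can be summarized in a short sketch. It only models the naming rule described above; the function name profile_output_path and the sample paths are hypothetical, and "null value" is taken to mean a variable that is set to the empty string.

```python
import posixpath


def profile_output_path(environ, argv0, pid):
    """Return where profiling output would go, or None if none is produced."""
    profdir = environ.get("PROFDIR")
    if profdir is None:
        # PROFDIR is not set: use the default file.
        return "gmon.out"
    if profdir == "":
        # PROFDIR is set to a null value: no profiling output.
        return None
    # Strip any path prefix from argv[0] to get the program name.
    program_name = posixpath.basename(argv0)
    return posixpath.join(profdir, f"{pid}.{program_name}")


print(profile_output_path({}, "/usr/local/bin/myprog", 1234))
print(profile_output_path({"PROFDIR": ""}, "/usr/local/bin/myprog", 1234))
print(profile_output_path({"PROFDIR": "/tmp/prof"}, "/usr/local/bin/myprog", 1234))
```

The three calls print gmon.out, None, and /tmp/prof/1234.myprog respectively.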
FILES
-
a.out
- executable file containing namelist
-
gmon.out
- dynamic call-graph and profile
-
gmon.sum
- summarized dynamic call-graph and profile
-
$PROFDIR/pid.programname
-
ATTRIBUTES
See attributes(5)
for descriptions of the following attributes:
ATTRIBUTE TYPE | ATTRIBUTE VALUE
Availability   | SUNWbtool
SEE ALSO
cc(1), ld.so.1(1), prof(1), exit(2), pcsample(2), profil(2), malloc(3C), malloc(3MALLOC), monitor(3C), attributes(5), prof(5)
Graham, S.L., Kessler, P.B., McKusick, M.K., `gprof: A
Call Graph Execution Profiler', Proceedings of the
SIGPLAN '82 Symposium on Compiler Construction, SIGPLAN Notices, Vol. 17, No. 6, pp. 120-126, June 1982.
Linker and Libraries Guide
NOTES
If the executable image has been stripped and has no symbol table
(.symtab), then gprof will
read the dynamic symbol table (.dynsym), if
present. If the dynamic symbol table is used, then only the information
for the global symbols will be available, and the behavior will be identical
to the -a option.
LD_LIBRARY_PATH must not contain /usr/lib as a component when compiling a program for profiling. If
LD_LIBRARY_PATH contains /usr/lib,
the program will not be linked correctly with the profiling versions of
the system libraries in /usr/lib/libp.
The times reported in successive identical runs may show variances
because of varying cache-hit ratios that result from sharing the cache with
other processes. Even if a program seems to be the only one using the machine,
hidden background or asynchronous processes may blur the data. In rare cases,
the clock ticks initiating recording of the program counter may "beat" with
loops in a program, grossly distorting measurements. Call counts are always
recorded precisely, however.
Only programs that call exit or return from main are guaranteed to produce a profile file, unless a final
call to monitor is explicitly coded.
Functions such as mcount(), _mcount(), moncontrol(), _moncontrol(), monitor(), and _monitor() may appear in the gprof report. These functions are part of the profiling implementation
and thus account for some amount of the runtime overhead. Since these functions
are not present in an unprofiled application, time accumulated and call
counts for these functions may be ignored when evaluating the performance
of an application.
64-bit profiling
64-bit profiling may be used freely with dynamically linked
executables, and profiling information is collected for the shared objects
if the objects are compiled for profiling. Care must be taken when interpreting
the profile output, since it is possible for symbols from different shared
objects to have the same name. If name duplication occurs in the profile
output, the module id prefix before the symbol name in the symbol index
listing can be used to identify the appropriate module for the symbol.
When using the -s or -D option to
sum multiple profile files, care must be taken not to mix 32-bit profile
files with 64-bit profile files.
32-bit profiling
32-bit profiling may be used with dynamically linked executables,
but care must be taken. In 32-bit profiling, shared objects cannot
be profiled with gprof. Thus, when a profiled, dynamically
linked program is executed, only the "main" portion of the image is sampled.
This means that all time spent outside of the "main" object, that is, time
spent in a shared object, will not be included in the profile summary; the
total time reported for the program may be less than the total time used
by the program.
Because the time spent in a shared object cannot be accounted for,
the use of shared objects should be minimized whenever a program is profiled
with gprof. If desired, the program should be linked
to the profiled version of a library (or to the standard archive version
if no profiling version is available), instead of the shared object to get
profile information on the functions of a library. Versions of profiled
libraries may be supplied with the system in the /usr/lib/libp directory. Refer to compiler driver documentation on profiling.
Consider an extreme case. A profiled program dynamically linked with
the shared C library spends 100 units of time in some libc
routine, say, malloc(). Suppose malloc()
is called only from routine B and B
consumes only 1 unit of time. Suppose further that routine A
consumes 10 units of time, more than any other routine in the "main" (profiled)
portion of the image. In this case, gprof will conclude
that most of the time is being spent in A and almost
no time is being spent in B. From this it will be almost
impossible to tell that the greatest improvement can be made by looking
at routine B and not routine A. The
value of the profiler in this case is severely degraded; the solution is
to use archives as much as possible for profiling.
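The extreme case above can be worked through numerically. This sketch uses the time units from the example and a toy model in which a routine's inclusive time is its own time plus that of its callees; the data layout and the function name inclusive are illustrative assumptions, not gprof's implementation.

```python
# Time units from the example: malloc() lives in the shared C library.
self_time = {"A": 10, "B": 1, "malloc": 100}
calls = {"B": ["malloc"]}   # malloc() is called only from B
sampled = {"A", "B"}        # only the "main" object is sampled


def inclusive(name, visible):
    """Own time (if the routine is sampled) plus the time of its callees."""
    own = self_time[name] if name in visible else 0
    return own + sum(inclusive(callee, visible) for callee in calls.get(name, ()))


everything = set(self_time)
reported = {f: inclusive(f, sampled) for f in ("A", "B")}
actual = {f: inclusive(f, everything) for f in ("A", "B")}

print("reported:", reported)  # reported: {'A': 10, 'B': 1}
print("actual:  ", actual)    # actual:   {'A': 10, 'B': 101}
```

In the reported view A looks ten times as expensive as B, while with the library time included B accounts for 101 of the 111 units.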
BUGS
Parents which are not themselves profiled will have the time of their
profiled children propagated to them, but they will appear to be spontaneously
invoked in the call-graph listing, and will not have their time propagated
further. Similarly, signal catchers, even though profiled, will appear to
be spontaneous (although for more obscure reasons). Any profiled children
of signal catchers should have their times propagated properly, unless the
signal catcher was invoked during the execution of the profiling routine,
in which case all is lost.