SRDB ID: 20267
Synopsis: Commands such as drvconfig can hang on fully populated servers : Can't open /dev/ksyms
Date: 30 Jul 2001
Various Solaris commands can hang on fully configured Ultra Enterprise
Servers running Solaris 2.5.1 and 2.6.
Some commands which can hang include:
drvconfig
modinfo
prtconf
netstat
dmesg
crash
adb
The problem is a direct result of the OS's inability to "open" /dev/ksyms.
The root cause is the exhaustion of kobj symbol space as outlined in
Internal BUG ID 4100378 "kobj symbol space should grow dynamically."
As more drivers are added to a server, the problem becomes more likely
to appear. It is seen most often when large arrays are added to a server
(EMC arrays are a common case) and drvconfig then hangs.
An excellent way to confirm the problem is to compare truss output for
the failed command against truss output for the same (successful) command
on another system that is running the same OS.
For example, the truss output for a failed dmesg command on a 2.5.1 server
looks something like:
dilbert# truss -o /tmp/dmesg dmesg
execve("/usr/sbin/dmesg", 0xEFFFFE68, 0xEFFFFE70) argc = 1
*** SGID: rgid/egid/sgid = 1 / 3 / 3 ***
open("/dev/zero", O_RDONLY) = 3
mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF7C0000
open("/usr/lib/libkvm.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFFA80) = 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 90112, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF790000
munmap(0xEF796000, 57344) = 0
mmap(0xEF7A4000, 4687, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 16384) = 0xEF7A4000
close(4) = 0
open("/usr/lib/libelf.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFFA80) = 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 131072, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF760000
munmap(0xEF770000, 57344) = 0
mmap(0xEF77E000, 5328, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 57344) = 0xEF77E000
close(4) = 0
open("/usr/lib/libc.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFFA80) = 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 622592, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF680000
munmap(0xEF700000, 57344) = 0
mmap(0xEF70E000, 29304, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 516096) = 0xEF70E000
mmap(0xEF716000, 5320, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xEF716000
close(4) = 0
open("/usr/lib/libdl.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFFA80) = 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
close(4) = 0
open("/usr/platform/SUNW,Ultra-Enterprise/lib/libc_psr.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFF8A8) = 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 4, 0) = 0xEF750000
mmap(0x00000000, 81920, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF730000
munmap(0xEF734000, 57344) = 0
mmap(0xEF742000, 5464, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 8192) = 0xEF742000
close(4) = 0
close(3) = 0
munmap(0xEF750000, 8192) = 0
open("/usr/platform/SUNW,Ultra-Enterprise/lib/libkvm_psr.so.1", O_RDONLY) = 3
fstat(3, 0xEFFFFAC8) = 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 3, 0) = 0xEF750000
mmap(0x00000000, 90112, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF660000
munmap(0xEF666000, 57344) = 0
mmap(0xEF674000, 4687, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 16384) = 0xEF674000
close(3) = 0
munmap(0xEF750000, 8192) = 0
brk(0x00021320) = 0
Running truss against the same command (which runs successfully) on
a box with an identical OS (in this case 2.5.1) looks something
like:
dogbert# truss -o /tmp/dmesg dmesg
execve("/usr/sbin/dmesg", 0xEFFFFE90, 0xEFFFFE98) argc = 1
*** SGID: rgid/egid/sgid = 1 / 3 / 3 ***
open("/dev/zero", O_RDONLY) = 3
mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF7C0000
open("/usr/lib/libkvm.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFFB44) = 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 90112, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF790000
munmap(0xEF796000, 57344) = 0
mmap(0xEF7A4000, 4687, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 16384) = 0xEF7A4000
close(4) = 0
open("/usr/lib/libelf.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFFB44) = 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 122880, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF760000
munmap(0xEF76E000, 57344) = 0
mmap(0xEF77C000, 4460, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 49152) = 0xEF77C000
close(4) = 0
open("/usr/lib/libc.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFFB44) = 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 622592, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF680000
munmap(0xEF700000, 57344) = 0
mmap(0xEF70E000, 26688, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 516096) = 0xEF70E000
mmap(0xEF716000, 2696, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xEF716000
close(4) = 0
open("/usr/lib/libdl.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFFB44) = 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
close(4) = 0
open("/usr/platform/SUNW,Ultra-1/lib/libc_psr.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFF9A4) = 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 4, 0) = 0xEF750000
mmap(0x00000000, 81920, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF730000
munmap(0xEF734000, 57344) = 0
mmap(0xEF742000, 5440, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 8192) = 0xEF742000
close(4) = 0
close(3) = 0
munmap(0xEF750000, 8192) = 0
open("/usr/platform/SUNW,Ultra-1/lib/libkvm_psr.so.1", O_RDONLY) Err#2 ENOENT
brk(0x00021320) = 0
brk(0x00023320) = 0
open("/dev/ksyms", O_RDONLY) = 3
...
...
...
Comparing the two traces shows that the failed dmesg never records the
open of /dev/ksyms that appears at the end of the successful trace:
dmesg is hanging when trying to "open" /dev/ksyms.
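The comparison can also be scripted. The sketch below is illustrative only: the helper name ksyms_open_seen and the two sample log files are hypothetical, and the sample lines are abbreviated from the traces above. It assumes a POSIX grep, and it assumes that a saved truss log shows a line ending in "= <fd>" for /dev/ksyms only when the open has returned.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a saved truss log shows a completed
# open of /dev/ksyms, i.e. a line such as: open("/dev/ksyms", O_RDONLY) = 3
# Assumes a POSIX grep (for the [[:space:]] character class).
ksyms_open_seen() {
    grep 'open("/dev/ksyms".*)[[:space:]]*=[[:space:]]*[0-9]' "$1" >/dev/null 2>&1
}

# Sample logs abbreviated from the traces above (illustration only).
workdir=${TMPDIR:-/tmp}/ksyms_check.$$
mkdir -p "$workdir"
cat > "$workdir/hung.truss" <<'EOF'
munmap(0xEF750000, 8192) = 0
brk(0x00021320) = 0
EOF
cat > "$workdir/good.truss" <<'EOF'
brk(0x00021320) = 0
brk(0x00023320) = 0
open("/dev/ksyms", O_RDONLY) = 3
EOF

for log in "$workdir/hung.truss" "$workdir/good.truss"; do
    if ksyms_open_seen "$log"; then
        echo "$log: /dev/ksyms opened"
    else
        echo "$log: no completed open of /dev/ksyms"
    fi
done
rm -rf "$workdir"
```

A log that lacks the completed open is only a hint, not proof; the side-by-side comparison against a healthy system remains the more reliable check.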
SOLUTION SUMMARY:
To resolve, either upgrade to Solaris 7 or manually define a
kobj symbol space that is large enough to accommodate all required
devices and modules on the affected server.
To check the current kobj_map_space_len setting, run the following command:
# echo kobj_map_space_len/X|adb -k
physmem f832
kobj_map_space_len:
kobj_map_space_len: 100000
The example above displays a default setting of 1MB (0x100000).
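As a quick check on the arithmetic, the hexadecimal value printed by adb can be converted to bytes and KB. This is a sketch only: the variable names are hypothetical, the sample text is copied from the output above, and it assumes a POSIX shell whose arithmetic expansion accepts 0x hex constants (older Bourne shells may not).

```shell
#!/bin/sh
# Sample text copied from the adb output above; on a live system it would
# come from the adb command itself.
adb_output='physmem f832
kobj_map_space_len:
kobj_map_space_len: 100000'

# Pick the line that carries a value (two fields) and take the hex digits.
hex=$(printf '%s\n' "$adb_output" |
      awk '$1 == "kobj_map_space_len:" && NF == 2 { print $2 }')

# The /X value is shown in hex without a 0x prefix, so add one.
bytes=$((0x$hex))
echo "kobj_map_space_len = 0x$hex = $bytes bytes = $((bytes / 1024)) KB"
# prints: kobj_map_space_len = 0x100000 = 1048576 bytes = 1024 KB
```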
To double the size of kobj_map_space_len from the example above, add
the following entry to /etc/system, then reboot:
set kobj_map_space_len=0x200000
The minimum required size of kobj_map_space_len is a function of the
number of drivers and modules that the server requires. Hence, required
minimum values will vary.
Please note that any defined value *MUST* fall on an even page boundary.
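One way to check a candidate value against the page-boundary rule is to test that it is a whole multiple of the page size, which is how the rule is interpreted here. The sketch below is an illustration: the function names are hypothetical, and the 8192-byte page size is an assumption (on Solaris the pagesize command reports the actual value). It also assumes a POSIX shell whose arithmetic expansion accepts 0x hex constants.

```shell
#!/bin/sh
# Assumed page size in bytes; replace with the value reported on the target.
PAGE_SIZE=8192

# Hypothetical helper: succeeds if the value is a whole multiple of PAGE_SIZE.
is_page_aligned() {
    [ $(( $1 % $PAGE_SIZE )) -eq 0 ]
}

# Hypothetical helper: print the value rounded up to the next page multiple.
round_up_to_page() {
    printf '0x%x\n' $(( ( ($1 + $PAGE_SIZE - 1) / $PAGE_SIZE ) * $PAGE_SIZE ))
}

for value in 0x200000 0x300000 0x250001; do
    if is_page_aligned "$value"; then
        echo "$value is page aligned"
    else
        echo "$value is not page aligned; next aligned value: $(round_up_to_page "$value")"
    fi
done
```

Whole-megabyte values such as 0x200000, 0x300000, and 0x400000 are multiples of any power-of-two page size up to 1MB, so they pass this check without adjustment.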
Typically, given these constraints, the following definition should be
more than adequate for most large fully populated servers:
set kobj_map_space_len=0x300000
If there is any reason to suspect that 3MB is not large enough
to accommodate the server's needs, then try:
set kobj_map_space_len=0x400000
A reboot is required for the new table size to take effect.
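Before rebooting, it is worth confirming that the file holds exactly one kobj_map_space_len entry. The following sketch works on a scratch file rather than the live /etc/system; the function name and file paths are hypothetical, and a POSIX grep is assumed.

```shell
#!/bin/sh
# Hypothetical helper: append the setting to the given file unless an
# uncommented kobj_map_space_len entry is already present.
add_kobj_setting() {
    file=$1
    value=$2
    pattern='^set[[:space:]][[:space:]]*kobj_map_space_len'
    if grep "$pattern" "$file" >/dev/null 2>&1; then
        echo "entry already present in $file:"
        grep "$pattern" "$file"
    else
        echo "set kobj_map_space_len=$value" >> "$file"
        echo "added: set kobj_map_space_len=$value"
    fi
}

# Demonstration on a scratch file standing in for /etc/system.
scratch=${TMPDIR:-/tmp}/system.$$
echo '* sample content (comment lines in /etc/system start with an asterisk)' > "$scratch"
add_kobj_setting "$scratch" 0x300000   # appends the entry
add_kobj_setting "$scratch" 0x400000   # reports the existing entry, no change
rm -f "$scratch"
```

To raise an existing value, edit that line by hand instead of appending a second entry.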
INTERNAL SUMMARY:
Solaris 7 resolves this problem by changing how the kobj symbol space is
allocated: in Solaris 7 it is allocated dynamically, whereas in 2.5.1 and
2.6 it is statically defined when the system first boots. Since this bug
primarily affects large servers, it is very doubtful that an upgrade to
Solaris 7 will be well received by customers. Hence, the best method of
addressing this issue is to provide the kobj_map_space_len setting
described above as an interim solution.
SUBMITTER: Scott A Surguine
APPLIES TO: Hardware/Ultra Enterprise/Servers, Operating Systems/Solaris/Solaris 2.5.1, AFO Vertical Team Docs, AFO Vertical Team Docs/Kernel
ATTACHMENTS:
Copyright (c) 1997-2003 Sun Microsystems, Inc.