SRDB ID   Synopsis   Date
20267   Commands such as drvconfig can hang on fully populated servers : Can't open /dev/ksyms   30 Jul 2001

Status Issued

Description
Various Solaris commands can hang on fully configured Ultra Enterprise 
Servers running Solaris 2.5.1 and 2.6.  

Some commands which can hang include: 

	drvconfig
	modinfo
	prtconf
	netstat
	dmesg
	crash
	adb

The problem is a direct result of the OS's inability to "open" /dev/ksyms.

The root cause is the exhaustion of kobj symbol space, as outlined in 
Internal BUG ID 4100378 "kobj symbol space should grow dynamically."

The problem becomes more evident as more drivers are added to a server,
especially when large arrays are attached (often seen with EMC arrays),
and it typically shows up as a drvconfig hang.

An excellent way to confirm the problem is to compare truss output for 
the failed command against truss output for the same (successful) command 
on another system that is running the same OS.

For example, the truss output for a failed dmesg command on a 2.5.1 server
looks something like:

dilbert# truss -o /tmp/dmesg dmesg

execve("/usr/sbin/dmesg", 0xEFFFFE68, 0xEFFFFE70)  argc = 1
   *** SGID: rgid/egid/sgid = 1 / 3 / 3  ***
open("/dev/zero", O_RDONLY)			= 3
mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF7C0000
open("/usr/lib/libkvm.so.1", O_RDONLY)		= 4
fstat(4, 0xEFFFFA80)				= 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 90112, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF790000
munmap(0xEF796000, 57344)			= 0
mmap(0xEF7A4000, 4687, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 16384) = 0xEF7A4000
close(4)					= 0
open("/usr/lib/libelf.so.1", O_RDONLY)		= 4
fstat(4, 0xEFFFFA80)				= 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 131072, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF760000
munmap(0xEF770000, 57344)			= 0
mmap(0xEF77E000, 5328, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 57344) = 0xEF77E000
close(4)					= 0
open("/usr/lib/libc.so.1", O_RDONLY)		= 4
fstat(4, 0xEFFFFA80)				= 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 622592, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF680000
munmap(0xEF700000, 57344)			= 0
mmap(0xEF70E000, 29304, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 516096) = 0xEF70E000
mmap(0xEF716000, 5320, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xEF716000
close(4)					= 0
open("/usr/lib/libdl.so.1", O_RDONLY)		= 4
fstat(4, 0xEFFFFA80)				= 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
close(4)					= 0
open("/usr/platform/SUNW,Ultra-Enterprise/lib/libc_psr.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFF8A8)				= 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 4, 0) = 0xEF750000
mmap(0x00000000, 81920, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF730000
munmap(0xEF734000, 57344)			= 0
mmap(0xEF742000, 5464, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 8192) = 0xEF742000
close(4)					= 0
close(3)					= 0
munmap(0xEF750000, 8192)			= 0
open("/usr/platform/SUNW,Ultra-Enterprise/lib/libkvm_psr.so.1", O_RDONLY) = 3
fstat(3, 0xEFFFFAC8)				= 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 3, 0) = 0xEF750000
mmap(0x00000000, 90112, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF660000
munmap(0xEF666000, 57344)			= 0
mmap(0xEF674000, 4687, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 16384) = 0xEF674000
close(3)					= 0
munmap(0xEF750000, 8192)			= 0
brk(0x00021320)					= 0


Running truss against the same command (which runs successfully) on
a system with an identical OS (in this case 2.5.1) gives output that
looks something like:


dogbert# truss -o /tmp/dmesg dmesg

execve("/usr/sbin/dmesg", 0xEFFFFE90, 0xEFFFFE98)  argc = 1
    *** SGID: rgid/egid/sgid = 1 / 3 / 3  ***
open("/dev/zero", O_RDONLY)			= 3
mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF7C0000
open("/usr/lib/libkvm.so.1", O_RDONLY)		= 4
fstat(4, 0xEFFFFB44)				= 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 90112, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF790000
munmap(0xEF796000, 57344)			= 0
mmap(0xEF7A4000, 4687, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 16384) = 0xEF7A4000
close(4)					= 0
open("/usr/lib/libelf.so.1", O_RDONLY)		= 4
fstat(4, 0xEFFFFB44)				= 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 122880, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF760000
munmap(0xEF76E000, 57344)			= 0
mmap(0xEF77C000, 4460, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 49152) = 0xEF77C000
close(4)					= 0
open("/usr/lib/libc.so.1", O_RDONLY)		= 4
fstat(4, 0xEFFFFB44)				= 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
mmap(0x00000000, 622592, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF680000
munmap(0xEF700000, 57344)			= 0
mmap(0xEF70E000, 26688, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 516096) = 0xEF70E000
mmap(0xEF716000, 2696, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xEF716000
close(4)					= 0
open("/usr/lib/libdl.so.1", O_RDONLY)		= 4
fstat(4, 0xEFFFFB44)				= 0
mmap(0xEF7B0000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_FIXED, 4, 0) = 0xEF7B0000
close(4)					= 0
open("/usr/platform/SUNW,Ultra-1/lib/libc_psr.so.1", O_RDONLY) = 4
fstat(4, 0xEFFFF9A4)				= 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 4, 0) = 0xEF750000
mmap(0x00000000, 81920, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF730000
munmap(0xEF734000, 57344)			= 0
mmap(0xEF742000, 5440, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 8192) = 0xEF742000
close(4)					= 0
close(3)					= 0
munmap(0xEF750000, 8192)			= 0
open("/usr/platform/SUNW,Ultra-1/lib/libkvm_psr.so.1", O_RDONLY) Err#2 ENOENT
brk(0x00021320)					= 0
brk(0x00023320)					= 0
open("/dev/ksyms", O_RDONLY)			= 3
...
...
...


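Comparing the two truss outputs shows that the failed trace ends at
brk(), while the successful trace continues through to the open of
/dev/ksyms. It becomes apparent that dmesg is hanging when trying to
"open" /dev/ksyms.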
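
The comparison can be partly mechanized. The following is a minimal sketch
only, not part of the original procedure. It assumes a POSIX-style shell,
and the function name last_truss_call and the file names in the usage note
are hypothetical. It prints the last system call recorded in a truss output
file, so that the point where the hung command stopped can be located in
the trace taken on the working system.

```shell
# Print the last system call line recorded in a truss output file ($1).
# Lines that are not system calls (blank lines, "*** SGID ..." notes,
# "..." elisions) are skipped.
last_truss_call() {
    grep '^[a-z_0-9][a-z_0-9]*(' "$1" | sed -n '$p'
}
```

If the hung trace above were saved as /tmp/dmesg.hung, then
"last_truss_call /tmp/dmesg.hung" would print its final brk(0x00021320)
line. The calls that follow that line in the trace from the working system
show what the hung command was about to attempt.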

                        
SOLUTION SUMMARY:
To resolve the problem, either upgrade to Solaris 7 or manually define a 
kobj symbol space that is large enough to accommodate all required 
devices and modules on the affected server.

To check the current kobj_map_space_len setting, run:

	# echo kobj_map_space_len/X|adb -k
	physmem f832
	kobj_map_space_len:
	kobj_map_space_len:             100000

The example above displays a default setting of 1MB (0x100000).
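
adb prints the value in hexadecimal without a leading 0x, which is easy to
misread. As an illustration, the sketch below converts that output to
bytes. It assumes a shell that accepts 0x-prefixed constants in $(( ))
arithmetic (for example bash); the legacy Bourne shell shipped with
Solaris 2.5.1 does not. The function name kobj_len_bytes is hypothetical.

```shell
# Read the output of "echo kobj_map_space_len/X | adb -k" on standard
# input and print the value in bytes. The 0x prefix is added here
# because adb's /X format prints bare hexadecimal digits.
kobj_len_bytes() {
    hex=$(awk '$1 == "kobj_map_space_len:" && NF == 2 { v = $2 } END { print v }')
    [ -n "$hex" ] || return 1   # no value line found in the input
    echo $((0x$hex))
}
```

Fed the sample output above, this prints 1048576, which is 1024 * 1024
bytes, or 1MB.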

To double the size of kobj_map_space_len from the example above, add 
the following entry to /etc/system, then reboot:

	set kobj_map_space_len=0x200000
 
The minimum required size of kobj_map_space_len is a function of the 
number of drivers and modules that the server requires.  Hence, the 
required minimum value varies from server to server.  

Please note that any defined value *MUST* fall on an even page boundary,
that is, it must be a whole multiple of the system page size.
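
A candidate value can be checked against the page size before it goes
into /etc/system. This is a sketch under stated assumptions: the default
of 8192 bytes is the usual UltraSPARC base page size and should be
confirmed on the target system (for example with the pagesize command),
the function name is_page_aligned is hypothetical, and the $(( )) syntax
needs a POSIX-style shell rather than the legacy Bourne shell.

```shell
# Succeed if the value ($1, decimal or 0x-prefixed hex) is a whole
# multiple of the page size ($2, in bytes). The 8192 default is an
# assumption; pass the real page size of the target system.
is_page_aligned() {
    value=$1
    pagesize=${2:-8192}
    [ $(( $value % $pagesize )) -eq 0 ]
}
```

For example, "is_page_aligned 0x300000 && echo aligned" prints "aligned",
since 0x300000 is an exact multiple of 8192.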
  
Typically, given these constraints, the following definition should be 
more than adequate for most large, fully populated servers:

	set kobj_map_space_len=0x300000

If there is any reason to suspect that 3MB is not large enough
to accommodate the server's needs, then try:

	set kobj_map_space_len=0x400000


A reboot is required for the new table size to take effect.
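
Before rebooting, it can be worth confirming that the entry is present and
not commented out. The helper below is an illustrative sketch only: the
function name show_kobj_setting is hypothetical, and it assumes a grep
that understands POSIX character classes such as [[:space:]], which the
stock grep on older Solaris releases may not.

```shell
# Print the active kobj_map_space_len entries from an /etc/system-style
# file ($1, defaults to /etc/system). Lines that do not begin with
# "set" (such as comments) are ignored. Returns non-zero if no entry
# is found.
show_kobj_setting() {
    grep '^[[:space:]]*set[[:space:]][[:space:]]*kobj_map_space_len[[:space:]]*=' "${1:-/etc/system}"
}
```

Running "show_kobj_setting" with no argument reads /etc/system. An empty
result with a non-zero exit status means the setting has not been added.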
	   
                        
INTERNAL SUMMARY:
Solaris 7 resolves this problem by changing how the kobj symbol space is 
allocated: in Solaris 7 it is allocated dynamically, whereas in 2.5.1 and 
2.6 it is statically defined when the system first boots.  Since this bug 
primarily affects large servers, it is very doubtful that an OS upgrade 
will be well received by customers.  Hence, the best method of addressing 
this issue is to provide the kobj_map_space_len setting in /etc/system as 
an intermediate solution.

SUBMITTER: Scott A Surguine
APPLIES TO: Hardware/Ultra Enterprise/Servers, Operating Systems/Solaris/Solaris 2.5.1, AFO Vertical Team Docs, AFO Vertical Team Docs/Kernel
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.