CHAPTER 3
SMS Internals
SMS operations are generally performed by a set of daemons and commands. This chapter provides an overview of how SMS works and describes the SMS daemons, processes, commands, and system files. For more information about daemons, commands, and system files, refer to the System Management Services (SMS) 1.2 Reference Manual .
Caution - Changes made to files in /opt/SUNWSMS can cause serious damage to the system. Only very experienced system administrators should risk changing the files described in this chapter.
The events that take place when SMS boots are as follows:
User powers on the Sun Fire 15K (CPU/disk, and CD-ROM). The Solaris operating environment on the SC boots automatically.
During the boot process, the /etc/init.d/sms script is called. This script, for security reasons, disables forwarding, broadcast and multicasting over the MAN network. It then starts the SMS software by invoking a background process, which starts and monitors ssd . ssd is the SMS startup daemon responsible for starting and monitoring all the SMS daemons and servers.
ssd (1M) in turn invokes: mld , pcd , hwad , tmd , dsmd , esmd , mand , osd , dca , efe, and smnptd .
For more information, see SMS Daemons , Message Logging . For efe , refer to the Sun Management Center User's Guide .
Once the daemons are running, you can use SMS commands such as console .
SMS startup can take a few minutes, during which any command you run returns an error message indicating that SMS has not completed startup. When startup is complete, the message "SMS software start-up complete" is posted to the platform log, which you can view using the showlogs (1M) command.
The SMS 1.2 daemons play a central role on the Sun Fire 15K system. Daemons are persistent processes that provide SMS services to clients using an API.
Note - SMS daemons are started by ssd and should not be started manually from the command line.
Daemons are always running, initiated at system startup, and restarted whenever necessary. Each daemon is fully described in its corresponding man page (with the exception of efe , which is referenced separately in the Sun Management Center documentation).
This section describes the SMS daemons and their relationship to one another, and identifies which CLIs (if any) access them.
FIGURE 3-1 illustrates the Sun Fire 15K client server overview.
dca (1M) supports remote dynamic reconfiguration (DR) by enabling communication between applications and the domain configuration server ( dcs ) running on a Solaris 8 or Solaris 9 domain. One dca per domain runs on the SC. Each dca communicates with its dcs over the Management Network (MAN).
ssd (1M) starts dca when the domain is brought up. ssd restarts dca if it is killed while the domain is still running. dca is terminated when the domain is shut down.
dca is an SMS application that waits for dynamic reconfiguration requests. When a DR request arrives, dca creates a dcs session. Once a session is established, dca forwards the request to dcs . dcs attempts to honor the DR request and sends the results of the operation to the dca . Once the results have been sent, the session is ended. The remote DR operation is complete when dca returns the results of the DR operation.
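The request flow described above can be sketched as a short simulation. This is only an illustration of the sequence (open a session, forward the request, return the result, end the session); the class `DcsStub`, the function `relay_dr_request`, and the request string are hypothetical names and do not reflect the actual dca or dcs interfaces.

```python
from dataclasses import dataclass


@dataclass
class DcsStub:
    """Hypothetical stand-in for the dcs server running on a domain."""
    open_sessions: int = 0

    def open_session(self) -> None:
        self.open_sessions += 1

    def handle(self, request: str) -> str:
        # A real dcs would attempt the DR operation; the stub only echoes.
        return f"done: {request}"

    def close_session(self) -> None:
        self.open_sessions -= 1


def relay_dr_request(dcs: DcsStub, request: str) -> str:
    """Open a session, forward one DR request, return its result,
    and end the session even if the request fails."""
    dcs.open_session()
    try:
        return dcs.handle(request)
    finally:
        dcs.close_session()
```

The `try`/`finally` mirrors the statement that the session ends once the results have been sent, whether or not the operation succeeded.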
FIGURE 3-2 illustrates the DCA client server relationship to the SMS daemons and CLIs.
dsmd (1M) monitors domain state signatures, CPU reset conditions and Solaris heartbeat for up to 18 domains. It also handles domain stop events related to hardware failure.
dsmd detects timeouts that can occur in reboot transition flow and panic transition flow, and handles various domain hung conditions.
dsmd notifies the domain X server ( dxs (1M)) and Sun Management Center of all domain state changes and automatically recovers the domain based on the domain state signature, domain stop events, and automatic system recovery (ASR) Policy. ASR Policy consists of those procedures which restore the system to running all properly configured domains after one or more domains have been rendered inactive. This can be due to software or hardware failures or to unacceptable environmental conditions. For more information, see Automatic System Recovery (ASR) and Domain Stop Events .
FIGURE 3-3 illustrates DSMD client server relationship to the SMS daemons and CLIs.
dxs (1M) provides software support for a running domain. This support includes virtual console functionality, dynamic reconfiguration support, and HPCI support. dxs handles domain driver requests and events. The virtual console functionality allows one or more users running the console program to access the domain's virtual console. dxs acts as a link between SMS console applications and the domain virtual console drivers.
A Sun Fire 15K system can support up to 18 different domains. Each domain may require software support from the SC, and dxs provides that support. The following domain related projects require dxs support:
There is one domain X server for each Sun Fire 15K domain. dxs is started by ssd for every active domain and terminated when the domain is shut down.
FIGURE 3-4 illustrates DXS client server relationship to the SMS daemons.
esmd (1M) monitors system cabinet environmental conditions, for example, voltage, temperature, fan tray, and power supply. esmd logs abnormal conditions and takes action to protect the hardware, if necessary.
See Environmental Events for more information on esmd .
FIGURE 3-5 illustrates ESMD client server relationship to the SMS daemons.
fomd (1M) is the core of the SC failover mechanism. fomd detects faults on the local and remote SCs and takes the appropriate action (directing a failover/takeover).
fomd ensures that important configuration data is kept synchronized between both SCs. fomd runs on both the master and spare SC.
FIGURE 3-6 illustrates FOMD client server relationship to the SMS daemons.
frad (1M) is the field replaceable unit (FRU) access daemon for SMS. frad provides controlled access to any SEEPROM within the Sun Fire 15K platform that is accessible by the SC. frad supports dynamic FRUID which provides improved FRU data access.
FIGURE 3-7 illustrates FRAD client server relationship to the SMS daemons.
hwad (1M) provides hardware access to SMS daemons. It is the exclusive mechanism through which all daemons access, control, monitor, and configure the hardware.
hwad runs in either main or spare mode when it comes up. The failover daemon ( fomd (1M)) determines which role hwad will play.
At startup, hwad opens all the drivers ( sbbc , echip , gchip , and consbus ) and uses ioctl (2) calls to interface with them. It reads the contents of the device presence register to identify the boards present in the system and makes them accessible to the clients. hwad also configures the local system clock and sets the clock source for each board present in the system.
IOSRAM and Mbox interfaces are also provided by hwad . This helps communication between the SC and the domain. For dynamic reconfiguration (DR), hwad directs communication to the IOSRAM (tunnel switch).
For darb interrupts, hwad notifies the dsmd (1M) if there is a dstop or rstop . It also notifies related SMS daemon(s) depending on the type of the Mbox interrupt that occurs.
hwad detects and recovers from console bus and JTAG errors.
Hardware access to the Sun Fire 15K system on the SC is done either by going through the PCI bus or console bus. Through the PCI bus you can access:
Through the Console bus you can access:
Various ASICs internal registers
Read/write chips
Local I2C devices on various boards for temperature and chip level power control/status.
FIGURE 3-8 illustrates HWAD client server relationship to the SMS daemons and CLIs.
The key management daemon provides a mechanism for managing security for socket communications between the SC and the domains.
The current default configuration includes authentication policies for the dca (1M) and dxs (1M) clients on the SC, which connect to the dcs (1M) and cvcd (1M) servers on a domain.
kmd (1M) manages the IPSec security associations (SAs) needed to secure the communication between the SC and servers running on a domain.
kmd manages per-socket policies for connections initiated by clients on the SC to servers on a domain.
At system startup, kmd creates a domain interface for each domain that is active. An active domain has both a valid IOSRAM and is running the Solaris operating environment. Domain change events can trigger creation or removal of a domain kmd interface.
kmd manages shared policies for connections initiated by clients on the domain to servers on the SC. The kmd policy manager reads a configuration file and stores policies used to manage security associations. A request received by kmd is compared to the current set of policies to ensure that it is valid and to set various parameters for the request.
Static global policies are configured using ipsecconf (1M) and associated data file ( /etc/inet/ipsecinit.conf ). Global policies are used for connections initiated from the domains to the SC. Corresponding entries are made in the kmd configuration file. Shared security associations for domain to SC connections are created by kmd when the domain becomes active.
Note - In order to work properly, policies created by ipsecconf and kmd must match.
The kmd configuration file is used for both SC-to-domain and domain-to-SC initiated connections. The kmd configuration file resides in /etc/opt/SUNWSMS/config/kmd_policy.conf.
The format of each entry in the kmd configuration file is as follows:
dir:d_port:protocol:sa_type:aut_alg:encr_alg:domain:login
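As a minimal sketch of how an entry in this format could be split into named fields, the following assumes that fields are separated by single colons and that no field itself contains a colon. The function name and the sample values are placeholders invented for illustration; they are not taken from an actual kmd_policy.conf.

```python
KMD_FIELDS = (
    "dir", "d_port", "protocol", "sa_type",
    "aut_alg", "encr_alg", "domain", "login",
)


def parse_kmd_policy(line: str) -> dict:
    """Split one colon-separated policy entry into a field-name -> value map."""
    parts = line.strip().split(":")
    if len(parts) != len(KMD_FIELDS):
        raise ValueError(
            f"expected {len(KMD_FIELDS)} fields, got {len(parts)}: {line!r}"
        )
    return dict(zip(KMD_FIELDS, parts))


# Placeholder entry: every value here is invented, not a real policy.
SAMPLE_ENTRY = "DIR:PORT:PROTO:SATYPE:AUTH:ENCR:DOMAIN:LOGIN"
policy = parse_kmd_policy(SAMPLE_ENTRY)
```

Rejecting entries with the wrong field count makes a truncated or malformed line fail loudly instead of silently shifting values into the wrong fields.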
FIGURE 3-9 illustrates KMD client server relationship to the SMS daemons.
mand (1M) supports the Management Network (MAN). See Management Network Services. mand runs in either main or spare mode when it comes up. The failover daemon (fomd (1M)) determines which role mand plays.
At system startup, mand creates the mapping between domain_tag and IP address in the platform configuration database ( pcd ), and configures the SC-to-SC private network. This information is obtained from the file /etc/opt/SUNWSMS/config/MAN.cf , which is created by the smsconfig (1M) command. mand then obtains domain configuration information from the pcd and programs the scman (7d) driver accordingly. After initializing the pcd and the scman driver, mand registers for domain keyswitch events, tracks changes in domain active board lists, tracks active Ethernet information from the dman (7d) driver and updates the scman driver, as appropriate.
mand also communicates system startup MAN information to each domain when the domain is powered on ( setkeyswitch on). This information includes Ethernet and MAN IP addressing information. This information is used during the initial software installation on the domain.
FIGURE 3-10 illustrates MAND client server relationship to the SMS daemons.
The message logging daemon, mld , captures the output of all other SMS daemons and processes. mld supports three configuration directives: File, Level, and Mode, in the /var/opt/SUNWSMS/adm/.logger file.
File--Specifies the default output locations for the message files. The default is msgdaemon and should not be changed.
Platform messages are stored on the SC in /var/opt/SUNWSMS/adm/platform/messages
Domain messages are stored on the SC in /var/opt/SUNWSMS/adm/ domain_id /messages
Domain console messages are stored on the SC in /var/opt/SUNWSMS/adm/ domain_id /console
Domain syslog messages are stored on the SC in /var/opt/SUNWSMS/adm/ domain_id /syslog .
Level--Specifies the minimum level necessary for a message to be logged. The supported levels are NOTICE , WARNING , ERR , CRIT , ALERT , and EMERG . The default level is NOTICE .
Mode--Specifies the verbosity of the messages. Two modes are available: verbose and terse . The default is verbose .
mld monitors the size of each of the message log files. For each message log type, mld keeps up to ten message files at a time, x.0 through x.9. For more information on log messages, see Message Logging.
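The Level directive and the ten-file naming scheme can be illustrated with a short sketch. It assumes that the levels are listed above in ascending order of severity (as in syslog), so a message is logged when its level is at or above the configured minimum. The function names are hypothetical, and the sketch does not model how mld decides when to roll over to a new file.

```python
# Levels as listed for the Level directive, assumed lowest to highest severity.
LEVELS = ("NOTICE", "WARNING", "ERR", "CRIT", "ALERT", "EMERG")


def should_log(message_level: str, minimum_level: str = "NOTICE") -> bool:
    """Return True if a message at message_level meets the minimum level."""
    return LEVELS.index(message_level) >= LEVELS.index(minimum_level)


def retained_file_names(base: str) -> list:
    """Names of the up-to-ten retained files for one log type: base.0 .. base.9."""
    return [f"{base}.{n}" for n in range(10)]
```

For example, with the minimum level set to WARNING, `should_log("ERR", "WARNING")` is true and `should_log("NOTICE", "WARNING")` is false.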
FIGURE 3-11 illustrates MLD client server relationship to the SMS daemons and CLIs.
osd (1M) provides support to the OpenBoot PROM Process running on a domain. osd and OpenBoot PROM communication is through a mailbox that resides on the domain. The osd daemon monitors the OpenBoot PROM mailbox. When the OpenBoot PROM writes requests to the mailbox, osd executes the requests accordingly.
osd runs at all times on the SC even if there are no domains configured. osd provides virtual TOD service, virtual NVRAM, and virtual REBOOTINFO for OpenBoot PROM and an interface to dsmd (1M) to facilitate auto-domain recovery. osd also provides an interface for the following commands: setobpparams (1M), showobpparams (1M), setdate (1M) and showdate (1M). See also Chapter 4 .
osd is a trusted daemon in that it will not export any interface to other SMS processes. It exclusively reads and writes from and to all OpenBoot PROM mailboxes. There is one OpenBoot PROM mailbox for each domain.
osd has two main tasks: to maintain its current state of the domain configuration, and to monitor the OpenBoot PROM mailbox.
FIGURE 3-12 illustrates OSD client server relationship to the SMS daemons and CLIs.
pcd (1M) is a Sun Fire 15K system management daemon that runs on the SC with primary responsibility for managing and providing controlled access to platform and domain configuration data.
pcd manages an array of information that describes the Sun Fire system configuration. In its physical form, the database information is a collection of flat files, each file appropriately identifiable by the information contained within it. All SMS applications that want to access the database information must go through pcd .
In addition to managing platform configuration data, pcd is responsible for platform configuration change notifications. When pertinent platform configuration changes occur within the system, the pcd sends out notification of the changes to clients who have registered to receive the notification.
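The registration and notification behavior described above follows the familiar publish/subscribe pattern. The sketch below shows that pattern in general terms only; the class `ChangeNotifier`, its methods, and the event name are hypothetical and are not the pcd client API.

```python
from collections import defaultdict
from typing import Callable


class ChangeNotifier:
    """Deliver configuration-change notices to registered clients only."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def register(self, event_type: str, callback: Callable[[str], None]) -> None:
        self._subscribers[event_type].append(callback)

    def publish(self, event_type: str, detail: str) -> None:
        # Clients that never registered for this event type receive nothing.
        for callback in self._subscribers.get(event_type, []):
            callback(detail)


received = []
notifier = ChangeNotifier()
notifier.register("board_assigned", received.append)  # hypothetical event name
notifier.publish("board_assigned", "example detail")
notifier.publish("unwatched_event", "ignored")
```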
FIGURE 3-13 illustrates PCD client server relationship to the SMS daemons and CLIs.
The following information uniquely identifies the platform:
Platform type
Platform name
Rack ID
Cacheable Address Slice Map
System clock frequency
System clock type
SC IP address
SC0 to SC1 IP address
SC1 to SC0 IP address
SC to SC IP netmask
The following information is domain related:
domain_id
domain_tag
OS version (currently not used)
OS type (currently not used)
Available component list
Assigned board list
Active board list
Golden IOSRAM I/O board
Virtual keyswitch setting for a domain
Active Ethernet I/O board
Domain creation time
Domain dump state
Domain bringup priority
IP host address
Host name
Host netmask
Host broadcast address
Virtual OpenBoot PROM address
Physical OpenBoot PROM address
The following information is related to system boards:
Expander position
Slot position
Board type
Board state
DomainID assigned to board
Available component list state
Board test status
Board test level
Board memory clear state
ssd (1M) is responsible for starting and maintaining all SMS daemons and domain X servers.
ssd checks the environment for availability of certain files and the availability of the Sun Fire 15K system, sets environment variables, and then starts esmd (1M). esmd monitors environmental changes by polling the related hardware components. When an abnormal condition is detected, esmd handles it or generates an event so that the corresponding handlers take appropriate action, update their current status, or both. Some of those handlers are dsmd, pcd, and Sun Management Center (if installed). The main objective of ssd is to ensure that the SMS daemons and servers are always up and running.
FIGURE 3-14 illustrates SSD client server relationship to the SMS daemons.
ssd uses a configuration file, ssd_start, to determine which components of the SMS software to start and in what order. This configuration file is located in the /etc/opt/SUNWSMS/startup directory.
ssd_start consists of entries in the following format:
name:args:nice:role:type:trigger:startup_timeout:shutdown_timeout:uid:start_order:stop_order
Each time ssd starts, it comes up in spare mode. Once ssd has started the core platform daemons, it queries fomd (1M) for its role. If the fomd query returns spare, ssd stays in spare mode. If the query returns main, ssd transitions to main mode.
After this initial query phase, ssd only switches between modes through events received from the fomd .
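The mode handling described above amounts to a two-state machine. The following is a minimal sketch of that logic, assuming the role and event values are the strings "main" and "spare"; the class and method names are hypothetical and do not describe how ssd is actually implemented.

```python
class SsdModeTracker:
    """Track whether ssd is in spare or main mode."""

    def __init__(self) -> None:
        # ssd always comes up in spare mode.
        self.mode = "spare"

    def apply_initial_query(self, fomd_role: str) -> None:
        """Result of the one-time role query made after the core daemons start."""
        if fomd_role == "main":
            self.mode = "main"

    def on_fomd_event(self, event: str) -> None:
        """After the initial query, only fomd events change the mode."""
        if event in ("main", "spare"):
            self.mode = event
```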
When in spare mode, ssd starts and monitors all of the core platform role, auto trigger programs in the ssd_start file. Currently, this list is made up of the following programs.
If, while in main mode, ssd receives a spare event, then ssd shuts down all programs except the core platform role and auto trigger programs found in the ssd_start file.
ssd will stay in spare mode until it receives a main event. At that time, ssd starts and monitors (in addition to the already running daemons) all of the platform role (main only) event trigger programs, in the ssd_start file. Currently, this list is made up of the following programs.
Finally, after starting all the platform role, event trigger programs, ssd queries the pcd to determine which domains are active. For each of these domains, ssd starts all the domain role, event trigger programs found in the ssd_start file.
ssd uses domain start and stop events from pcd as instructions for starting and stopping domain-specific servers.
Upon reception, ssd either starts or stops all of the domain role, event trigger programs (for the domain identified) found in the ssd_start file.
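To make the selection by role and trigger concrete, the sketch below parses entries in the ssd_start format and picks the programs that match a given role and trigger. It assumes colon-separated fields with no embedded colons. The sample entries, including the role, type, and trigger values, are placeholders invented for illustration and are not the contents of a real ssd_start file.

```python
SSD_START_FIELDS = (
    "name", "args", "nice", "role", "type", "trigger",
    "startup_timeout", "shutdown_timeout", "uid",
    "start_order", "stop_order",
)


def parse_ssd_start_entry(line: str) -> dict:
    """Split one ssd_start-style entry into a field-name -> value map."""
    parts = line.strip().split(":")
    if len(parts) != len(SSD_START_FIELDS):
        raise ValueError(f"expected {len(SSD_START_FIELDS)} fields: {line!r}")
    return dict(zip(SSD_START_FIELDS, parts))


def select_programs(entries: list, role: str, trigger: str) -> list:
    """Names of the programs whose role and trigger both match."""
    return [e["name"] for e in entries
            if e["role"] == role and e["trigger"] == trigger]


# Invented placeholder entries (all field values are hypothetical).
SAMPLE_LINES = [
    "mld::0:core:daemon:auto:60:60:0:1:9",
    "dxs::0:domain:server:event:60:60:0:2:2",
    "dca::0:domain:server:event:60:60:0:3:1",
]
entries = [parse_ssd_start_entry(line) for line in SAMPLE_LINES]
domain_programs = select_programs(entries, "domain", "event")
```

With these placeholder entries, a domain start event would select `dxs` and `dca`, while the entry marked with the auto trigger is left to the spare-mode startup path.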
Once ssd has started a process, it monitors the process and restarts it if the process fails.
In certain instances, such as SMS software upgrades, the SMS software needs to be shut down. ssd provides a mechanism to shut down itself and all SMS daemons and servers under its control.
ssd notifies all SMS software components under its control to shut down. After all the SMS software components have been shut down, ssd shuts itself down.
tmd (1M) provides task management services such as scheduling for SMS. This reduces the number of conflicts that can arise during concurrent invocations of the hardware tests and configuration software.
Currently, the only service exported by tmd is the hpost (1M) scheduling service. In the Sun Fire 15K system, hpost is scheduled based on two factors.
Restriction of hpost . When the platform first comes up and no domains have been configured, a single instance of hpost takes exclusive control of all expanders and configures the centerplane ASICs. All subsequent hpost invocations wait until this is complete before proceeding.
Only a single hpost invocation can act on any one expander at a time. For a Sun Fire 15K system configured without split expanders, this restriction does not prevent multiple hpost invocations from running. This restriction does come into play however, when the machine is configured with split expanders.
System-wide hpost throttle limit. There is a limit to the number of hpost invocations that can run concurrently without saturating the system. The ability to throttle hpost invocations is available using the -t option in ssd_start.
Caution - Changing the default value can adversely affect system functionality. Do not adjust this parameter unless instructed by a Sun service representative to do so.
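The two scheduling factors can be expressed as a simple admission check: a new hpost invocation may start only if the number already running is below the throttle limit and none of the expanders it needs is in use. This is a sketch under those assumptions, not the tmd algorithm; the function name, the use of integer expander numbers, and the limit value in the usage example are all hypothetical.

```python
def can_start_hpost(requested: set, running: list, throttle_limit: int) -> bool:
    """Decide whether a new hpost invocation may start now.

    requested      -- expanders the new invocation would act on
    running        -- one set of expanders per hpost already running
    throttle_limit -- maximum number of concurrent hpost invocations
    """
    if len(running) >= throttle_limit:
        return False  # system-wide throttle limit reached
    busy = set().union(*running)
    # Only one hpost may act on any given expander at a time.
    return requested.isdisjoint(busy)
```

For example, with two invocations running on expanders {0, 1} and {2} and a hypothetical limit of 4, a request for {3} is admitted, while a request for {1, 3} must wait because expander 1 is busy. The initial platform configuration, in which one hpost takes exclusive control of all expanders, corresponds to a request that names every expander.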
FIGURE 3-15 illustrates TMD client server relationship to the SMS daemons.
Basic SMS environment defaults must be set in your configuration files to run SMS commands.
PATH to include /opt/SUNWSMS/bin
LD_LIBRARY_PATH to include /opt/SUNWSMS/lib
MANPATH to include /opt/SUNWSMS/man
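As a quick way to confirm these settings, the following sketch reports which of the three variables is missing its SMS directory. It assumes colon-separated search paths; the function name is hypothetical and the check is not part of SMS.

```python
REQUIRED_SMS_ENTRIES = {
    "PATH": "/opt/SUNWSMS/bin",
    "LD_LIBRARY_PATH": "/opt/SUNWSMS/lib",
    "MANPATH": "/opt/SUNWSMS/man",
}


def missing_sms_entries(env: dict) -> list:
    """Return the variables whose value lacks the required SMS directory."""
    missing = []
    for variable, directory in REQUIRED_SMS_ENTRIES.items():
        if directory not in env.get(variable, "").split(":"):
            missing.append(variable)
    return missing
```

Passing `dict(os.environ)` (after `import os`) checks the current login environment; an empty list means all three variables include the SMS directories.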
Setting other environment variables when you log in can save time. TABLE 3-2 suggests some useful SMS environment variables.
Copyright © 2002, Sun Microsystems, Inc. All rights reserved.