SRDB ID   Synopsis   Date
48153   Sun Fire[TM] 12K/15K: fomd error 8542   30 Oct 2002

Status Issued

Description
- Problem Statement:

Propagation or retrieval of a file during System Controller (SC) datasync fails with fomd error 8542.

- Symptoms:

After a System Controller (SC) failover is activated and file propagation begins, the
/var/opt/SUNWSMS/SMS1.2/adm/platform/messages file contains errors similar to the following:

Apr 17 23:56:39 2002 xc46-sc0 fomd[390]:  [8542 22273080859978 WARNING FOI2Net.cc 1592]
Propagation/retrieval of "/var/opt/SUNWSMS/adm/B/post/post020307.1031.02.log" failed -
"rcmd:  socket:  Cannot assign requested address"

Apr 17 23:56:39 2002 xc46-sc0 fomd[390]:  [8542 22273209520193 WARNING FOI2Net.cc 1592]
Propagation/retrieval of "/var/opt/SUNWSMS/adm/B/post/post020306.1042.56.log" failed -
"rcmd:  socket:  Cannot assign requested address"

Apr 17 23:56:39 2002 xc46-sc0 fomd[390]:  [8542 22273376277457 WARNING FOI2Net.cc 1592]
Propagation/retrieval of "/var/opt/SUNWSMS/adm/B/post/post020307.1211.49.log" failed -
"rcmd:  socket:  Cannot assign requested address"

Apr 17 23:56:40 2002 xc46-sc0 fomd[390]:  [8542 22274024132384 WARNING FOI2Net.cc 1592]
Propagation/retrieval of "/var/opt/SUNWSMS/adm/B/post/post020307.1208.58.log" failed -
"rcmd:  socket:  Cannot assign requested address"

Apr 17 23:56:40 2002 xc46-sc0 fomd[390]:  [8542 22274088989722 WARNING FOI2Net.cc 1592]
Propagation/retrieval of "/var/opt/SUNWSMS/adm/B/post/post020307.1222.42.log" failed -
"rcmd:  socket:  Cannot assign requested address"

SOLUTION SUMMARY:
- Troubleshooting:

Check the /var/opt/SUNWSMS/SMS1.2/adm/platform/messages file for fomd error 8542
warnings like those shown under Symptoms.
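
To see which files were affected, the messages file can be scanned for 8542 entries.
The following Python sketch is one possible way to do that; it is not part of SMS.
The function name failed_paths and the regular expression are illustrative assumptions
based only on the log excerpt in this article, and the script may be run on any host
that has Python and a copy of the messages file.

```python
import re

# Matches one fomd 8542 warning. \s* lets the entry span a line break, as it
# does in the wrapped excerpt quoted in this article. The pattern is an
# assumption derived from that excerpt, not from SMS documentation.
FOMD_8542 = re.compile(
    r'fomd\[\d+\]:\s+\[8542\s[^\]]*\]\s*'
    r'Propagation/retrieval of "([^"]+)" failed'
)

def failed_paths(log_text):
    """Return the file paths named in fomd 8542 propagation failures."""
    return FOMD_8542.findall(log_text)

# Sample entry copied from the Symptoms section of this article.
sample = (
    'Apr 17 23:56:39 2002 xc46-sc0 fomd[390]:  '
    '[8542 22273080859978 WARNING FOI2Net.cc 1592]\n'
    'Propagation/retrieval of '
    '"/var/opt/SUNWSMS/adm/B/post/post020307.1031.02.log" failed -\n'
    '"rcmd:  socket:  Cannot assign requested address"\n'
)

for path in failed_paths(sample):
    print(path)
```

To scan a real log, pass the contents of the messages file to failed_paths() instead
of the sample string.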


- Resolution:


These errors can be safely ignored.
This problem is being addressed by SMS bug #4472333.
File propagation still works correctly, but it takes longer (see "Additional background
information" at the end of the article).

- Summary of part number and patch IDs

A patch will be available in the near future.

- References and bug IDs

Bug # 4472333

- Additional background information:

When fomd propagates files, rsh uses reserved TCP ports to communicate with the
other host. When a connection is closed, its port remains in the TIME_WAIT state
for a short time and cannot be reused until that time expires.

If a large number of files must be propagated, each connection leaves its port in
TIME_WAIT when it closes. With most of the ports in TIME_WAIT, the system runs out
of reserved ports very quickly. bind() fails once fewer than half of the reserved
ports are free. This results in the error message: "socket:  Cannot assign
requested address".
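
The arithmetic behind this port exhaustion can be sketched as follows. This is a
rough model for illustration only, not a description of how fomd or rcmd is
implemented. It assumes that reserved ports are taken from the range 512-1023, that
each connection holds exactly one reserved port, and that a closed port stays in
TIME_WAIT for a fixed interval. The intervals used (240 and 60 seconds) are assumed
example values; the actual value is set by the TCP time-wait tunable on the system
in question.

```python
# Assumption: reserved ports for rcmd-style connections come from 512-1023.
RESERVED_LOW, RESERVED_HIGH = 512, 1023
RESERVED_PORTS = RESERVED_HIGH - RESERVED_LOW + 1  # 512 ports

def max_sustained_rate(time_wait_s, ports=RESERVED_PORTS):
    """Highest connection rate (per second) that never exhausts the ports."""
    return ports / time_wait_s

def seconds_until_exhaustion(conn_per_s, time_wait_s, ports=RESERVED_PORTS):
    """Seconds until no reserved port is free, or None if that never happens.

    At a steady rate, conn_per_s * time_wait_s ports sit in TIME_WAIT at once.
    If that exceeds the pool, the pool empties before the first port returns.
    """
    if conn_per_s * time_wait_s <= ports:
        return None
    return ports / conn_per_s

# Illustrative comparison of a longer and a shorter TIME_WAIT interval.
for interval in (240, 60):
    print(interval, max_sustained_rate(interval),
          seconds_until_exhaustion(10, interval))
```

Under these assumptions, a shorter TIME_WAIT interval raises the connection rate that
can be sustained, and reducing the number of connections per file lowers the rate
that is needed.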

At boot time, fomd tries to transfer a large number of files for all domains
using rcp, and uses rsh to change their mode/permissions.
On systems that have many files to be synced, this triggers the problem at
boot time.

A workaround was integrated (bug 4472333) that optimizes the use of
rsh(1).  This does not eliminate the errors, but file propagation
completes more quickly than before.  A real fix should completely eliminate the
dependence on rsh(1).  However, since the rcmd TCP timeout in Solaris 9 is shorter, the
problem might go away on its own when/if the SC moves to Solaris[TM] 9.

- Meta-Data/Problem categorization:

Product/Platform: SF12K/SF15K
Category:

- Keywords

SMS daemon fomd 
            

INTERNAL SUMMARY:

SUBMITTER: Vasant Butala
BUG REPORT ID: 4472333
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.