InfoDoc ID   Synopsis   Date
49202   Sun Fire[TM] 3800-6800: CPU/Memory Board Dynamic Reconfiguration (DR) Considerations   8 Jan 2003

Status Issued

Description

This document provides guidance for using dynamic reconfiguration (DR) on CPU/Memory boards in Sun Fire 3800-6800 systems. Using DR to change the configuration of CPU/Memory boards, or to service them, increases overall application uptime, because the Solaris[TM] Operating Environment (OE) and applications keep running while DR tasks are performed.

TERMS:

When describing DR operations, this document uses terms as defined by the Solaris OE DR command cfgadm.

Configure: Adding a CPU/Memory board to a running Domain.

Disconnect: Removing a CPU/Memory board from a running Domain.

ASSUMPTIONS:

It is assumed that the Domain and the Sun Fire System Controller (SSC) are running at least the minimum Solaris OE and Firmware versions required to support DR.

All loaded third-party device drivers must comply with the device driver specifications (see Writing Device Drivers, 805-7378). Refer to the third-party reference documentation, or check the following web page for Sun-certified third-party cards:

http://www.sun.com/io_technologies/pci/pci.cards.cat.html

CONFIGURING A CPU/MEMORY BOARD INTO A RUNNING DOMAIN:

High level overview of Procedure:

  1. Match the Firmware level of the Domain and the CPU/Memory board.
  2. Verify the DR status of the CPU/Memory board.
  3. Configure the CPU/Memory board into the running Domain.

Detailed Procedure:

  1. The Firmware levels of the CPU/Memory board and the running Domain must match. Verify the Firmware levels with the showboards -p prom command on the SSC:
     sunfire-sc0:SC> showboards -p prom
        Component   Compatible Version                
        ---------   ---------- -------                
        SSC0        Reference  5.13.4                 
        SB0         Yes        5.13.4                 
        /N0/SB2     Yes        5.13.4                 
        /N0/IB6     Yes        5.13.4                               
        /N0/IB8     Yes        5.13.4                                                                            

    If the CPU/Memory board has a different Firmware level, use the flashupdate command on the SSC to change it appropriately.

                                                                                
  2. Use the Solaris OE cfgadm command to get the status of the DR components (attachment points). An available CPU/Memory board shows a status of disconnected/unconfigured/unknown, as N0.SB2 does in the example below.
     # cfgadm
        Ap_Id       Type         Receptacle   Occupant     Condition
        N0.IB6      PCI_I/O_Boa  connected    configured   ok
        N0.SB0      CPU_Board_V  connected    configured   ok
        N0.SB2      CPU_Board_V  disconnected unconfigured unknown
        c0          scsi-bus     connected    configured   unknown                                                                            
  3. Use the cfgadm command to configure the CPU/Memory board into the Domain.
     # cfgadm -o platform=diag=default -c configure N0.SB2
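
The checks in steps 1 and 2 above can be scripted. The following is a minimal sketch and not part of the original procedure. It assumes that the output of showboards -p prom and of cfgadm has been saved as text and is supplied on standard input, in the column layout shown in the examples above. The function names mismatched_firmware and available_boards are hypothetical.

```shell
#!/bin/sh
# Sketch only: both functions parse previously captured command output
# from stdin. They do not run showboards or cfgadm themselves.

# Reads saved `showboards -p prom` output and prints every component
# whose version differs from the row marked "Reference".
mismatched_firmware() {
    awk '
        # Data rows have exactly three fields; skip header and separator.
        NF == 3 && $1 != "Component" && $1 !~ /^-/ {
            if ($2 == "Reference") ref = $3
            else { name[++n] = $1; ver[n] = $3 }
        }
        END {
            for (i = 1; i <= n; i++)
                if (ver[i] != ref) print name[i], ver[i]
        }
    '
}

# Reads saved `cfgadm` output and prints the Ap_Id of every attachment
# point in the disconnected/unconfigured/unknown state.
available_boards() {
    awk '$3 == "disconnected" && $4 == "unconfigured" && $5 == "unknown" { print $1 }'
}
```

Given the cfgadm listing from step 2, available_boards prints N0.SB2. Given the showboards listing from step 1, mismatched_firmware prints nothing, because every component is at 5.13.4.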

DISCONNECTING A CPU/MEMORY BOARD FROM A RUNNING DOMAIN:

High level overview of Procedure:

  1. Check whether the CPU/Memory board contains permanent memory.
  2. Check the DR requirements for the CPU/Memory board.
  3. Disconnect the CPU/Memory board from the running Domain.

Detailed Procedure:

  1. The requirements for disconnecting a CPU/Memory board differ depending on whether or not the board contains permanent memory. Kernel memory and OBP memory are referred to as permanent memory. Use the cfgadm command to find out which board holds permanent memory.
     # cfgadm -av | grep memory
        Ap_Id           Receptacle   Occupant     Condition  Information
        When            Type         Busy         Phys_Id
        N0.SB0::memory  connected    configured   ok         base address 0x0, 8388608 KBytes total, 1529776 KBytes permanent
        Jul 26 13:30    memory       n            /devices/ssm@0,0:N0.SB0::memory
        N0.SB2::memory  connected    configured   ok         base address 0x400000000, 8388608 KBytes total
        Jul 26 13:30    memory       n            /devices/ssm@0,0:N0.SB2::memory

    In the example above, CPU/Memory board SB0 contains permanent memory.

                                                                                
  2. The system automatically checks whether the requirements are fulfilled and aborts the DR operation if they are not. The requirements can be checked before performing the DR operation, or after an aborted DR operation to verify why it failed. The requirements for a disconnect operation depend on whether or not the board contains permanent memory.

    If the requirements are fulfilled, proceed to step 3. If the CPU/Memory board contains permanent memory, additional requirements have to be met. The system checks these conditions as well and aborts the DR operation if they are not met.

  3. Once the requirements have been checked and satisfied, initiate the disconnect operation with the cfgadm command:

         # cfgadm -c disconnect N0.SB0

    When a CPU/Memory board is disconnected, the following messages are logged in /var/adm/messages:

         Dec  9 09:12:43 domA genunix: /ssm@0,0/memory-controller@3,400000 (mc-us36) offline
         Dec  9 09:12:43 domA genunix: /ssm@0,0/memory-controller@2,400000 (mc-us35) offline
         Dec  9 09:12:43 domA genunix: /ssm@0,0/memory-controller@1,400000 (mc-us39) offline
         Dec  9 09:12:43 domA genunix: /ssm@0,0/memory-controller@0,400000 (mc-us34) offline
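
The permanent memory check in step 1 of the disconnect procedure can be scripted as well. The following is a minimal sketch, assuming that cfgadm -av output is supplied on standard input and that permanent memory is reported with the phrase "KBytes permanent" in the Information column, as in the example in step 1. The function name boards_with_permanent_memory is hypothetical.

```shell
#!/bin/sh
# Sketch only: parses `cfgadm -av` output from stdin and prints the
# board part of every memory attachment point that reports permanent
# memory. The "KBytes permanent" phrase is taken from the example
# listing; other Solaris releases may format this field differently.
boards_with_permanent_memory() {
    awk '
        # Remember the most recent memory attachment point, so that the
        # match still works if the Information column wraps onto the
        # following line.
        $1 ~ /::memory$/ { ap = $1 }
        /KBytes permanent/ && ap != "" {
            sub(/::memory$/, "", ap)
            print ap
        }
    '
}
```

A possible invocation on the Domain is cfgadm -av | boards_with_permanent_memory. For the example listing in step 1 it prints N0.SB0, which indicates that the additional requirements for permanent memory apply before that board can be disconnected.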

References:

Sun Fire 3800-6800 Servers Dynamic Reconfiguration Blueprint (816-4560)

Sun Fire 3800-6800 Systems Dynamic Reconfiguration Users Guide (806-6783)

Sun Cluster 3.0 Concepts (816-2027)

Writing Device Drivers (805-7378)

Man pages: cfgadm(1M), cfgadm_sbd(1M), rcmscript(4)

Keywords:

Sun Fire 3800-6800, Dynamic Reconfiguration, DR, best practices, permanent

INTERNAL SUMMARY:

This is a living document. As features and requirements change, every attempt will be made to keep this document current. If you notice an oversight or discrepancy while using its content, contact the submitter.

Internally the following URLs are most useful:

http://pts-americas.west.sun.com/esg/msg/techinfo/platform/sun_fire/

http://systems.corp.sun.com/tools/salestools/datacenter/avail/dr/index.html

SUBMITTER: Peter Gonscherowski
BUG REPORT ID: 4618861
APPLIES TO: Hardware/Sun Fire /3800, Hardware/Sun Fire /4800, Hardware/Sun Fire /4810, Hardware/Sun Fire /6800
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.