

FIN #: I0849-1

SYNOPSIS: New capability is available on E10000 systems to identify MSRAM
          modules from POST output

DATE: Oct/28/02

KEYWORDS: New capability is available on E10000 systems to identify MSRAM
          modules from POST output


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: New capability is available on E10000 systems to identify 
          MSRAM modules from POST output.


Sun Alert:          No

TOP FIN/FCO REPORT: No

PRODUCT_REFERENCE:  Mirrored SRAM CPU modules

PRODUCT CATEGORY:   Server / Service


PRODUCTS AFFECTED:  

Systems Affected:
-----------------  
Mkt_ID    Platform   Model   Description              Serial Number
------    --------   -----   -----------              -------------
  -        E10000     ALL    Ultra Enterprise 10000         -


X-Options Affected:
-------------------
Mkt_ID   Platform   Model   Description   Serial Number
------   --------   -----   -----------   -------------
  -         -         -          -              -


PART NUMBERS AFFECTED: 

Part Number   Description   Model
-----------   -----------   -----
     -             -          -


REFERENCES:

BugId: 4401066 - Need to identify mirrored SRAM CPU modules after 
                 bringup/DR.
       4419788 - MSRAM processor property for Starfire.

PatchId: ssp3.3 108885: SSP 3.3: Modify POST/SSP to support CIC2 asic and
                        new ecache SRAM.
                109661: SSP 3.3: DR attach drops cpu(s) to OBP when domain
                        has heavy cpu load.
         ssp3.4 110304: SSP 3.4: Updates for hpost, redx, and autoconfig.
                110316: SSP 3.4: Updates for hpost, redx, and autoconfig.
         ssp3.5 110498: SSP 3.5: Need to identify mirrored SRAM cpu modules
                        after bringup/DR.
 
     
PROBLEM DESCRIPTION:

Mirrored SRAM CPU modules (IBM Sombra and Sony Espejo) are being
installed as upgrades in existing E10000 systems.  In some cases, these
are installed alongside older CPU modules.  The Field needs to be able
to determine if a Mirrored SRAM (MSRAM) module is installed in a
particular CPU location in order to more effectively service faulty CPU
modules.  This issue does not directly impact customer systems but does
affect serviceability.

RFE 4401066 requested that field personnel be given a way to probe the
E10000 and determine which CPU modules have MSRAM.  New HPOST patches
for SSP3.3 and SSP3.4 software provide this capability, and a patch for
SSP3.5 is also available.  These patches are expected to help field
personnel diagnose and resolve CPU Ecache issues on E10000 systems more
easily.

           SSP3.3  108885
           SSP3.4  110304
           SSP3.5  110498
                      
Without these recommended patches, there is a high risk that an
incorrect CPU module could be sent to a customer site, or that an
inappropriate Best Practices action could be taken (for example, not
replacing a CPU on its first Ecache parity error even though the CPU in
error has Mirrored SRAM).  The Best Practice for MSRAM CPU modules is to
replace them on the first error and submit them for RCCA/CPAS.

Once these patches are installed, POST output will change as follows:

1)  In phase proc1, POST will try to acquire the Module Capability (MCAP)
    value from the UPA_CONFIG register of the CPU.  (This value was
    previously unused for Sunfire E6000 and Starfire E10000 processor
    modules.)  For currently shipped processor modules, the MCAP bits are
    now hard-wired to signify the following assignments:

MCAP    MCAP   MIRRORED ECACHE?
BINARY  HEX  DATA SRAMS  TAG SRAMS  COMMENTS ON PROCESSOR MODULE & ECACHE
==============================================================================
b'0000  0x0  unknown     unknown    Cannot determine anything with MCAP = 0
b'0001  0x1  YES         YES        "Blaze 466Mhz IBM"  = 501-5816-09
b'0010  0x2  YES         YES        "Blaze 466Mhz Sony" = 501-5798-09
b'0011  0x3  YES         YES        "Blaze 400Mhz IBM"  = 501-5814-03
b'0100  0x4  YES         YES        "Blaze 400Mhz Sony" = 501-5815-03
==============================================================================
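
The MCAP table above can be expressed as a small lookup.  The following is
a minimal Python sketch for illustration only; it is not part of HPOST, and
the names MCAP_TABLE and describe_mcap are hypothetical.

```python
# Illustrative lookup built from the MCAP table in this FIN.
# MCAP_TABLE and describe_mcap are hypothetical names, not HPOST symbols.
MCAP_TABLE = {
    0x1: ("Blaze 466Mhz IBM", "501-5816-09"),
    0x2: ("Blaze 466Mhz Sony", "501-5798-09"),
    0x3: ("Blaze 400Mhz IBM", "501-5814-03"),
    0x4: ("Blaze 400Mhz Sony", "501-5815-03"),
}


def describe_mcap(mcap):
    """Return a one-line description for a 4-bit MCAP value."""
    if mcap == 0:
        # MCAP = 0 says nothing about whether the ecache is mirrored.
        return "unknown: cannot determine anything with MCAP = 0"
    entry = MCAP_TABLE.get(mcap)
    if entry is None:
        # Non-zero value that is not in the table.
        return "unrecognized MCAP value 0x%x" % mcap
    name, part = entry
    return "%s (%s): mirrored data and tag SRAMs" % (name, part)


print(describe_mcap(0x2))
```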

In addition, bits 4, 5, and 6 of the post2obp processor auxiliary info
fields are now used as follows:
      
      6:  Ecache TAG SRAM is mirrored  (1 = YES, 0 = NO)
      5:  Ecache DATA SRAMs are mirrored  (1 = YES, 0 = NO)
      4:  Ecache SRAMs mirrored info is valid (1 = YES, 0 = NO)
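
As an illustration of the bit assignments above, the following Python
sketch decodes bits 4, 5, and 6 of an auxiliary info word.  The constant
and function names are hypothetical; only the bit positions come from this
FIN.

```python
# Bit positions taken from this FIN; the names are illustrative only.
AUX_INFO_VALID = 1 << 4      # bit 4: mirrored info is valid
AUX_DATA_MIRRORED = 1 << 5   # bit 5: ecache DATA SRAMs are mirrored
AUX_TAG_MIRRORED = 1 << 6    # bit 6: ecache TAG SRAM is mirrored


def decode_aux(aux_word):
    """Decode the mirrored-SRAM bits of a post2obp processor aux word."""
    return {
        "valid": bool(aux_word & AUX_INFO_VALID),
        "data_mirrored": bool(aux_word & AUX_DATA_MIRRORED),
        "tag_mirrored": bool(aux_word & AUX_TAG_MIRRORED),
    }


# 0x0074 has bits 4, 5, and 6 set; 0x0013 has only bit 4 set.
print(decode_aux(0x0074))
print(decode_aux(0x0013))
```

The mirrored flags should only be trusted when "valid" is true; a word
such as 0x0004 decodes to all three flags false, meaning no valid
information was recorded.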
      
After the MCAP value is obtained in phase proc1, it is checked.  For
older legacy processor modules, the value will be "0", indicating that
it is unknown whether the ecache is mirrored.  If the MCAP value is
non-zero, POST checks whether it is a known value.  If the non-zero MCAP
value is unknown, POST issues a WARNING on that value, with a message
that nothing could be determined from it, but does not FAIL the proc.

An example unknown MCAP value WARNING message will appear as follows:

WARNING: Proc 0.3: Unknown MCAP value: 0x5
        Cannot check post2obp proc aux info with unknown MCAP value.

   In the above case, the mirrored status information was detected to
   be valid because it was written during phase jtag_integ (see item
   #4).  But since the MCAP value is unknown, the mirrored status
   information could not be cross-checked.  The WARNING is issued, but
   HPOST continues without failing the processor that has the unknown
   MCAP value.

   Another example WARNING is as follows:

WARNING: Proc 0.3: Unknown MCAP value: 0x5
        Cannot set post2obp proc aux info with unknown MCAP value.

   In this case, the mirrored status information in the post2obp processor
   auxiliary structure was not valid, perhaps because phase jtag_integ
   (see item #4) was skipped.  The subsequent detection of an unknown MCAP
   value results in a message that the post2obp processor auxiliary
   information could not be SET based solely on the unknown MCAP value.
   Again, HPOST continues with only the WARNING message, and the proc
   is not failed.
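
The decision flow described in item 1 can be sketched as follows.  This is
an illustrative Python model only, not HPOST source; the function name
mcap_warnings and its parameters are hypothetical, and the message text
simply mirrors the two example WARNINGs shown above.

```python
# Known MCAP values from the table in this FIN.
KNOWN_MCAP = {0x1, 0x2, 0x3, 0x4}


def mcap_warnings(proc, mcap, aux_valid):
    """Return the WARNING lines for one proc (empty list if none).

    proc      -- proc label such as "0.3" (board.proc)
    mcap      -- MCAP value read in phase proc1
    aux_valid -- True if phase jtag_integ already recorded valid
                 mirrored-status info in the post2obp aux structure
    The proc is never failed here; only a warning is produced.
    """
    if mcap == 0 or mcap in KNOWN_MCAP:
        # Legacy module (0) or recognized module: nothing to warn about.
        return []
    # Valid aux info could only have been cross-checked; invalid aux
    # info could only have been set from the MCAP value.
    action = "check" if aux_valid else "set"
    return [
        "WARNING: Proc %s: Unknown MCAP value: 0x%x" % (proc, mcap),
        "        Cannot %s post2obp proc aux info with unknown MCAP value."
        % action,
    ]


for line in mcap_warnings("0.3", 0x5, aux_valid=True):
    print(line)
```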
        
2) A new postrc directive has been added for extra messaging during the 
   ecache SRAM probe:

   A user could make the following entry in the postrc to allow more
   messaging during HPOST, regarding probing of the ecache SRAMs for
   their mirrored status:

   debug_maskx00001000  # Extra messaging proc ecache SRAM mirrored status.

   Note that some of the new messaging requires HPOST verbosity to be
   at level 120 as well as having the new postrc entry above.        
 
3) The postyymmdd.time.log file format has changed.  The detected processor 
   ecache SRAM status is now printed at the end of the POST log file.  
   
   At the end of every POST log file, the post2obp auxiliary info structure
   will now be included, with each new line beginning with: "#E".  The
   following is a real example of the new post2obp information being included
   at the end of the new POST log file:
   
   --------------------EXAMPLE postyymmdd.time.log: START------------------------
   <...snip...>
   phase final_config: Final configuration...
   Configuring in 3F, FOM = 92160.00: 10 procs, 8 Scards, 9216 MBytes.
   Creating OBP handoff structures...
   Configured in 3F with 10 processors, 8 Scards, 9216 MBytes memory.
   Interconnect       frequency is  83.241 MHz, from SNMP MIB.
   Processor external frequency is 124.878 MHz, from SNMP MIB.
   Processor internal frequency is 249.724 MHz, from proc clk_mode probe.
   NOTE: 2 processors were detected running at least 9.00% below rated speed.
         Check system clock values/ratios using the SSP command sys_clock
   Boot processor is 3.0 = 12
   POST (level=16, verbose=20) execution time 6:53
   #E Auxiliary Info structures:
   #E brd: cpu3 cpu2 cpu1 cpu0 MCAP  ioc1 ioc0  iom type
   #E  3:  0013 0013 0013 0013 0000  0000 0000  01: 2 * (SYSIO w/ 2 SBus slots)
   #E  4:  0013 0013 0013 0013 0000  0000 0000  01: 2 * (SYSIO w/ 2 SBus slots)
   #E  5:  0074 0074 0004 0004 11    0000 0000  01: 2 * (SYSIO w/ 2 SBus slots)
   # SMI E10000 POST log closed Wed Mar 20 06:58:46 2002
   --------------------EXAMPLE postyymmdd.time.log: END--------------------------
      
   Breaking down an example line:
   
   #E brd: cpu3 cpu2 cpu1 cpu0 MCAP  ioc1 ioc0  iom type
   #E  5:  0074 0074 0004 0004 11    0000 0000  01: 2 * (SYSIO w/ 2 SBus slots)
  		
   CODE       XXAB XXCD XXEF XXGH IJKL  XXXX XXXX
  
   (NOTE: "CODE" is just for FIN explanation.  It won't be in the POST log.)

   CODE KEY:

     A.  For brd-5, cpu3, we have a "7" in that field (bits [4,5,6] are
         set), indicating the processor module has mirrored data and
         mirrored tag SRAMs, and that information is "valid".

     B.  Ecache Setting, outside the scope of this FIN

     C.  For brd-5, cpu2, we have a "7" in that field (bits [4,5,6] are
         set), indicating the processor module has mirrored data and
         mirrored tag SRAMs, and that information is "valid".

     D.  Ecache Setting, outside the scope of this FIN

     E.  For brd-5, cpu1, we have a "0" in that field (0 bits set).  See
         note for K.

     F.  Ecache Setting, outside the scope of this FIN

     G.  For brd-5, cpu0, we have a "0" in that field (0 bits set).  See
         note for L.

     H.  Ecache Setting, outside the scope of this FIN

     I.  MCAP value = "1" for cpu3; proc identified as a 466MHz "Blaze"
         with IBM (Sombra) MSRAMs.  (See table above.)

     J.  MCAP value = "1" for cpu2; proc identified as a 466MHz "Blaze"
         with IBM (Sombra) MSRAMs.  (See table above.)

     K.  MCAP is blank for cpu1.  The proc is not present, or the phase
         jtag_integ was skipped AND the proc was FAILED before its MCAP
         value could be deciphered in phase proc1.

     L.  MCAP is blank for cpu0.  The proc is not present, or the phase
         jtag_integ was skipped AND the proc was FAILED before its MCAP
         value could be deciphered in phase proc1.
		 	     			     	  
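  So for system board 3 in the example log, we can't identify which type
  of procs are installed (MCAP = 0), but we know they do not have mirrored
  data SRAMs or mirrored tag SRAMs: each field shows a "1" (only bit 4,
  the valid bit, is set), so the information is valid.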
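
The breakdown above can be automated when reviewing POST logs.  The
following Python sketch parses one "#E" board line into per-cpu status.
It is an illustration only: the function name parse_e_line is
hypothetical, and the column layout (four 4-digit hex words followed by a
4-character MCAP field, one character per cpu in cpu3..cpu0 order) is
assumed from the single example in this FIN.

```python
import re

# Layout assumed from the example line in this FIN:
#   #E <brd>: <cpu3> <cpu2> <cpu1> <cpu0> <4-char MCAP field> ...
E_LINE = re.compile(
    r"#E\s+(\w+):\s+"
    r"([0-9A-Fa-f]{4})\s+([0-9A-Fa-f]{4})\s+"
    r"([0-9A-Fa-f]{4})\s+([0-9A-Fa-f]{4})"
    r" (.{4})"
)


def parse_e_line(line):
    """Return (board, {cpu: status}) or None if not a board line."""
    m = E_LINE.search(line)
    if m is None:
        return None  # header line, non-#E line, or unexpected layout
    status = {}
    for i, cpu in enumerate(("cpu3", "cpu2", "cpu1", "cpu0")):
        aux = int(m.group(2 + i), 16)
        mcap_char = m.group(6)[i]
        status[cpu] = {
            "valid": bool(aux & 0x10),          # bit 4
            "data_mirrored": bool(aux & 0x20),  # bit 5
            "tag_mirrored": bool(aux & 0x40),   # bit 6
            # A blank MCAP position means no MCAP value was recorded.
            "mcap": int(mcap_char, 16) if mcap_char.strip() else None,
        }
    return m.group(1), status


line = "#E  5:  0074 0074 0004 0004 11    0000 0000  01: 2 * (SYSIO w/ 2 SBus slots)"
board, status = parse_e_line(line)
for cpu, info in status.items():
    print(board, cpu, info)
```

Real logs may use different spacing than the example, so the pattern
should be checked against actual postyymmdd.time.log output before it is
relied on.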
  
Some other items to note, which are necessary for the explanation above
but do not directly affect the field:

4) As a result of this patch, phase jtag_integ now:
      
   . Checks the JTAG (scantool) database on the SSP to see if special
     MSRAM handling is required for the ecache data SRAMs and now also
     for the tag SRAMs.

   . If the JTAG (scantool) database on the SSP doesn't show that special
     MSRAM handling is required for a given processor, cautiously does an
     electronic JTAG probe of that processor's ecache SRAM Component IDs
     to be sure.  This covers the case where the database is incorrect
     because autoconfig was never run for a given system board, and that
     system board has processor modules with MSRAMs that require special
     handling.  If so, it fails the system board and instructs the user
     to run autoconfig.

   . Checks all other processor ecache tag and data SRAMs in the domain
     to see if they are mirrored or not.

   . Records the ecache data and ecache tag status for each processor in
     the post2obp auxiliary info structure, and marks the info as "valid".
      
5) For the hpost -Q argument: ensures that the extracted post2obp mirrored
   status information is not presented in a misleading way.


IMPLEMENTATION: 

         ---
        |   |   MANDATORY (Fully Proactive)
         ---    
         
  
         ---
        |   |   CONTROLLED PROACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        | X |   REACTIVE (As Required)
         ---


CORRECTIVE ACTION:

The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned problem.

Install the following patches on E10000 systems for the complete solution:

   For SSP3.3: Patches 109661 or later, or 108885 or later
   For SSP3.4: Patches 110316 or later, or 110304 or later
   For SSP3.5: Patch 110498  
   

COMMENTS:

None  

============================================================================

Implementation Footnote:

i)   In case of MANDATORY FINs, Sun Services will attempt to contact 
     all affected customers to recommend implementation of the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Sun Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Sun Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@sun.com
--------------------------------------------------------------------------


Copyright (c) 1997-2003 Sun Microsystems, Inc.