SRDB ID   Synopsis   Date
47422   Sun Fire[TM] 12K/15K: hpost: Invalid MADR error during level 64 HPOST   18 Nov 2002

Status Issued

Description

As part of the EIS procedure for Sun Fire 12K/15K systems, you must run a level 64 HPOST stress test. However, the run can fail with errors similar to the following:

#  /opt/SUNWSMS/bin/hpost -d A -pnone -l64 -v40 -Perr_print 64 -Pno_obp_handoff 

........ snip ......... 
Proc SB0/P3: Starting Phase 3: Cpu functional tests 
Logical Bank SB2/P0/B0/L1: init_mem_chain_from_prd_by_loc(): ERROR: 
Invalid MADR (00000000.00000000) with RSV 12 
ProFecacheR1_sc_tfunc(): INTERNAL: init_mem_chain_from_prd_by_loc(13, 
GM_LOC_SLT, &cpu_get_mem_chainp) failed 
Proc SB2/P1: stage_cpu_lpost_callback(): generic_stage_callback() failed 
FAIL Proc SB2/P1: call_sc_code_test(): INTERNAL: st_callback() failed for SC Test 
There is no FRU service action indicated for this failure. 
FAIL Proc SB2/P2: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0 
There is no FRU service action indicated for this failure. 
FAIL Proc SB2/P3: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0 
There is no FRU service action indicated for this failure. 
Proc SB9/P0: Starting Phase 3: Cpu functional tests 
Proc SB9/P1: Starting Phase 3: Cpu functional tests 
Proc SB9/P2: Starting Phase 3: Cpu functional tests 
Proc SB9/P3: Starting Phase 3: Cpu functional tests 
Logical Bank SB10/P0/B1/L1: init_mem_chain_from_prd_by_loc(): ERROR: 
Invalid MADR (00000000.00000000) with RSV 12 
ProFecacheR1_sc_tfunc(): INTERNAL: init_mem_chain_from_prd_by_loc(61, 
GM_LOC_SLT, &cpu_get_mem_chainp) failed 
Proc SB10/P1: stage_cpu_lpost_callback(): generic_stage_callback() failed 
FAIL Proc SB10/P1: call_sc_code_test(): INTERNAL: st_callback() failed for SC Test 
There is no FRU service action indicated for this failure. 
FAIL Proc SB10/P2: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0 
There is no FRU service action indicated for this failure. 
FAIL Proc SB10/P3: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0 
There is no FRU service action indicated for this failure. 
Proc SB1/P1: Host Test 31 Ecache Functional 
Proc SB1/P2: Host Test 31 Ecache Functional 
Proc SB1/P3: Host Test 31 Ecache Functional 
FAIL Proc SB2/P0: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0 
There is no FRU service action indicated for this failure. 
Proc SB4/P1: Host Test 31 Ecache Functional 
Proc SB4/P2: Host Test 31 Ecache Functional 
Proc SB4/P3: Host Test 31 Ecache Functional 
Proc SB7/P1: Host Test 31 Ecache Functional 
Proc SB7/P2: Host Test 31 Ecache Functional 
Proc SB7/P3: Host Test 31 Ecache Functional 
FAIL Proc SB10/P0: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0 
There is no FRU service action indicated for this failure. 
Proc SB11/P1: Host Test 31 Ecache Functional 
........snip........                   

The HPOST run ends with:

Skipping OBP handoff as requested 

CPU_Brds:  Proc  Mem P/B: 3/1 3/0  2/1 2/0  1/1 1/0  0/1 0/0 
Slot  Gen  3210        /L: 10  10   10  10   10  10   10  10     CDC 
SB00:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB01:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB02:  c   ffff            cc  cc   cc  cc   cc  cc   cc  cc      P 
SB03:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB04:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB05:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB06:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB07:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB08:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB09:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB10:  c   ffff            cc  cc   cc  cc   cc  cc   cc  cc      P 
SB11:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB12:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB13:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB14:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P 
SB15:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P                   
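
The summary table above can also be scanned mechanically for boards that did not pass. The following is a minimal sketch in Python, not part of SMS or HPOST. It assumes the row layout shown above (board name, one Gen status character, then four per-processor status characters) and assumes that "P" means pass; the function name boards_not_passing is hypothetical, and the meaning of the other status codes is not interpreted.

```python
import re

# Matches rows such as "SB02:  c   ffff            cc  cc ...".
# Group 1: board name, group 2: Gen status, group 3: status of procs 3..0.
# ASSUMPTION: this layout is taken from the sample output only.
BOARD_ROW = re.compile(r"^(SB\d{2}):\s+(\S)\s+(\S{4})\b")


def boards_not_passing(summary_text):
    """Return {board: (gen_status, proc_status)} for every row whose Gen
    column or any processor column is something other than 'P'."""
    flagged = {}
    for line in summary_text.splitlines():
        match = BOARD_ROW.match(line.strip())
        if not match:
            continue  # header lines and unrelated log lines are skipped
        board, gen, procs = match.groups()
        if gen != "P" or set(procs) != {"P"}:
            flagged[board] = (gen, procs)
    return flagged
```

Applied to the summary above, this would flag only the SB02 and SB10 rows, which is a quick way to compare one HPOST run against the next.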

SOLUTION SUMMARY:

Explanation:

First, Logical Bank SB2/P0/B0/L1 does not exist. Second, all processors on the affected system boards (SB02 and SB10 in the summary above) are now failed out, yet no specific component is identified as the implicated FRU. So, is there really bad hardware?

If you retest the domain with exactly the same HPOST command, the run completes cleanly with no failures. Every subsequent run of the same test is also clean.

Action:

Run the same HPOST several more times to confirm that the hardware is sound. The failure shown here is only one of several kinds that can occur with this type of HPOST run. The key point: if the FAIL statements in the POST log do not identify an actual component as the implicated FRU, the HPOST run itself is most likely suspect. Rerun HPOST until you get a run with valid information. This issue is still being worked by PDE (Bug ID 4655565), and updates will be documented as they become available.
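
The check described above (do the FAIL statements name an implicated FRU?) can be sketched as a small log filter. This is a minimal illustration in Python, not an SMS tool. It assumes, based only on the excerpt in the Description, that a FAIL without an implicated FRU is immediately followed by the line "There is no FRU service action indicated for this failure." What a FAIL with a FRU looks like is not shown in this document, so any FAIL lacking that notice is simply treated as possibly implicating hardware. The names classify_fails and run_is_suspect are hypothetical.

```python
NO_FRU = "There is no FRU service action indicated for this failure."


def classify_fails(log_text):
    """Split FAIL lines into (with_fru, without_fru).

    ASSUMPTION: the 'no FRU' notice sits on the line directly after the
    FAIL line it refers to, as in the sample log.
    """
    lines = [ln.strip() for ln in log_text.splitlines()]
    with_fru, without_fru = [], []
    for i, line in enumerate(lines):
        if not line.startswith("FAIL "):
            continue
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if following == NO_FRU:
            without_fru.append(line)
        else:
            with_fru.append(line)
    return with_fru, without_fru


def run_is_suspect(log_text):
    """True when the log has FAIL lines and none of them names a FRU."""
    with_fru, without_fru = classify_fails(log_text)
    return bool(without_fru) and not with_fru
```

A run flagged as suspect by this filter is a candidate for a rerun with the same HPOST command; a run with any FAIL that lacks the notice deserves a closer manual look at the log.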

Patch ID 112829-01 has been released to fix this bug.

INTERNAL SUMMARY:

SUBMITTER: Joshua Freeman
BUG REPORT ID: 4655565
PATCH ID: 112829-01
APPLIES TO: AFO Vertical Team Docs/HAS, Hardware/Sun Fire /15000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.