SRDB ID | Synopsis | Date |
47422 | Sun Fire[TM] 12K/15K: hpost: Invalid MADR error during level 64 HPOST | 18 Nov 2002 |
Status | Issued |
Description |
As part of the EIS procedure for Sun Fire 12K/15K systems, you must run a level 64 HPOST stress test. However, this run can fail with errors similar to the following:
# /opt/SUNWSMS/bin/hpost -d A -pnone -l64 -v40 -Perr_print 64 -Pno_obp_handoff
........ snip .........
Proc SB0/P3: Starting Phase 3: Cpu functional tests
Logical Bank SB2/P0/B0/L1: init_mem_chain_from_prd_by_loc(): ERROR: Invalid MADR (00000000.00000000) with RSV 12
ProFecacheR1_sc_tfunc(): INTERNAL: init_mem_chain_from_prd_by_loc(13, GM_LOC_SLT, &cpu_get_mem_chainp) failed
Proc SB2/P1: stage_cpu_lpost_callback(): generic_stage_callback() failed
FAIL Proc SB2/P1: call_sc_code_test(): INTERNAL: st_callback() failed for SC Test
There is no FRU service action indicated for this failure.
FAIL Proc SB2/P2: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0
There is no FRU service action indicated for this failure.
FAIL Proc SB2/P3: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0
There is no FRU service action indicated for this failure.
Proc SB9/P0: Starting Phase 3: Cpu functional tests
Proc SB9/P1: Starting Phase 3: Cpu functional tests
Proc SB9/P2: Starting Phase 3: Cpu functional tests
Proc SB9/P3: Starting Phase 3: Cpu functional tests
Logical Bank SB10/P0/B1/L1: init_mem_chain_from_prd_by_loc(): ERROR: Invalid MADR (00000000.00000000) with RSV 12
ProFecacheR1_sc_tfunc(): INTERNAL: init_mem_chain_from_prd_by_loc(61, GM_LOC_SLT, &cpu_get_mem_chainp) failed
Proc SB10/P1: stage_cpu_lpost_callback(): generic_stage_callback() failed
FAIL Proc SB10/P1: call_sc_code_test(): INTERNAL: st_callback() failed for SC Test
There is no FRU service action indicated for this failure.
FAIL Proc SB10/P2: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0
There is no FRU service action indicated for this failure.
FAIL Proc SB10/P3: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0
There is no FRU service action indicated for this failure.
Proc SB1/P1: Host Test 31 Ecache Functional
Proc SB1/P2: Host Test 31 Ecache Functional
Proc SB1/P3: Host Test 31 Ecache Functional
FAIL Proc SB2/P0: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0
There is no FRU service action indicated for this failure.
Proc SB4/P1: Host Test 31 Ecache Functional
Proc SB4/P2: Host Test 31 Ecache Functional
Proc SB4/P3: Host Test 31 Ecache Functional
Proc SB7/P1: Host Test 31 Ecache Functional
Proc SB7/P2: Host Test 31 Ecache Functional
Proc SB7/P3: Host Test 31 Ecache Functional
FAIL Proc SB10/P0: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0
There is no FRU service action indicated for this failure.
Proc SB11/P1: Host Test 31 Ecache Functional
........snip........
The HPOST run ends with:
Skipping OBP handoff as requested
CPU_Brds:  Proc  Mem P/B: 3/1 3/0 2/1 2/0 1/1 1/0 0/1 0/0
Slot  Gen  3210       /L: 10  10  10  10  10  10  10  10  CDC
SB00:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB01:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB02:  c   ffff           cc  cc  cc  cc  cc  cc  cc  cc   P
SB03:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB04:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB05:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB06:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB07:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB08:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB09:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB10:  c   ffff           cc  cc  cc  cc  cc  cc  cc  cc   P
SB11:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB12:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB13:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB14:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
SB15:  P   PPPP           PP  PP  PP  PP  PP  PP  PP  PP   P
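With sixteen boards in the summary, the rows that differ from the rest are easy to miss. The following is a minimal Python sketch, not part of HPOST or SMS, that lists the boards whose summary row contains any status character other than P. It assumes that P marks a component that passed and makes no assumption about the meaning of the other letters; the function name and the sample text are illustrative only.

```python
import re

# A few rows copied from the summary above; column spacing is not significant.
SUMMARY = """\
SB01: P PPPP PP PP PP PP PP PP PP PP P
SB02: c ffff cc cc cc cc cc cc cc cc P
SB10: c ffff cc cc cc cc cc cc cc cc P
SB11: P PPPP PP PP PP PP PP PP PP PP P
"""

# A board row starts with a label such as "SB02:" followed by status fields.
ROW = re.compile(r"^(SB\d+):\s+(.*)$")


def boards_not_all_pass(summary_text):
    """Return board labels whose row has any status character other than 'P'.

    Assumption: 'P' means pass. Other letters are only reported, not interpreted.
    """
    flagged = []
    for line in summary_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header lines and anything else are ignored
        board, fields = match.groups()
        statuses = "".join(fields.split())  # drop all column whitespace
        if set(statuses) - {"P"}:
            flagged.append(board)
    return flagged


print(boards_not_all_pass(SUMMARY))  # prints ['SB02', 'SB10']
```

In the run shown here, this picks out SB02 and SB10, the same two boards that appear in the FAIL messages.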
SOLUTION SUMMARY:
Explanation:
First, Logical Bank SB2/P0/B0/L1 does not exist. Second, all processors on the affected system boards (SB2 and SB10 in this example) are now failed out, yet nothing identifies a specific component as the implicated FRU. So, is there really bad hardware?
If you retest the domain with exactly the same HPOST command, it runs clean, with no failures, and it continues to run clean on every subsequent run.
Action:
Run the same HPOST again several times to confirm that the hardware is sound. The failure shown here is only one of the failure types that can occur with this kind of HPOST run; others are possible. The key point is that if the FAIL statements in the POST log do not name an actual component as the implicated FRU, the run is most likely suspect. Rerun HPOST until you get a run with valid information. Updates to this information will be documented as PDE continues to work the issue (Bug ID
Patch ID
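The check described under Action can be sketched in Python. This is an illustration only, not an SMS or HPOST utility: it assumes the log has one message per line, and it treats a FAIL entry as carrying no FRU information when it is accompanied by the sentence "There is no FRU service action indicated for this failure." as in the log above. The function names are hypothetical, and because no FAIL entry that does name a FRU appears in this document, any other FAIL entry is simply set aside for manual review.

```python
NO_FRU = "There is no FRU service action indicated for this failure."

# Excerpt copied from the log above, one message per line.
SAMPLE = """\
Proc SB2/P1: stage_cpu_lpost_callback(): generic_stage_callback() failed
FAIL Proc SB2/P1: call_sc_code_test(): INTERNAL: st_callback() failed for SC Test
There is no FRU service action indicated for this failure.
FAIL Proc SB2/P2: ProFecacheR1_sc_tfunc(): INTERNAL: Bad procpfp_test_cntl.status = 0
There is no FRU service action indicated for this failure.
"""


def classify_fail_lines(log_text):
    """Split FAIL entries into (needs_review, no_fru).

    no_fru: FAIL entries accompanied by the 'no FRU service action' sentence.
    needs_review: every other FAIL entry (assumed to need a closer look).
    """
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    needs_review, no_fru = [], []
    for i, line in enumerate(lines):
        if not line.startswith("FAIL "):
            continue
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if NO_FRU in line or following == NO_FRU:
            no_fru.append(line)
        else:
            needs_review.append(line)
    return needs_review, no_fru


def run_is_suspect(log_text):
    """True when the log has FAIL entries and none of them implicates a FRU."""
    needs_review, no_fru = classify_fail_lines(log_text)
    return bool(no_fru) and not needs_review


print(run_is_suspect(SAMPLE))  # prints True
```

A True result corresponds to the case described above: the run is suspect and should be repeated before any hardware is replaced.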
INTERNAL SUMMARY:
SUBMITTER: Joshua Freeman
BUG REPORT ID: 4655565
PATCH ID: 112829-01
APPLIES TO: AFO Vertical Team Docs/HAS, Hardware/Sun Fire /15000
ATTACHMENTS: