SRDB ID | Synopsis | Date |
48834 | Sun Fire[TM] 3800-6800: Troubleshooting NCPQ_TO errors | 9 Dec 2002 |
Status | Issued |
Description |
Problem Statement:
This document aids in troubleshooting Non Cacheable Pending Queue Time Outs (NCPQ_TO) on Sun Fire 3800-6800 systems. An NCPQ_TO occurs when a data request in Non Cacheable address space does not complete its transaction. Non Cacheable address space consists of Safari Device config space and I/O address space.
Symptoms:
Error messages indicating that an NCPQ_TO occurred are seen on the Domain Console. The error messages are also stored in the Domain Console Buffer and can be retrieved with the Sun Fire System Controller (SSC) command showlogs. If a loghost is configured, the error messages are also stored on the loghost. NCPQ_TOs can occur during normal operation of the Domain or during POST. Here is an example log of an NCPQ_TO error:
Feb 26 10:46:02 sq1sc Domain-C.SC: ErrorMonitor:Domain C has a SYSTEM ERROR
Feb 26 10:46:02 sq1sc Domain-C.SC: /N0/SB1 encountered the first error
Feb 26 10:46:02 sq1sc Domain-C.SC: RepeaterSbbcAsic reported first error on /N0/SB1
Feb 26 10:46:02 sq1sc Domain-C.SC: /partition1/domain0/SB1/bbcGroup0/sbbc0:
    FE [15:15] : 0x1
    ErrSum [31:31] : 0x1
    SafErr [09:08] : 0x1 Fireplane device asserted an error
Feb 26 12:20:47 SunFireSc0 Domain-C.SC: /partition1/domain0/SB1/bbcGroup0/cpuAB/cpusafariagent0:
    AFAR (high)[0x531] : 0x0000063c
    AFAR [42:32] [10:00] : 0x63c
    AFAR (low)[0x541] : 0xff800000
    AFAR_2 (high)[0x571] : 0x0000063c
    AFAR_2 [42:32] [10:00] : 0x63c
    AFAR_2 (low)[0x581] : 0xff800000
    AFSR (high)[0x551] : 0x00080000
    PERR [19:19] : 0x1
    AFSR_2 (high)[0x591] : 0x00080000
    PERR [19:19] : 0x1
    EMU B[0x511] : 0x03000000
    AID_LK [24:24] : 0x1
    NCPQ_TO [25:25] : 0x1
Interpretation:
A System error is detected and Domain C is PAUSED. From the device path in the error messages it can be determined that the error is detected on SB1 CPU A.
/partition1/domain0/SB1/bbcGroup0/cpuAB/cpusafariagent0
The Error Type is an NCPQ_TO. Using the Address Space Assignment in InfoDoc:
Non Cacheable Schizo Device Pair Agent ID 1E Leaf B. (I/O Boat 9 Slots 0,1,2 )
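The bit fields printed in the example log can be cross-checked against the raw register values. The following is a minimal Python sketch, not a Sun-supplied tool: the helper name bits() is hypothetical, and the register values and bit positions are simply copied from the example log. It only confirms which flags are set and assembles the full AFAR; mapping the AFAR to a device still requires the Address Space Assignment table.

```python
def bits(value, hi, lo):
    """Return the bit field value[hi:lo], inclusive, with bit 0 as the LSB."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# Raw register values copied from the example log (assumed to be 32-bit words).
AFAR_HIGH = 0x0000063C
AFAR_LOW = 0xFF800000
AFSR_HIGH = 0x00080000
EMU_B = 0x03000000

# Bit positions exactly as printed in the log.
perr = bits(AFSR_HIGH, 19, 19)     # PERR [19:19]
aid_lk = bits(EMU_B, 24, 24)       # AID_LK [24:24]
ncpq_to = bits(EMU_B, 25, 25)      # NCPQ_TO [25:25]

# The log labels bits [10:00] of AFAR (high) as address bits [42:32],
# so the full address is those 11 bits followed by the 32-bit low word.
afar = (bits(AFAR_HIGH, 10, 0) << 32) | AFAR_LOW

print("PERR    =", perr)
print("AID_LK  =", aid_lk)
print("NCPQ_TO =", ncpq_to)
print("AFAR    =", hex(afar))
```

If NCPQ_TO evaluates to 1, the EMU B register confirms the error type named in the interpretation above.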
Possible Causes:
There are many possible hardware and software root causes for NCPQ_TOs. They can be caused by faulty CPUs, I/O Bridge ASICs (Schizo), or PCI cards, as well as by bugs in the microcode of cPCI/PCI cards. The following scenarios have been known to cause NCPQ_TOs on Sun Fire 3800-6800 systems:
Troubleshooting:
In general, the device indicated by the AFAR_2 is likely to be the cause of the NCPQ_TO. However, the device reporting the error can also be the cause. If an NCPQ_TO occurs, the following steps should be taken to isolate the suspect FRU:
Run POST with a diag level set to default or higher.
0x00000400.0a400010 -> Safari Agent ID 14(hex), CPU0 on CPU/Memory board 5.
0x00000402.61000380 -> Safari Agent 18(hex) Schizo 0 Leaf B, on I/O Boat 6 P0 B1.
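The AFAR examples above use a combined high.low notation, while the System Controller log prints AFAR (high) and AFAR (low) as separate registers. The following Python sketch converts between the two forms so that values can be compared. The function names split_afar() and join_afar() are hypothetical, and the 11-bit mask is an assumption taken from the "AFAR [42:32] [10:00]" label in the example log. The sketch does not derive the Safari Agent ID; that lookup depends on the Address Space Assignment table.

```python
def split_afar(text):
    """Split an AFAR written as '0xHHHHHHHH.LLLLLLLL'.

    Returns (high word, low word, combined address).
    """
    high_str, low_str = text.strip().split(".")
    high = int(high_str, 16)
    low = int(low_str, 16)
    # Only bits [10:00] of the high word carry address bits [42:32].
    address = ((high & 0x7FF) << 32) | low
    return high, low, address


def join_afar(high, low):
    """Format separate AFAR (high) and AFAR (low) words in high.low notation."""
    return f"0x{high:08x}.{low:08x}"


for example in ("0x00000400.0a400010", "0x00000402.61000380"):
    high, low, address = split_afar(example)
    print(example, "-> high", hex(high), "low", hex(low), "address", hex(address))

# AFAR (high) and AFAR (low) as printed in the example log.
print(join_afar(0x0000063C, 0xFF800000))
```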
References and bug IDs:
Keyword:
Sun Fire 6800, Sun Fire 4800, Sun Fire 3800, NCPQ_TO
INTERNAL SUMMARY: