SRDB ID   Synopsis   Date
48834   Sun Fire[TM] 3800-6800: Troubleshooting NCPQ_TO errors   9 Dec 2002

Status Issued

Description

Problem Statement:

This document aids in troubleshooting Non Cacheable Pending Queue Time Outs (NCPQ_TO) on Sun Fire 3800-6800 systems. An NCPQ_TO occurs when a data request to Non Cacheable address space does not complete its transaction. Non Cacheable address space consists of the Safari Device config space and the I/O address space.

Symptoms:

Error messages indicating that an NCPQ_TO occurred are displayed on the Domain Console. The error messages are also stored in the Domain Console Buffer and can be retrieved with the Sun Fire System Controller (SSC) command showlogs. If a loghost is configured, the error messages are also stored on the loghost. NCPQ_TOs can occur during normal operation of the Domain or during POST. Here is an example log of an NCPQ_TO error:

Feb 26 10:46:02 sq1sc Domain-C.SC: ErrorMonitor:Domain C has a SYSTEM ERROR
Feb 26 10:46:02 sq1sc Domain-C.SC: /N0/SB1 encountered the first error
Feb 26 10:46:02 sq1sc Domain-C.SC: RepeaterSbbcAsic reported first error on /N0/SB1
Feb 26 10:46:02 sq1sc Domain-C.SC: /partition1/domain0/SB1/bbcGroup0/sbbc0: 
                      FE [15:15] : 0x1 
                  ErrSum [31:31] : 0x1 
                  SafErr [09:08] : 0x1 Fireplane device asserted an error
Feb 26 12:20:47 SunFireSc0 Domain-C.SC: /partition1/domain0/SB1/bbcGroup0/cpuAB/cpusafariagent0: 
    AFAR (high)[0x531] : 0x0000063c
            AFAR [42:32] [10:00] : 0x63c 
    AFAR (low)[0x541] : 0xff800000
    AFAR_2 (high)[0x571] : 0x0000063c
          AFAR_2 [42:32] [10:00] : 0x63c 
    AFAR_2 (low)[0x581] : 0xff800000
    AFSR (high)[0x551] : 0x00080000         
         PERR [19:19] : 0x1 
    AFSR_2 (high)[0x591] : 0x00080000
                    PERR [19:19] : 0x1 
    EMU B[0x511] : 0x03000000
                  AID_LK [24:24] : 0x1 
                 NCPQ_TO [25:25] : 0x1                         
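
The bracketed ranges in the log give the bit position of each flag within its register. The following is a minimal Python sketch, not part of any Sun tool, that re-derives the flags from the raw register values shown above. The function and table names are illustrative, and the tables list only the fields that appear in this log, not the full register layout.

```python
def bit_field(value, hi, lo):
    """Return bits [hi:lo] (inclusive) of a register value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# Field positions copied from the bracketed ranges in the example log.
# Assumption: only the fields printed in the log are listed here.
EMU_B_FIELDS = {"AID_LK": (24, 24), "NCPQ_TO": (25, 25)}
AFSR_HIGH_FIELDS = {"PERR": (19, 19)}


def set_fields(value, fields):
    """Return the sorted names of all fields whose value is non-zero."""
    return sorted(
        name for name, (hi, lo) in fields.items() if bit_field(value, hi, lo)
    )


emu_b = 0x03000000      # EMU B[0x511] from the log
afsr_high = 0x00080000  # AFSR (high)[0x551] from the log

print(set_fields(emu_b, EMU_B_FIELDS))
print(set_fields(afsr_high, AFSR_HIGH_FIELDS))
```

With the values from the log, the first call reports AID_LK and NCPQ_TO and the second reports PERR, which matches the decoded lines printed by the System Controller.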

Interpretation:

A System error is detected and Domain C is PAUSED. The device path in the error messages shows that the error was detected on SB1 CPU A:

/partition1/domain0/SB1/bbcGroup0/cpuAB/cpusafariagent0
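
When many log entries have to be checked, the board and CPU can be pulled out of the device path mechanically. The sketch below is a hypothetical helper, written under the assumption that every CPU path has the same shape as the one in this log. The rule that agent 0 of the pair "AB" is CPU A is inferred from the interpretation above and is not confirmed for other paths.

```python
import re

# Pattern modelled on the single path shown in this document (assumption).
PATH_RE = re.compile(
    r"^/partition(?P<partition>\d+)/domain(?P<domain>\d+)"
    r"/(?P<board>[A-Z]+\d+)/bbcGroup(?P<bbc_group>\d+)"
    r"/cpu(?P<cpu_pair>[A-D]{2})/cpusafariagent(?P<agent>\d+)$"
)


def parse_cpu_path(path):
    """Split a CPU device path into its components, or return None."""
    match = PATH_RE.match(path.strip().rstrip(":"))
    if match is None:
        return None
    info = match.groupdict()
    # Assumption: agent N selects the N-th letter of the CPU pair,
    # so agent 0 of "AB" is CPU A.
    index = int(info["agent"])
    pair = info["cpu_pair"]
    info["cpu"] = pair[index] if index < len(pair) else None
    return info


info = parse_cpu_path(
    "/partition1/domain0/SB1/bbcGroup0/cpuAB/cpusafariagent0"
)
print(info["board"], "CPU", info["cpu"])
```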

The Error Type is an NCPQ_TO. Using the Address Space Assignment in InfoDoc 49293, the AFAR_2 value 0x0000063c.ff800000 decodes to:

Non Cacheable Schizo Device Pair Agent ID 1E Leaf B (I/O Boat 9 Slots 0, 1, 2)
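
The dotted AFAR_2 value is the AFAR_2 (high) and AFAR_2 (low) registers from the log written side by side. A small Python sketch of how the two halves form one address follows. It assumes, based on the "[42:32] [10:00]" annotation in the log, that the low 11 bits of the high register hold address bits 42 to 32. The function names are illustrative. Mapping the resulting address to an Agent ID still requires the table in InfoDoc 49293, which is not reproduced here.

```python
def afar_address(high, low):
    """Combine the AFAR high and low registers into one physical address.

    Assumption: bits [10:0] of the high register are address bits [42:32],
    as annotated in the log; the low register supplies bits [31:0].
    """
    return ((high & 0x7FF) << 32) | (low & 0xFFFFFFFF)


def afar_dotted(high, low):
    """Format the register pair in the dotted notation used in this document."""
    return "0x%08x.%08x" % (high, low)


afar2_high = 0x0000063C  # AFAR_2 (high)[0x571] from the log
afar2_low = 0xFF800000   # AFAR_2 (low)[0x581] from the log

print(afar_dotted(afar2_high, afar2_low))
print(hex(afar_address(afar2_high, afar2_low)))
```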

Possible Causes:

There are many possible hardware and software root causes for NCPQ_TOs. They can be caused by faulty CPUs, I/O Bridge ASICs (Schizo), or PCI cards, as well as by bugs in the microcode of cPCI/PCI cards. The following scenarios have been known to cause NCPQ_TOs on Sun Fire 3800-6800 systems:

SOLUTION SUMMARY:

Troubleshooting:

In general, the device indicated by AFAR_2 is the likely cause of the NCPQ_TO. However, the device reporting the error can also be the cause. If an NCPQ_TO occurs, take the following steps to isolate the suspect FRU:

Run POST with a diag level set to default or higher.

References and bug IDs:

27432 Sun Fire (3800-6800): Physical Device Mapping for I/O boats

49293 Address Space Assignment

Keyword:

Sun Fire 6800, Sun Fire 4800, Sun Fire 3800, NCPQ_TO

INTERNAL SUMMARY: SUBMITTER: Peter Gonscherowski BUG REPORT ID: 4593208 APPLIES TO: Hardware/Sun Fire /3800, Hardware/Sun Fire /4800, Hardware/Sun Fire /4810, Hardware/Sun Fire /6800 ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.