SRDB ID   Synopsis   Date
47289   Sun Fire[TM] 12K/15K: Uncorrectable system bus (UE) Event on CPU### Privileged Data Access at TL=0   23 Sep 2002

Status Issued

Description

The following message appears in /var/adm/messages on a Sun Fire[TM] 12K/15K domain, in the console messages, or in the core file's message buffer. What does this mean?

WARNING: [AFT1] Uncorrectable system bus (UE) Event on CPU0 Privileged Data Access at TL=0, errID 0x00000022.bc2ed1a8

    AFSR 0x00100004<PRIV,UE>.0000000a AFAR 0x00000000.1e1a1220 
    Fault_PC 0x100085d0 Esynd 0x000a J0100 J0202 J0304 J0406 
[AFT1] errID 0x00000022.bc2ed1a8 Two Bits were in error 
[AFT2] errID 0x00000022.bc2ed1a8 PA=0x00000000.1e1a1200 
    E$tag 0x00000000.3c000002 E$state_0 Exclusive 
[AFT2] E$Data (0x00) 0x0eccfeed.7804d200 0x0eccfeed.7804d200 ECC 0x16d 
[AFT2] E$Data (0x10) 0x0eccfeed.7804d200 0x0eccfeed.7804d200 ECC 0x16d 
[AFT2] E$Data (0x20) 0x0eccfeed.7804d20c 0x0eccfeed.7804d200 ECC 0x16d *Bad* Esynd=0x00a 
[AFT2] E$Data (0x30) 0x0eccfeed.7804d200 0x0eccfeed.7804d200 ECC 0x16d 
[AFT2] D$ data not available                                                 

***NOTE*** The above example is from a Sun Fire system, but not specifically a 12K/15K. The error string is the same on a 12K/15K, but the implicated DIMMs will have different numbers. For the purposes of this document, these differences are irrelevant.

SOLUTION SUMMARY:

Explanation:

A UE event is an uncorrectable system bus data ECC error on a read from the system bus.

The second [AFT1] message indicates that a multi-bit error was detected in the data coming in from the system bus; incoming data that contains a multi-bit error causes a UE. This error normally results in a panic. The processor implicated in the message or in the corresponding panic may not be the source of the error; it could be the victim.
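As an illustration only, the following Python sketch reads the "Two Bits were in error" report back out of the example message. It relies on two assumptions that this document does not state: that the dotted values (for example 0x0eccfeed.7804d200) are 64-bit words split into upper and lower halves, and that the matching word in the other E$Data chunks is the intended data. The 64-byte line size is inferred from the four 0x10-byte chunks shown. The helper name flipped_bits is hypothetical.

```python
# Values copied from the example message in this document.
AFAR = 0x1E1A1220   # fault address from the AFSR/AFAR line
PA = 0x1E1A1200     # address reported on the [AFT2] PA= line

# Assumption: the four 0x10-byte E$Data chunks form one 64-byte line.
LINE_SIZE = 0x40

# First word of the chunk flagged *Bad*, and the same word as it appears
# in the other chunks (assumed here to be the intended pattern).
bad_word = 0x0ECCFEED7804D20C
good_word = 0x0ECCFEED7804D200


def flipped_bits(observed, expected):
    """Return the bit positions at which the two words differ."""
    diff = observed ^ expected
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Align the fault address down to the start of its line.
line_base = AFAR & ~(LINE_SIZE - 1)
# Offset of the 0x10-byte chunk that contains the fault address.
chunk_offset = (AFAR - line_base) & ~0xF

print(hex(line_base))                      # 0x1e1a1200, matches PA
print(hex(chunk_offset))                   # 0x20, the chunk marked *Bad*
print(flipped_bits(bad_word, good_word))   # [2, 3], two bits differ
```

Under these assumptions, the fault address falls in the chunk at offset 0x20 of the line starting at the reported PA, and the flagged word differs from its neighbors in exactly two bit positions.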

Action:

Confirm that this error is not occurring as the result of a WDU or CPU event. Such a case is indicated by a "special syndrome of 0x071". If the error is not the result of such an event, treat it as a hard failure of the implicated processor. Examine any corresponding rstops or dstops to determine whether the fault lies in the CPU implicated by the message or in the memory specified.

This error could require the replacement of memory or of the SB. If the processor is confirmed bad, the SB is the FRU, not the processor.
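The syndrome check can be sketched in Python as below. This is not a Sun-supplied tool: the regular expressions are written against the example message in this document only, the function name summarize is hypothetical, and the sketch assumes that the special syndrome would appear in the Esynd field, which this document does not confirm.

```python
import re

# Patterns written against the example message in this document only.
AFT1_RE = re.compile(
    r"\[AFT1\] .*? Event on CPU(?P<cpu>\d+) .*?"
    r"errID (?P<errid>0x[0-9a-f]+\.[0-9a-f]+)",
    re.IGNORECASE,
)
ESYND_RE = re.compile(r"Esynd[ =](?P<esynd>0x[0-9a-f]+)", re.IGNORECASE)

# Value quoted in the Action text. Assumption: it is reported as Esynd.
SPECIAL_SYNDROME = 0x071


def summarize(lines):
    """Collect the CPU number, errID, and first Esynd value from the lines."""
    summary = {}
    for line in lines:
        m = AFT1_RE.search(line)
        if m:
            summary["cpu"] = int(m.group("cpu"))
            summary["errid"] = m.group("errid")
        m = ESYND_RE.search(line)
        if m and "esynd" not in summary:
            summary["esynd"] = int(m.group("esynd"), 16)
    if "esynd" in summary:
        summary["special_syndrome"] = summary["esynd"] == SPECIAL_SYNDROME
    return summary


sample = [
    "WARNING: [AFT1] Uncorrectable system bus (UE) Event on CPU0 "
    "Privileged Data Access at TL=0, errID 0x00000022.bc2ed1a8",
    "    AFSR 0x00100004<PRIV,UE>.0000000a AFAR 0x00000000.1e1a1220",
    "    Fault_PC 0x100085d0 Esynd 0x000a J0100 J0202 J0304 J0406",
]

result = summarize(sample)
print(result)
```

For the example message, the Esynd value is 0x00a rather than 0x071, so the special_syndrome entry is False and the rest of the Action text applies.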

INTERNAL SUMMARY:

SUBMITTER: Joshua Freeman
APPLIES TO: AFO Vertical Team Docs/HAS, Hardware/Sun Fire /15000
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.