SRDB ID   Synopsis   Date
47560   SE99x0 ENVIRONMENTAL ERROR SIM identifies multiple hardware failures; Cause may be bent pin   24 Dec 2002

Status Issued

Description

Generally, when a hardware component in an SE99x0 fails, a SIM is generated that references the component(s) most likely to be the root cause. The reference codes and action codes associated with the SIM point to a course of action: replace the suspected hardware. In some isolated cases, however, the SIMs may reference many different hardware components that are not necessarily failing. Once a few of the SIM-recommended components have been replaced without resolution, consider what those components have in common. Note: for multiple HDD failures on the same fibre loop, the TSE should arrange for a subsystem dump of that SE99x0 and have it analyzed to best identify the faulty component.

This situation was encountered in Radiance Case #63108314 on an SE9960, where the SIMs appeared as follows:

CASE_START
CASE_SUMMARY: Hi-Track 9900 SIM
CASE_DESCRIPTION: 
<ht_mail_id_2002072909023934.13487> Error information follows:

_System Type: 9960
_Site ID: X####57
_System S/N: ###69
_Microcode: DKCMAIN=01-17-94-00/00

The following SIMs have been transferred from this site by Hi-Track: 

------- SIM 01 Follows: ----------------------------------------------------
Severity: Moderate  SIM Time: Jul 29, 2002 11:52:55  SIM S/N: 64093
SIM Type: DKC

Reference Code: BF4A56  
  Type: ENVIRONMENTAL ERROR               
  Description: HDD MPS0 WARNING


Action Codes: 
  Code: 10509600
  Location: MPS-L260              
  Function: Multi PS(HDD)                                           
  Additional: Multiple PS for HDU-L26

  Code: 10519000
  Location: AC BOX-L20            
  Function: AC BOX                                                  
  Additional: AC Box in Fourth Disk Unit(DKU-L2)

  Code: 10C19000
  Location: DKUMN-L2F             
  Function: DKUMN PCB                                               
  Additional: DKUMN PCB in front of Fourth Disk Unit(DKU-L2)

  Code: 10D10000
  Location: SSVP/HUB              
  Function: SSVP/HUB PCB                                            
  Additional: SSVP PCB(ASSIST board)

SIM Bytes: 
  Byte: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 
  Data: 00 90 10 00 00 00 8F E0 11 40 00 80 F0 56 22 0C 

  Byte: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
  Data: 22 00 A0 09 00 04 BF 4A 01 10 00 00 F1 00 04 00 

----------------------------------------------------------------------------

------- SIM 02 Follows: ----------------------------------------------------
Severity: Moderate  SIM Time: Jul 29, 2002 11:52:55  SIM S/N: 64095
SIM Type: DKC

Reference Code: BF4B56  
  Type: ENVIRONMENTAL ERROR               
  Description: HDD MPS1 WARNING


Action Codes: 
  Code: 10509610
  Location: MPS-L261              
  Function: Multi PS(HDD)                                           
  Additional: Multiple PS for HDU-L26 (Option)

  Code: 10519010
  Location: AC BOX-L21            
  Function: AC BOX                                                  
  Additional: AC Box in Fourth Disk Unit(DKU-L2)

  Code: 10C19010
  Location: DKUMN-L2R             
  Function: DKUMN PCB                                               
  Additional: DKUMN PCB in front of Fourth Disk Unit(DKU-L2)

  Code: 10D10000
  Location: SSVP/HUB              
  Function: SSVP/HUB PCB                                            
  Additional: SSVP PCB(ASSIST board)

SIM Bytes: 
  Byte: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 
  Data: 00 90 10 00 00 00 8F E0 11 40 00 80 F0 56 24 0C 

  Byte: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
  Data: 22 00 A0 09 00 04 BF 4B 01 10 00 00 F1 00 04 00 

----------------------------------------------------------------------------

CALL_TYPE: Hi-Track Reported
SEVERITY: 3-Minor Restriction
SITE_ID: X####57
CONTACT_LAST_NAME: hitrack
CONTACT_FIRST_NAME: hitrack
CONTACT_PHONE: 1-800-348-help
CASE_END                               
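
In the two SIMs above, the Reference Code can also be read back out of the SIM Bytes: bytes 22 and 23 followed by byte 13 give BF4A56 and BF4B56 respectively. The sketch below shows that cross-check. The byte positions are an observation from this case only, not a documented layout, and the function names are hypothetical.

```python
def sim_bytes_from_rows(*rows: str) -> list[int]:
    """Turn the 'Data:' rows of a Hi-Track SIM into one flat list of byte values."""
    out: list[int] = []
    for row in rows:
        out.extend(int(tok, 16) for tok in row.split())
    return out


def reference_code(sim: list[int]) -> str:
    # Assumption: bytes 22, 23 and 13 form the reference code,
    # as observed in the two SIMs of this case.
    return "".join(f"{sim[i]:02X}" for i in (22, 23, 13))


# Data rows copied from SIM 01 and SIM 02 above.
sim01 = sim_bytes_from_rows(
    "00 90 10 00 00 00 8F E0 11 40 00 80 F0 56 22 0C",
    "22 00 A0 09 00 04 BF 4A 01 10 00 00 F1 00 04 00",
)
sim02 = sim_bytes_from_rows(
    "00 90 10 00 00 00 8F E0 11 40 00 80 F0 56 24 0C",
    "22 00 A0 09 00 04 BF 4B 01 10 00 00 F1 00 04 00",
)

print(reference_code(sim01))  # BF4A56
print(reference_code(sim02))  # BF4B56
```

The two SIMs differ only in byte 14 and byte 23, which is consistent with one warning per power supply (MPS0 and MPS1) of the same HDU.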

SOLUTION SUMMARY:

In this particular situation, the SIMs make it appear that several hardware components are failing. The action codes identify the components in question; looking each one up in the hardware maintenance manual gives its replacement procedure.

The suggested course of action is to systematically replace the hardware referenced in the SIMs, first to last. The SIMs may be generated by any one of the components listed. In this case we replaced both Multiple Power Supplies for HDU 6 of the Left 2 DKU (MPS-L260 and MPS-L261). The problem persisted, so we replaced AC BOX-L20 and AC BOX-L21. Still, the problem was not remedied. Replacing the next components, the DKUMN PCBs (DKUMN-L2F and DKUMN-L2R), did not clear the errors either, nor did replacing the SSVP. At that point one would logically start to look for a hardware component that is not listed specifically in the SIM messages.

If replacing the components listed in the SIM does not fix the problem, then the culprit is probably something that these components have in common. The root cause of our ENVIRONMENTAL SIMs was a bent pin on the HDU backplane that affected the Multiple Power Supplies which power the HDU. Bent pins, as we learned, are detected by the subsystem but ambiguously identified in ENVIRONMENTAL SIMs. It may take some time to go through a "mock disk replacement", in which you make the array think you are replacing the drive but in fact simply pull the drive, inspect the pins on the backplane, and re-insert the same drive. If bent pins are found, the backplane should be replaced.
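
One way to look for "what the components have in common" is to collect the unit names that appear in the "Additional" text of every SIM's action codes. The following is a minimal sketch of that idea, using the text from the two SIMs in this case. The regular expression and the function name are illustrative assumptions; this is not a supported diagnostic tool.

```python
import re

# "Additional" text of each action code, copied from the two SIMs in this case.
ADDITIONAL = {
    "BF4A56": [
        "Multiple PS for HDU-L26",
        "AC Box in Fourth Disk Unit(DKU-L2)",
        "DKUMN PCB in front of Fourth Disk Unit(DKU-L2)",
        "SSVP PCB(ASSIST board)",
    ],
    "BF4B56": [
        "Multiple PS for HDU-L26 (Option)",
        "AC Box in Fourth Disk Unit(DKU-L2)",
        "DKUMN PCB in front of Fourth Disk Unit(DKU-L2)",
        "SSVP PCB(ASSIST board)",
    ],
}

# Assumption: unit names look like HDU-L26 or DKU-L2 (prefix, one letter, digits).
UNIT = re.compile(r"\b(?:HDU|DKU)-[A-Z]\d+\b")


def units_in_common(additional_by_sim: dict[str, list[str]]) -> set[str]:
    """Return the HDU/DKU names that are mentioned by every SIM."""
    per_sim = [
        {unit for text in texts for unit in UNIT.findall(text)}
        for texts in additional_by_sim.values()
    ]
    return set.intersection(*per_sim) if per_sim else set()


print(sorted(units_in_common(ADDITIONAL)))  # ['DKU-L2', 'HDU-L26']
```

Here both SIMs point at HDU-L26 inside DKU-L2. Once the listed power supplies, AC boxes, DKUMN PCBs and SSVP have been replaced, the shared HDU itself, including its backplane, is the next place to inspect.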

INTERNAL SUMMARY:

SUBMITTER: Glenn Thoren
APPLIES TO: AFO Vertical Team Docs/Storage
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.