SRDB ID | Synopsis | Date |
47560 | SE99x0 ENVIRONMENTAL ERROR SIM identifies multiple hardware failures; Cause may be bent pin | 24 Dec 2002 |
Status | Issued |
Description |
Generally, when a hardware component in an SE99x0 fails, a SIM is generated that references the component[s] most likely to be the root cause. The reference codes and action codes associated with the SIM give a course of action for replacing the suspected hardware. However, in some isolated cases the SIM[s] may reference many different hardware components that are not necessarily failing. If replacing a few of the SIM-recommended components does not resolve the problem, consider what those components have in common. Note: for multiple HDD failures on the same fibre loop, the TSE should arrange for a subsystem dump of that SE99x0 and have it analyzed to best identify the faulty component.
This situation was encountered in Radiance Case #63108314 on an SE9960, where the SIM[s] appeared as follows:
CASE_START
CASE_SUMMARY: Hi-Track 9900 SIM
CASE_DESCRIPTION: <ht_mail_id_2002072909023934.13487>
Error information follows:
_System Type: 9960
_Site ID: X####57
_System S/N: ###69
_Microcode: DKCMAIN=01-17-94-00/00
The following SIMs have been transferred from this site by Hi-Track:
------- SIM 01 Follows: ----------------------------------------------------
Severity: Moderate
SIM Time: Jul 29, 2002 11:52:55
SIM S/N: 64093
SIM Type: DKC
Reference Code: BF4A56
Type: ENVIRONMENTAL ERROR
Description: HDD MPS0 WARNING
Action Codes:
  Code: 10509600
    Location: MPS-L260
    Function: Multi PS(HDD)
    Additional: Multiple PS for HDU-L26
  Code: 10519000
    Location: AC BOX-L20
    Function: AC BOX
    Additional: AC Box in Fourth Disk Unit(DKU-L2)
  Code: 10C19000
    Location: DKUMN-L2F
    Function: DKUMN PCB
    Additional: DKUMN PCB in front of Fourth Disk Unit(DKU-L2)
  Code: 10D10000
    Location: SSVP/HUB
    Function: SSVP/HUB PCB
    Additional: SSVP PCB(ASSIST board)
SIM Bytes:
  Byte: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
  Data: 00 90 10 00 00 00 8F E0 11 40 00 80 F0 56 22 0C
  Byte: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  Data: 22 00 A0 09 00 04 BF 4A 01 10 00 00 F1 00 04 00
----------------------------------------------------------------------------
------- SIM 02 Follows: ----------------------------------------------------
Severity: Moderate
SIM Time: Jul 29, 2002 11:52:55
SIM S/N: 64095
SIM Type: DKC
Reference Code: BF4B56
Type: ENVIRONMENTAL ERROR
Description: HDD MPS1 WARNING
Action Codes:
  Code: 10509610
    Location: MPS-L261
    Function: Multi PS(HDD)
    Additional: Multiple PS for HDU-L26 (Option)
  Code: 10519010
    Location: AC BOX-L21
    Function: AC BOX
    Additional: AC Box in Fourth Disk Unit(DKU-L2)
  Code: 10C19010
    Location: DKUMN-L2R
    Function: DKUMN PCB
    Additional: DKUMN PCB in front of Fourth Disk Unit(DKU-L2)
  Code: 10D10000
    Location: SSVP/HUB
    Function: SSVP/HUB PCB
    Additional: SSVP PCB(ASSIST board)
SIM Bytes:
  Byte: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
  Data: 00 90 10 00 00 00 8F E0 11 40 00 80 F0 56 24 0C
  Byte: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  Data: 22 00 A0 09 00 04 BF 4B 01 10 00 00 F1 00 04 00
----------------------------------------------------------------------------
CALL_TYPE: Hi-Track Reported
SEVERITY: 3-Minor Restriction
SITE_ID: X####57
CONTACT_LAST_NAME: hitrack
CONTACT_FIRST_NAME: hitrack
CONTACT_PHONE: 1-800-348-help
CASE_END
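When several SIMs arrive at once, it can help to pull the action codes into a list before deciding what to replace. The following is a minimal Python sketch, not a Hi-Track tool: the function name parse_action_codes and the regular expression are assumptions based only on the field labels visible in the mail above (Code, Location, Function, Additional), and the sample text is an excerpt of SIM 01.

```python
import re

# Excerpt of SIM 01 from the Hi-Track mail above, in its flattened form.
SIM_TEXT = (
    "Reference Code: BF4A56 Type: ENVIRONMENTAL ERROR "
    "Description: HDD MPS0 WARNING Action Codes: "
    "Code: 10509600 Location: MPS-L260 Function: Multi PS(HDD) "
    "Additional: Multiple PS for HDU-L26 "
    "Code: 10519000 Location: AC BOX-L20 Function: AC BOX "
    "Additional: AC Box in Fourth Disk Unit(DKU-L2) "
    "Code: 10C19000 Location: DKUMN-L2F Function: DKUMN PCB "
    "Additional: DKUMN PCB in front of Fourth Disk Unit(DKU-L2) "
    "Code: 10D10000 Location: SSVP/HUB Function: SSVP/HUB PCB "
    "Additional: SSVP PCB(ASSIST board) "
    "SIM Bytes: Byte: 00 01 02 03"
)

# Assumed layout: an 8-digit hex action code followed by three labelled fields.
# The 6-digit "Reference Code" does not match because of the {8} quantifier.
ACTION_RE = re.compile(
    r"Code:\s*(?P<code>[0-9A-F]{8})\s+"
    r"Location:\s*(?P<location>.+?)\s+"
    r"Function:\s*(?P<function>.+?)\s+"
    r"Additional:\s*(?P<additional>.+?)\s*"
    r"(?=Code:|SIM Bytes:|$)",
    re.S,
)


def parse_action_codes(sim_text):
    """Return one dict per action code, in the order the SIM lists them."""
    return [m.groupdict() for m in ACTION_RE.finditer(sim_text)]


actions = parse_action_codes(SIM_TEXT)
for a in actions:
    print(a["code"], a["location"], "|", a["additional"])
```

The order of the returned list matches the order in the SIM, which is the replacement order this article follows [first to last].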
SOLUTION SUMMARY:
In this particular situation the SIMs make it appear that several hardware components are failing. The action codes identify the components in question and, when looked up in the hardware maintenance manual, reference a replacement procedure.
The suggested course of action is to systematically replace the hardware referenced in the SIMs [first to last], since the SIMs may be generated by any one of the components listed. In this case we first replaced both Multiple Power Supplies in HDU 6 of the Left 2 DKU (MPS-L260 and MPS-L261). The problem persisted, so we replaced AC BOX-L20 and AC BOX-L21. Still, the problem was not remedied. Replacing the next hardware component, the PCBs in DKUMN-L2R and DKUMN-L2F, did not fix the errors either. Replacing the SSVP also did not solve the problem. At that point one should start looking for a hardware component that is not listed specifically in the SIM messages.
If replacing the components listed in the SIM does not fix the problem, then the culprit is something that these components have in common. The root cause of our ENVIRONMENTAL SIM[s] was a bent pin on the HDU backplane that affected the Multiple Power Supplies which power the HDU. Bent pin[s], as we learned, are detected by the subsystem but are identified only ambiguously in ENVIRONMENTAL SIM[s]. Checking for them takes some time, because it requires a "mock disk replacement": you make the array think you are replacing the drive, but in effect you simply pull the drive, inspect the pins on the backplane and re-insert the same drive. If bent pins are found, the backplane should be replaced.
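The "what do these components have in common" step can be sketched in Python as a simple tally. This is a heuristic illustration only: the function name shared_units and the assumption that unit identifiers look like HDU-L26 or DKU-L2 are not taken from any Hitachi or Sun documentation, only from the Additional fields of the two SIMs in this case. It cannot detect a bent pin; it only narrows down which units are named by every SIM and are therefore worth inspecting.

```python
import re
from collections import Counter

# (Location, Additional) pairs copied from the two SIMs in this case,
# keyed by reference code.
SIMS = {
    "BF4A56": [
        ("MPS-L260", "Multiple PS for HDU-L26"),
        ("AC BOX-L20", "AC Box in Fourth Disk Unit(DKU-L2)"),
        ("DKUMN-L2F", "DKUMN PCB in front of Fourth Disk Unit(DKU-L2)"),
        ("SSVP/HUB", "SSVP PCB(ASSIST board)"),
    ],
    "BF4B56": [
        ("MPS-L261", "Multiple PS for HDU-L26 (Option)"),
        ("AC BOX-L21", "AC Box in Fourth Disk Unit(DKU-L2)"),
        ("DKUMN-L2R", "DKUMN PCB in front of Fourth Disk Unit(DKU-L2)"),
        ("SSVP/HUB", "SSVP PCB(ASSIST board)"),
    ],
}

# Assumed naming pattern for enclosing units, e.g. HDU-L26 or DKU-L2.
UNIT_RE = re.compile(r"\b(?:HDU|DKU)-[A-Z]\d+\b")


def shared_units(sims):
    """Return {unit: mention count} for units named by every SIM."""
    counts = Counter()
    per_sim = []
    for actions in sims.values():
        units = set()
        for _location, additional in actions:
            found = UNIT_RE.findall(additional)
            counts.update(found)
            units.update(found)
        per_sim.append(units)
    common = set.intersection(*per_sim) if per_sim else set()
    return {unit: counts[unit] for unit in sorted(common)}


print(shared_units(SIMS))
```

For the two SIMs above, both DKU-L2 and HDU-L26 are named by every SIM, which is consistent with the fault eventually being found on the HDU backplane rather than in any of the listed parts.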
INTERNAL SUMMARY:
SUBMITTER: Glenn Thoren
APPLIES TO: AFO Vertical Team Docs/Storage
ATTACHMENTS: