Document fcos/A0182-1
FCO #: A0182-1
SYNOPSIS: 18GB and 36GB IBM disk drives experiencing high failure rate in high
humidity and high temperature environments
DATE: Nov/13/2001
KEYWORDS: 18GB and 36GB IBM disk drives experiencing high failure rate in high
humidity and high temperature environments
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD CHANGE ORDER
(For Authorized Distribution by SunService)
SYNOPSIS: 18GB and 36GB IBM disk drives experiencing high
failure rate in high humidity and high temperature
environments.
Sun Alert: Y
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: IBM Disk Drives 18GB & 36GB
PRODUCT CATEGORY: Storage / Disk
PRODUCT AFFECTED:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
Systems Affected:
------- --------
- A14 All Ultra 2 -
- A20 All Ultra 450 -
- A23 All Ultra 60 -
- A27 All Ultra 80 -
- N04 All Netra T1120 -
- N03 All Netra T1125 -
- N21 All Netra T1 DC200 -
- N14 All Netra t1405 -
- N15 All Netra t1400 -
- E250 All Ultra Enterprise 250 -
- E450 All Enterprise 450 -
- E3000 All Ultra Enterprise 3000 -
- E3500 All Ultra Enterprise 3500 -
- E4000 All Ultra Enterprise 4000 -
- E4500 All Ultra Enterprise 4500 -
- E5000 All Ultra Enterprise 5000 -
- E5500 All Ultra Enterprise 5500 -
- E6000 All Ultra Enterprise 6000 -
- E6500 All Ultra Enterprise 6500 -
- E10000 All Ultra Enterprise 10000 -
- S8 All Sun Fire 3800 -
- S12 All Sun Fire 4800 -
- S12i All Sun Fire 4810 -
- S24 All Sun Fire 6800 -
X-Options Affected
--------- -------
- st D130 All Netra st D130 -
- A1000 All StorEdge A1000 -
- A13500 All StorEdge A3500 -
- A13500FC All StorEdge A3500FC -
- D1000 All StorEdge D1000 -
- T3 All StorEdge T3 -
- D240 All StorEdge D240 -
- MultiPack All StorEdge MultiPack -
- st A1000/D1000 All Netra st A1000/D1000 -
- ct 400/800 All Netra ct 400/800 -
AFFECTED PARTS:
Part Number Description Model
----------- ----------- -----
540-4401-01 DRV NEBS 18GB 10K 1 SCSI W/S&P -
540-4921-01 18GB SCSI 10K 1 NEBS SD DRIVE -
540-4520-01 DRV ASSY 36GB 1 SCSI SPUD&PLAT -
540-4689-01 DRV NEBS 36GB 10K 1 SCSI W/S&P -
540-4440-01 ASSY 18GB 10K 1 FC LP W/SLED -
540-4367-01 ASSY 36GB 10K 1 FC LP W/SLED -
540-4178-01 DRV 18GB 10K 1 SCSI W/SPUD&PLT -
540-4177-01 DRV assy 18GB10K 1 SCSI W/SPUD -
595-5471-01 FRU MEDIA TRAY18.2GB HDD -
(SCSI Devices)
Type Vendor Model SerialNumber(Min) SerialNumber(Max) Firmware
---- ------ ------- ------------------ ------------------ --------
Disk IBM DDYS-T1835 - - -
Disk IBM DDYS-T3695 - - -
Disk IBM DDYF-T1835 - - -
Disk IBM DDYF-T3695 - - -
REFERENCES :
BugID: 4490041
ESC: 531685
SunAlert: SA-40130
WWStopShip: P200-20006
FIN: I0724-2
DPCO: 278.A
PROBLEM DESCRIPTION :
Any system with 18.2GB and 36GB IBM disk drives may be susceptible
to early life failures.
Failure analysis results have highlighted a significant failure rate
for Drive Not Ready (DNR) on returned IBM 18GB and 36GB disk drives.
These failures have been observed to occur as a result of the disk
drives either being stored or operated in extremely hot and humid
environments for an extended period of time.
Root Cause Analysis has identified several contributing factors leading
to drive failures. No single factor is sufficient on its own; all of the
factors must be present for the identified DNR failure mode to occur. The
various factors are:
. Microscopic talcum residue,
. Disks packaged in systems in drive trays,
. Exposure to high temperature (30degC or above),
. High humidity (90% or above) for a period greater than 20 days.
Sample error messages:
/sbus@7,0/QLGC,isp@0,10000/sd@1,0 (sd46):
Error for Command: write Error Level: Fatal
Sense Key: Hardware Error
ASC: 0x2 (no seek complete), ASCQ: 0x0, FRU: 0x0
10098107 c1t8d0 540-4178-01 DDYS-T18350 01061XE682
/sbus@7,0/QLGC,isp@0,10000/sd@1,0 (sd46):
Error for Command: read Error Level: Fatal
Sense Key: Vendor Unique
ASC: 0x80 (), ASCQ: 0x0, FRU: 0xa
10102117 c2t10d0 540-4178-01 DDYS-T18350 01061XE522 108305
/sbus@7,0/QLGC,isp@0,10000/sd@a,0 (sd54):
Error for Command: read Error Level: Fatal
Sense Key: Media Error
ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
10102117 c1t5d0 540-4178-01 DDYS-T18350 01061XE630 108305
/sbus@3,0/QLGC,isp@0,10000/sd@0,0 (sd15):
Error for Command: write Error Level: Fatal
Sense Key: Media Error
ASC: 0x3 (peripheral device write fault), ASCQ: 0x0, FRU: 0x0
10088635 c2t13d0 540-4178-01 DDYS-T18350 01061XE750
/sbus@6,0/QLGC,isp@1,10000/sd@d,0 (sd42):
Error for Command: load/start/stop Error Level: Retryable
Sense Key: Not Ready
ASC: 0x4 (LUN not ready), ASCQ: 0x0, FRU: 0x0
10096449 c2t9d0 540-4178-01 DDYS-T18350 01061XE634
The most frequent failure seen is "Drive not ready"; the drive may also
produce excessive read, write or media errors.
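As a rough aid for reviewing logs like the samples above, the following
Python sketch tallies the "Sense Key" values in a block of error text. It
assumes each report carries a "Sense Key:" line formatted as in the samples;
the function name count_sense_keys is hypothetical and is not part of
explorer or any Sun tool.

```python
import re
from collections import Counter

# Matches the "Sense Key: <text>" line of an error report, as formatted
# in the sample messages (an assumption about the log layout).
SENSE_KEY_RE = re.compile(r"Sense Key:\s*(.+?)\s*$")


def count_sense_keys(log_text):
    """Return a Counter of Sense Key values found in log_text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SENSE_KEY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two of the sample reports from this FCO, used as input.
sample = """\
Error for Command: write Error Level: Fatal
Sense Key: Hardware Error
ASC: 0x2 (no seek complete), ASCQ: 0x0, FRU: 0x0
Error for Command: load/start/stop Error Level: Retryable
Sense Key: Not Ready
ASC: 0x4 (LUN not ready), ASCQ: 0x0, FRU: 0x0
"""

counts = count_sense_keys(sample)
print(counts["Not Ready"], "of", sum(counts.values()), "errors are Not Ready")
```

A high share of "Not Ready" entries is only an indicator; failure analysis
remains the authoritative check.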
- IBM builds drives and ships almost the same day, in packaging with
desiccant packs.
- Only after the drives are assembled in Sun enclosures are they
susceptible to this problem.
- Drives in enclosures would have to sit in a high temperature, high humidity
environment for more than 20 days before condensation could become an issue.
- Drives that have been running in a system for more than 90 days should
not experience this problem.
- If after 90 days the drive is stopped for any period of time and
NOT exposed to high temperature, high humidity, it will not experience
this problem.
- Drives in arrays, powered up but not configured may go into a sleep mode.
If these drives were previously exposed to high temperature, high
humidity, and were "sleeping" for a period of time, the problem could
surface when the drives are accessed.
Corrective action was implemented in Manufacturing by purging all suspect
IBM Drives via Worldwide Purge P200-20006 issued on August 25, 2001.
Corrective Action was put in place in Enterprise Services via DPCO# 278
on January 24, 2002.
A copy of either the Sun Legal approved Customer Letter or the Frequently
Asked Questions document can be accessed via the following URLs;
CUSTOMER LETTER;
http://sdpsweb.EBay/FIN_FCO/FCO/FCO_A0182-1_Dir/IBM_CUST_Letter31OCT.sdw
Note: To view document click on the above URL, then save to your local
disk using your Netscape 'file' button and select 'save as', then
open file locally using StarOffice.
FREQUENTLY ASKED QUESTIONS;
http://sdpsweb.EBay/FIN_FCO/FCO/FCO_A0182-1_Dir/Q&A
PLANNED IMPLEMENTATION COMPLETION DATE: June 30, 2002
IMPLEMENTATION :
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | UPON FAILURE
---
REPLACEMENT TIME ESTIMATE : 0.25 hours
SPECIAL CONSIDERATION :
Before proactively replacing any drives the Sun Authorized Field
Representative should complete a Disk Drive Reliability Check as
outlined below.
*** Disk Drive Reliability Check ***
Answer the following questions to help determine if FCO A0182-1 should
be applied.
------------------------------------------------------------------------
------------------------------------------------------------------------
#1. Define the customer's install base by part number. Use explorer to
identify the part numbers.
Product Part Number Model Quantity
Servers 540-4177-01 (18.2GB) ___________
Dilbert A/D1000 540-4178-01 (18.2GB) ___________
Avalanche 540-4401-01 (18.2GB) ___________
NEBS 540-4921-01 (18.2GB) ___________
T3 540-4440-01 (18.2GB) ___________
T3 540-4367-01 (36.4GB) ___________
Dilbert A/D1000 540-4520-01 (36.4GB) ___________
Avalanche 540-4689-01 (36.4GB) ___________
#2. List the number of failures by part number and manufacturer:
Part Number Model Quantity Manufacturer Failure Mode
540-4177-01 (18.2GB)___________ ___________ ___________
540-4178-01 (18.2GB)___________ ___________ ___________
540-4401-01 (18.2GB)___________ ___________ ___________
540-4921-01 (18.2GB)___________ ___________ ___________
540-4440-01 (18.2GB)___________ ___________ ___________
540-4367-01 (36.4GB)___________ ___________ ___________
540-4520-01 (36.4GB)___________ ___________ ___________
540-4689-01 (36.4GB)___________ ___________ ___________
#3. Are the majority of the drives listed in #2 IBM drives?
If no, reevaluate the situation and send the drives in for failure
analysis via CPAS; this FCO does not apply to this case.
#4. Does the number of drive failures exceed the expected failure
rate? (See chart below)
Note: an example follows.
|-----------------------------------------------------------|
| # of drives    | 3 months | 6 months| 9 months| 12 months |
| on site        |          |         |         |           |
|===========================================================|
| 300            | 1-2      | 3-4     | 4-5     | 6         |
|----------------|----------|---------|---------|-----------|
| 400            | 2        | 4       | 6       | 8         |
|----------------|----------|---------|---------|-----------|
| 500            | 2-3      | 5-6     | 7-8     | 10        |
|----------------|----------|---------|---------|-----------|
| 1,000          | 5        | 10      | 15      | 20        |
|----------------|----------|---------|---------|-----------|
| 1,500          | 8-9      | 14-15   | 24      | 30        |
|----------------|----------|---------|---------|-----------|
| total failures |          |         |         |           |
|-----------------------------------------------------------|
Example:
In this example customer X has the following IBM
drives in the data center:
Quantity drive type install # of
time failures
300 540-4178-01 (18.2GB) approx 9 months 9
500 540-4520-01 (36.4GB) approx 9 months 14
400 540-4520-01 (36.4GB) approx 12 months 11
That's 300 18GB drives installed for nine months, and also
500 36GB drives installed for nine months. The customer also
has 400 36GB drives that have been installed for about twelve
months.
Example: Fill in a chart defining your customer's information:
-------------------------------------------
| List # of failures during install period |
|------------------------------------------------------------
| # of drives | 3 months | 6 months| 9 months| 12 months |
| on site | | | | |
|===========================================================|
| 300 | | | 9 | |
|----------------|----------|---------|---------|-----------|
| 400 | | | | 11 |
|----------------|----------|---------|---------|-----------|
| 500 | | | 14 | |
|----------------|----------|---------|---------|-----------|
| 1,000 | | | | |
|----------------|----------|---------|---------|-----------|
| 1,500 | | | | |
|----------------|----------|---------|---------|-----------|
|total failures | | | 23 | 11 |
|-----------------------------------------------------------|
Compare the customer's drive reliability profile with the expected
failures from the chart in #4:
|------------------------------------------------------------|
| Drives/install time  | Expected failures | Actual failures |
|============================================================|
| 300/9 months         | 4-5               | 9               |
|----------------------|-------------------|-----------------|
| 500/9 months         | 7-8               | 14              |
|----------------------|-------------------|-----------------|
| 400/12 months        | 8                 | 11              |
|------------------------------------------------------------|
Looking at the numbers we can now answer the question, "Does the number
of drive failures exceed the expected failure rate?"
In this example the answer is YES, the customer's failure rate exceeds
the expected.
We would expect no more than 4-5 18GB drives to fail in 9 months,
the customer had 9.
We would expect no more than 7-8 36GB drives to fail in 9 months,
the customer had 14.
We would expect no more than 8 36GB drives to fail in 12 months,
the customer had 11.
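The comparison in the example can be sketched in Python as a table lookup.
This is a minimal illustration, not a Sun tool: the values are the upper
bounds of the ranges in the expected-failure chart in #4, and the names
EXPECTED_MAX and exceeds_expected are hypothetical. Only the drive counts
and install periods that appear in the chart are supported.

```python
# Upper bounds of the expected-failure ranges, copied from the chart in #4.
# Keys: drives on site -> {months installed: expected failures (upper bound)}.
EXPECTED_MAX = {
    300:  {3: 2, 6: 4,  9: 5,  12: 6},
    400:  {3: 2, 6: 4,  9: 6,  12: 8},
    500:  {3: 3, 6: 6,  9: 8,  12: 10},
    1000: {3: 5, 6: 10, 9: 15, 12: 20},
    1500: {3: 9, 6: 15, 9: 24, 12: 30},
}


def exceeds_expected(drives, months, failures):
    """True if the observed failures exceed the chart's upper bound."""
    return failures > EXPECTED_MAX[drives][months]


# Customer X from the example: (drives on site, months installed, failures).
example = [(300, 9, 9), (500, 9, 14), (400, 12, 11)]
results = [exceeds_expected(d, m, f) for d, m, f in example]

for (d, m, f), over in zip(example, results):
    status = "exceeds" if over else "within"
    print(f"{d} drives / {m} months: {f} failures, {status} expected")
```

For populations between the chart rows, judgment is still needed; the
sketch deliberately does not interpolate.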
#5. Is it possible that the systems were stored in high heat (30degC
or above) and high humidity (90% or above) for 20 days or more?
Examples include:
At customs, at a reseller, or in a non-air-conditioned data or storage area.
If yes, proceed to number 6.
If no, reevaluate the situation and send the drives in for failure
analysis; this FCO may not apply.
#6. Have the majority of the drives failed due to DNR errors? Y or N
If the answer to #4, #5 and #6 is yes, it is recommended that the IBM
drives be replaced per this FCO.
If no, reevaluate the situation and send the drives in for failure
analysis; this FCO may not apply.
NOTE: You must run the explorer script to identify all IBM drives.
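The decision flow of questions #3 through #6 can be summarized in a short
Python sketch. It is an illustration of the checklist only, under the
assumption that each answer has already been determined by the field
representative; the function names and return strings are hypothetical.

```python
def hot_humid_exposure(temp_c, humidity_pct, days):
    """Question #5 thresholds: 30 degC or above, 90% or above, 20+ days."""
    return temp_c >= 30 and humidity_pct >= 90 and days >= 20


def fco_recommendation(majority_ibm, exceeds_expected,
                       exposure_possible, majority_dnr):
    """Map the answers to questions #3-#6 onto the checklist outcome."""
    # Question #3: if most failed drives are not IBM, the FCO does not apply.
    if not majority_ibm:
        return "FCO does not apply; send drives for failure analysis via CPAS"
    # Questions #4, #5 and #6 must all be answered yes.
    if exceeds_expected and exposure_possible and majority_dnr:
        return "Replace IBM drives per this FCO"
    return "FCO may not apply; reevaluate and send drives for failure analysis"


# Example: IBM drives, failure rate above the chart, stored 25 days at
# 32 degC and 95% humidity, and mostly DNR failures.
answer = fco_recommendation(
    majority_ibm=True,
    exceeds_expected=True,
    exposure_possible=hot_humid_exposure(32, 95, 25),
    majority_dnr=True,
)
print(answer)
```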
CORRECTIVE ACTION :
IMPORTANT! Please follow the Disk Drive Reliability Check listed under
the Special Consideration section of this FCO prior to implementing
any proactive swap activity.
Upon failure or upon customer need, replace as follows:
replace 540-4177-01 (IBM Only) with 540-4177-01 (Non IBM)
replace 540-4178-01 (IBM Only) with 540-4178-01 (Non IBM)
replace 540-4921-01 (IBM Only) with 540-4921-01 (Non IBM)
replace 540-4440-01 (IBM Only) with 540-4440-01 (Non IBM)
replace 540-4367-01 (IBM Only) with 540-4367-01 (Non IBM)
replace 540-4520-01 (IBM Only) with 540-4520-01 (Non IBM)
NEBS Compliance:
If maintaining NEBS3 Compliance is essential to your customer, it
is recommended that proactive swaps NOT be implemented unless
absolutely necessary, as only limited NEBS3 materials are available.
For NEBS3 Compliance, replace either the 18GB or 36GB drive with
the new NEBS3 Compliant 36GB drive as follows:
replace 540-4401-01 with 540-5160-01
replace 540-4689-01 with 540-5160-01
If maintaining NEBS3 Compliance is NOT essential to your customer
replace as follows:
replace 540-4401-01 (IBM Only) with 540-4401-01 (Non IBM)
replace 540-4689-01 (IBM Only) with 540-4689-01 (Non IBM)
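The replacement rules above amount to a simple lookup, sketched here in
Python for illustration. The part numbers are taken from this FCO; the
names AFFECTED_PARTS, NEBS3_REPLACEMENT and replacement_for are hypothetical
and do not correspond to any Sun ordering system.

```python
# Parts that are swapped for the NEBS3 compliant 36GB drive when the
# customer must maintain NEBS3 Compliance.
NEBS3_REPLACEMENT = {
    "540-4401-01": "540-5160-01",
    "540-4689-01": "540-5160-01",
}

# All part numbers listed for replacement in this FCO.
AFFECTED_PARTS = {
    "540-4177-01", "540-4178-01", "540-4921-01", "540-4440-01",
    "540-4367-01", "540-4520-01", "540-4401-01", "540-4689-01",
}


def replacement_for(part_number, nebs3_required=False):
    """Return the replacement part for an IBM drive covered by this FCO."""
    if part_number not in AFFECTED_PARTS:
        raise ValueError(f"{part_number} is not covered by this FCO")
    if nebs3_required and part_number in NEBS3_REPLACEMENT:
        return NEBS3_REPLACEMENT[part_number]
    # Otherwise the same part number is used, sourced from a non-IBM vendor.
    return part_number + " (Non IBM)"


print(replacement_for("540-4401-01", nebs3_required=True))
print(replacement_for("540-4178-01"))
```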
For proactive replacement of non-failed drives, mark the Defective Material
Tag (DMT) with the letters "FCO" in bold letters. For failed drives, mark
the DMT as usual with the failure information.
COMMENTS :
IMPORTANT! SECURE SITE ACTIVITY
Below are the instructions for implementing this FCO at Secure Sites
where no drive will be returned. IBM will require documentation that
meets the following requirements:
1) Documentation on Customer or Government letterhead. If the customer does
not wish to use their letterhead, or Sun does not wish to disclose who
the customer is, this note can be on Sun letterhead.
2) Documentation should be addressed to:
Daria Casey
IBM
5600 Cottle Road
Dept LJK, Building 010
San Jose, CA 95193
3) Documentation should also be faxed to Daria at (408) 979-1344 prior to
mailing the original.
4) The documentation should state that the parts contain classified
information, cannot be returned, and are being scrapped. The note
should then list the part number and serial number of all drives being
scrapped.
BILLING TYPE:
Warranty: Sun will provide parts at no charge under Warranty
Service. On-Site Labor Rates are based on how the
system was initially installed.
Contract: Sun will provide parts at no charge. On-Site Labor Rates
are based on the type of service contract.
Non Contract: Sun will provide parts at no charge. Installation by
Sun is available based on the On-Site Labor Rates
defined in the Price List.
--------------------------------------------------------------------------
Implementation Footnote:
________________________
i) In case of Mandatory FCOs, Enterprise Services will attempt to contact
all known customers to recommend the part upgrade.
ii) For controlled proactive swap FCOs, Enterprise Services mission critical
support teams will initiate proactive swap efforts for their respective
accounts, as required.
iii) For Replace upon Failure FCOs, Enterprise Services partners will
implement the necessary corrective actions as and when they are required.
--------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
______________
* Access the top level URL of http://sdpsweb.EBay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
_______________________
* Access the SunSolve Online URL at http://sunsolve.Central/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
_______________
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
________
Send questions or comments to finfco-manager@sdpsweb.EBay
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.