Document fins/I0465-1
FIN #: I0465-1
SYNOPSIS: A5000: Excessive Recoverable Soft errors: (OFFLINEs, CRC Errors, SCSI
parity errors, timeouts) on FC-AL loops
DATE: Dec/23/98
KEYWORDS: A5000: Excessive Recoverable Soft errors: (OFFLINEs, CRC Errors, SCSI
parity errors, timeouts) on FC-AL loops
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: A5000: Excessive Recoverable Soft errors: (OFFLINEs, CRC Errors,
SCSI parity errors, timeouts) on FC-AL loops.
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: A5000 single FC-AL
PRODUCT CATEGORY: Storage / / A5000 / Diagnostics
PRODUCTS AFFECTED:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
Systems Affected
----------------
- A14 - Ultra Enterprise 2 -
- E450 - Ultra Enterprise 450 -
- E3000 - Ultra Enterprise 3000 -
- E3500 - Ultra Enterprise 3500 -
- E4000 - Ultra Enterprise 4000 -
- E4500 - Ultra Enterprise 4500 -
- E5000 - Ultra Enterprise 5000 -
- E5500 - Ultra Enterprise 5500 -
- E6000 - Ultra Enterprise 6000 -
- E6500 - Ultra Enterprise 6500 -
- E10000 - Ultra Enterprise 10000 -
X-Options Affected
------------------
Sun StorEdge A5000 14-Drive FC-AL Arrays
SG-XARY513A-764G A5000 - A5000, 764GB 84 Drive 72" Exp Rack -
SG-XARY012A-509G A5000 - A5000, 509GB 56 Drive 56" Exp Rack -
SG-XARY012A-254G A5000 - A5000, 254GB 28 Drive 56" Exp Rack -
SG-XARY513A-254G A5000 - A5000, 254GB 28 Drive 72" Exp Rack -
SG-XARY010A-45G A5000 - A5000, 45GB 5 Drive TableTop -
SG-XARY510A-45G A5000 - A5000, 45GB 5 Drive TableTop -
SG-XARY010A-127G A5000 - A5000, 127GB 14 Drive TableTop -
SG-XARY510A-127G A5000 - A5000, 127GB 14 Drive TableTop -
SG-XARY011A-127G A5000 - A5000, 127GB 14 Drive Rackmount -
SG-XARY511A-127G A5000 - A5000, 127GB 14 Drive Rackmount -
SG-XARY512A-127G A5000 - A5000, 127GB 14 Drive Rackmount -
PART NUMBERS AFFECTED:
Part Number Description Model
----------- ----------- -----
- - -
REFERENCES:
URL: http://storageweb.eng/techmark_site/aces/aces.list.html
URL: http://storageweb.eng/techmark_site/tiger/index.html
PROBLEM DESCRIPTION
As of November 1998, approximately 1600 terabytes of A5000 storage have
been shipped worldwide. A small percentage of these units are
demonstrating symptoms consistent with marginal signal quality on the
GBaud FC-AL loops. Marginal signal quality can be caused by variations
in component quality for GBICs, backplanes, interface boards,
interconnect assemblies, and drives. Note that the majority of Sun A5000
customers are not experiencing these problems.
Symptoms of this problem include the following:
o Excessive "CRC Error" or "SCSI parity error" messages
o Excessive "OFFLINE" messages
o Excessive "timeout" messages
o "Offline Timeout" messages
The A5000's manufacturability and field reliability have suffered from
several component quality and signal margin problems, which have plagued
all first-generation FC-AL systems in the industry.
Soon after FCS, A5000 test processes were enhanced to reduce the chances
that defective material would interfere with customer applications.
IMPLEMENTATION:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
CORRECTIVE ACTION:
The following actions are recommended to examine the level
of warnings/errors on A5000 subsystems and provide clear guidelines
for applying corrective action. These mainly concern
SOFT ERRORS. In case of fatal error messages and system failures,
standard troubleshooting methods and processes should be applied.
o Proactively examine error logs for symptoms of the problem.
o If symptoms exist, follow the guidelines for further action
as specified in the table below. The action may be
simply to monitor for changes in error level, or it may
call for running loop integrity diagnostics, which can
in turn require FRU isolation and replacement.
If no symptoms exist, no action is required.
o For any and all reconfigurations, the STORtools Loop Integrity
Test should be run on the affected loops. An
affected loop is defined to be one which has had any
components moved/replaced/added (e.g. replacement of a
GBIC, loop connected to a different host adapter port,
new A5000 connected to a hub, etc.). If the Loop Integrity
Test fails, subsequent FRU isolation/replacement is
required.
o For *ALL* new installations/equipment add-ons, the STORtools
Loop Integrity Test should be run on all new/modified
loops. If the Loop Integrity Test fails, subsequent
FRU isolation/replacement is required.
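The proactive log examination in the first bullet can be sketched as a
small script. This is only an illustration, not part of STORtools: the
function and category names are hypothetical, the patterns are simply
the symptom strings quoted in this FIN (real driver messages may be
worded differently), and the sketch does not attribute messages to a
particular loop, which still has to be done before applying the
per-loop frequencies.

```python
import re
from collections import Counter

# Symptom strings as quoted in this FIN; actual log wording is an assumption.
# Order matters: "Offline Timeout" is tested before the more general
# "OFFLINE" and "timeout" patterns so it is not counted as either of them.
SYMPTOMS = [
    ("offline_timeout", re.compile(r"offline timeout", re.IGNORECASE)),
    ("crc_or_parity", re.compile(r"crc error|scsi parity error", re.IGNORECASE)),
    ("offline", re.compile(r"offline", re.IGNORECASE)),
    ("timeout", re.compile(r"timeout", re.IGNORECASE)),
]


def classify_line(line):
    """Return the name of the first symptom matched by a log line, or None."""
    for name, pattern in SYMPTOMS:
        if pattern.search(line):
            return name
    return None


def count_symptoms(lines):
    """Count symptom occurrences over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        name = classify_line(line)
        if name is not None:
            counts[name] += 1
    return counts
```

A caller would pass in the lines of one day's system log (for example,
an open file object) and compare the resulting counts, per loop, with
the frequency table in this FIN.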
                   Frequency
Error/symptom      (per loop)             Action
=================  =====================  ==============================
CRC/SCSI parity    1-10 per day           Monitor only (for change in
errors, OFFLINEs,                         frequency).
timeouts           ---------------------  ------------------------------
                   > 10 per day, OR       Run STORtools Loop Integrity
                   > 5 per day for 3      Test (LIT). If LIT fails,
                   consecutive days       perform FRU isolation and
                                          replacement. While waiting
                                          for scheduled LIT execution,
                                          the loop may be disconnected
                                          to prevent the performance
                                          effects of excessive retries.
=================  =====================  ==============================
"Offline Timeout"  > 0                    Run STORtools Loop Integrity
OR vxvm path                              Test (LIT). If LIT fails,
failure                                   perform FRU isolation and
                                          replacement. While waiting
                                          for scheduled LIT execution,
                                          the loop may be disconnected
                                          to prevent the performance
                                          effects of excessive retries.
=================  =====================  ==============================
NOTE: The frequency indicators allow for some level of
soft errors. These can be normal and should not affect
the applications.
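The frequency guidelines can be expressed as a simple decision rule.
The sketch below is illustrative only; the function name, argument
names and return strings are hypothetical. It assumes that the daily
figures are combined soft-error counts (CRC/SCSI parity errors,
OFFLINEs and timeouts) for a single loop, oldest day first, and that
the "> 10 per day" threshold applies to any day in the window supplied.

```python
def recommended_action(daily_counts, offline_timeouts=0, path_failures=0):
    """Map per-loop error history to the action given in the FIN table.

    daily_counts: combined soft-error counts per day for ONE loop,
    oldest first (assumption: error types are summed together).
    """
    # Any "Offline Timeout" or vxvm path failure calls for the LIT.
    if offline_timeouts > 0 or path_failures > 0:
        return "run LIT"

    # More than 10 soft errors on any single day.
    if any(n > 10 for n in daily_counts):
        return "run LIT"

    # More than 5 per day on 3 consecutive days.
    streak = 0
    for n in daily_counts:
        streak = streak + 1 if n > 5 else 0
        if streak >= 3:
            return "run LIT"

    # Some soft errors, but within the tolerated range: monitor only.
    if any(n >= 1 for n in daily_counts):
        return "monitor"

    return "no action"


print(recommended_action([2, 6, 7, 8]))
```

If the result is "run LIT" and the Loop Integrity Test then fails, FRU
isolation and replacement follow, as described in the table.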
STORtools Beta 2.4 is available through your local storage ace. A
general availability version is targeted for release in January.
The manual and video for the Beta 2.4 release are available via URL:
http://storageweb.eng/techmark_site/tiger/index.html
To find your local storage ace, see URL:
http://storageweb.eng/techmark_site/aces/aces.list.html
COMMENTS:
------------------------------------------------------------------------
Implementation Footnote:
________________________
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission
critical support teams will recommend implementation of the FIN (to
their respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN
as the need arises.
------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
______________
* Access the top level URL of http://cte.corp/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
_______________________
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
____________________
Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "SunService Documentation"
and click on "FIN & FCO attachments", then choose the appropriate
folder, FIN or FCO. This will display supporting directories/files
for FINs or FCOs.
Internet Access:
_______________
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
________
Send questions or comments to finfco-manager@cte.Corp
---------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.