Document fins/I0876-1
FIN #: I0876-1
SYNOPSIS: Sun StorEdge T3+ firmware included with patch 112276 and 69x0
firmware with patch 113247 or higher provides improved disk error
handling
DATE: Sept/16/02
KEYWORDS: Sun StorEdge T3+ firmware included with patch 112276 and 69x0
firmware with patch 113247 or higher provides improved disk error
handling
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
SYNOPSIS: Sun StorEdge T3+ firmware included with patch 112276
and 69x0 firmware with patch 113247 or higher provides
improved disk error handling.
SunAlert: No
TOP FIN/FCO REPORT: Yes
PRODUCT_REFERENCE: Sun StorEdge T3+/6910/6960
PRODUCT CATEGORY: StorEdge / SW Admin
PRODUCTS AFFECTED:
Systems Affected:
-----------------
Mkt_ID   Platform   Model   Description                   Serial Number
------   --------   -----   -----------                   -------------
  -      ANYSYS       -     System Platform Independent         -
X-Options Affected:
-------------------
Mkt_ID   Platform   Model   Description                   Serial Number
------   --------   -----   -----------                   -------------
  -      T3+         ALL    Sun StorEdge T3+                    -
  -      6910        ALL    Sun StorEdge 6910 Array             -
  -      6960        ALL    Sun StorEdge 6960 Array             -
PART NUMBERS AFFECTED:
Part Number   Description   Model
-----------   -----------   -----
     -             -          -
REFERENCES:
BugId: 4697868 - disk in raid 5 on T3+ failed and Oracle database
crashed in clustered config.
PatchId: 112276: T3+ 2.01.01: System Firmware Update.
113247:
PROBLEM DESCRIPTION:
Sun StorEdge T3+ arrays with firmware version 2.1 or lower may be
susceptible to loss of data access. This can occur when the disk
error handling routines included with this firmware repeatedly retry
read/write operations on disk errors. This can lead to delays and
may cause the host to unmount the volume or cause an application to
time out.
Affected systems include any StorEdge T3+ array without patch 112276.
T3+ arrays experiencing this issue will show multiple errors of these
types in their syslog file:
More than one "Sense Key = 0x4" error on one specific drive. An example
follows:
Jun 05 06:16:14 ISR1[2]: W: u2d5 SCSI Disk Error Occurred (path = 0x0)
Jun 05 06:16:14 ISR1[2]: W: Sense Key = 0x4, Asc = 0x15, Ascq = 0x1
OR
More than one "Sense Key = 0x3" error, EXCEPT for 03/11, on one specific
drive. An example follows:
Feb 07 10:19:52 ISR1[2]: W: u2d5 SCSI Disk Error Occurred (path = 0x0)
Feb 07 10:19:52 ISR1[2]: W: Sense Key = 0x3, Asc = 0x16, Ascq = 0x0
Feb 07 10:19:52 ISR1[2]: W: Sense Data Description = Data Synchronization Mark error
Feb 07 10:19:52 ISR1[2]: W: Valid Information = 0xafe6c4
OR
A single "Sense Key = 0x1, Asc = 0x5d" error on one specific drive. An
example follows:
Jul 31 16:19:22 ISR1[1]: N: u1d3 SCSI Disk Error Occurred (path = 0x1)
Jul 31 16:19:22 ISR1[1]: N: Sense Key = 0x1, Asc = 0x5d, Ascq = 0x0
Jul 31 16:19:22 ISR1[1]: N: Sense Data Description = Failure Prediction Threshold Exceeded
This issue is caused by the disk error handling routines in current T3+
firmware levels (2.1 and below). If a drive reports one of the following
errors,
"Sense Key = 0x4",
"Sense Key = 0x1, Asc = 0x5d"
the T3+ will continue to retry read/write operations. This may appear
to the host as the T3+ not responding. Depending upon the
configuration, the applications in use, and the type of volume manager,
the host may unmount the volume or simply wait and retry, or the
application may time out.
This issue has been addressed with patch 112276. There are several
precautions which must be taken before installing this patch:
WARNING:
1. Do not install patch 112276 on arrays with Seagate ST336752FC (36GB)
disk drives. To determine whether the ST336752FC drive is installed in
an array, telnet to the T3+ and run the 'fru list' command.
This drive is sensitive to rough handling, which may cause an
occasional need for logical block address (LBA) reassignment (see
FIN I0836-1 for details). In RAID-based systems, this function
is handled automatically without operator intervention. Patch
112276 will disable an ST336752FC drive based on the errors it
reports. Given this drive's sensitivity to handling, it is
preferable that the system perform a reassignment instead.
Therefore, DO NOT install the patch on arrays with these disk drives.
NOTE: Sun Engineering is addressing this and will issue a single
patch for all drives with the next roll of the T3+ firmware.
2. Installing this patch without first reviewing drive error logs in the
T3 syslog file, and taking pro-active action**, may result in disabled
drives, leading to loss of volume access.
** Pro-active means replacing these drives prior to installing the patch.
Installation of this patch may result in drives with specific error
conditions being disabled. Drives that exhibit these errors before the
patch is installed will be disabled the first time the errors are seen
after installation.
With patch 112276, these errors are properly handled and the
appropriate action of disabling the drive is taken. If it happens
that more than one drive is reporting these errors in a given RAID
volume, then all of the drives reporting errors will be disabled,
resulting in unmounting the volume and loss of access to data.
Please follow the special installation procedures for patch 112276
provided below. Skipping the patch pre-install procedure may result
in drives being disabled at the customer site.
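The risk described above (more than one drive in the same RAID volume
reporting errors) can be checked before installation. The following is
a minimal sketch only, not a Sun-supplied tool. The function name, the
volume names, and the volume-to-drive mapping are hypothetical; on a
real array the membership of each volume must be taken from the array
itself.

```python
from typing import Dict, Iterable, List


def volumes_at_risk(volume_drives: Dict[str, List[str]],
                    flagged: Iterable[str]) -> Dict[str, List[str]]:
    """Return volumes containing more than one flagged drive.

    volume_drives: hypothetical mapping of volume name -> drive IDs
                   (drive IDs use the syslog form, e.g. "u2d5").
    flagged:       drive IDs already identified as reporting the errors
                   listed in this FIN.
    """
    flagged_set = set(flagged)
    at_risk = {}
    for volume, drives in volume_drives.items():
        hits = sorted(flagged_set.intersection(drives))
        # One flagged drive can be disabled without losing the volume;
        # two or more in the same volume is the case the FIN warns about.
        if len(hits) > 1:
            at_risk[volume] = hits
    return at_risk


if __name__ == "__main__":
    # Illustrative layout only; not taken from any real array.
    layout = {"v0": ["u1d1", "u1d2", "u1d3"], "v1": ["u2d4", "u2d5"]}
    print(volumes_at_risk(layout, ["u1d1", "u1d3", "u2d5"]))
```

A volume returned by this check should have its flagged drives
replaced, one at a time, before the patch is installed.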
IMPLEMENTATION:
---
| | MANDATORY (Fully Proactive)
---
---
| X | CONTROLLED PROACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
CORRECTIVE ACTION:
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the
above-mentioned problem.
1. For most T3+ arrays, install patch 112276.
2. For T3+ arrays within 69x0 systems, install patch 113247, which
is included with the "2.1.1 to 2.1.2 Storage Services Processor Image
Upgrade 1.0", available at http://edist.central. See FIN I0870-1
for more details.
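The patch choice above, combined with the ST336752FC warning, can be
summarized as a small decision function. This is an illustrative sketch
only; the function name and its inputs are hypothetical. This FIN states
the ST336752FC restriction for patch 112276 and does not say whether it
also applies to patch 113247, so the sketch does not attempt to decide
that case.

```python
from typing import Iterable, Optional, Tuple

SENSITIVE_MODEL = "ST336752FC"  # Seagate 36GB drive named in the WARNING


def select_patch(in_69x0: bool,
                 drive_models: Iterable[str]) -> Tuple[Optional[str], str]:
    """Return (patch_id, reason); patch_id is None when the FIN says to hold.

    in_69x0:      True if the T3+ array is part of a 6910/6960 system.
    drive_models: model strings of the installed drives, as recorded by
                  the engineer (input format is an assumption).
    """
    if in_69x0:
        # The FIN does not state the ST336752FC rule for this patch.
        return "113247", "T3+ within a 69x0 system; see FIN I0870-1"
    if any(SENSITIVE_MODEL in model.upper() for model in drive_models):
        return None, "ST336752FC drive present; do not install 112276"
    return "112276", "T3+ array without ST336752FC drives"


if __name__ == "__main__":
    print(select_patch(False, ["ST336752FC"]))
```

In every case where a patch is returned, the pre-install instructions
still apply before the patch is installed.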
For either patch installation, please observe the following:
I. PATCH PRE-INSTALL INSTRUCTIONS:
----------------------------------
1. Read the syslog file of the T3+ system on which patch 112276 or
113247 will be installed.
2. Identify every drive reporting any of the following errors. Scan the
T3+ syslog for these message patterns:
More than one "Sense Key = 0x4" error on one specific drive.
An example follows:
Jun 05 06:16:14 ISR1[2]: W: u2d5 SCSI Disk Error Occurred (path = 0x0)
Jun 05 06:16:14 ISR1[2]: W: Sense Key = 0x4, Asc = 0x15, Ascq = 0x1
OR
More than one "Sense Key = 0x3" error, EXCEPT for 03/11, on one specific
drive.
An example follows:
Feb 07 10:19:52 ISR1[2]: W: u2d5 SCSI Disk Error Occurred (path = 0x0)
Feb 07 10:19:52 ISR1[2]: W: Sense Key = 0x3, Asc = 0x16, Ascq = 0x0
Feb 07 10:19:52 ISR1[2]: W: Sense Data Description = Data Synchronization Mark error
Feb 07 10:19:52 ISR1[2]: W: Valid Information = 0xafe6c4
OR
A single "Sense Key = 0x1, Asc = 0x5d" error on one specific drive.
An example follows:
Jul 31 16:19:22 ISR1[1]: N: u1d3 SCSI Disk Error Occurred (path = 0x1)
Jul 31 16:19:22 ISR1[1]: N: Sense Key = 0x1, Asc = 0x5d, Ascq = 0x0
Jul 31 16:19:22 ISR1[1]: N: Sense Data Description = Failure Prediction Threshold Exceeded
3. If there are any reported errors:
A. Back up the volume, and
B. Replace the drives reporting these errors.
4. Ensure the volume is in an optimal working state,
i.e., no FRUs are disabled (run 'fru stat').
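The syslog review in steps 1 and 2 can be partly automated. The
following is a minimal sketch, not a Sun-supplied utility. It assumes
the syslog has been copied off the array as text, that each "Sense Key"
line follows the "SCSI Disk Error Occurred" line from the same task
(for example ISR1[2]), and that "03/11" denotes Sense Key 0x3 with
Asc 0x11. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

DRIVE_RE = re.compile(
    r"(\S+\[\d+\]): \w: (u\d+d\d+) SCSI Disk Error Occurred")
SENSE_RE = re.compile(
    r"(\S+\[\d+\]): \w: Sense Key = 0x([0-9a-fA-F]+), Asc = 0x([0-9a-fA-F]+)")


def drives_to_replace(lines: Iterable[str]) -> List[str]:
    """Return drive IDs that meet any of the three criteria in step 2."""
    pending = {}        # task tag -> drive named on its latest error line
    counts = Counter()  # (drive, category) -> number of matching errors
    for line in lines:
        m = DRIVE_RE.search(line)
        if m:
            pending[m.group(1)] = m.group(2)
            continue
        m = SENSE_RE.search(line)
        if not m or m.group(1) not in pending:
            continue
        drive = pending.pop(m.group(1))
        key, asc = int(m.group(2), 16), int(m.group(3), 16)
        if key == 0x4:
            counts[(drive, "key4")] += 1
        elif key == 0x3 and asc != 0x11:  # assumption: 03/11 = key 3, Asc 11
            counts[(drive, "key3")] += 1
        elif key == 0x1 and asc == 0x5D:
            counts[(drive, "key1_5d")] += 1
    flagged = set()
    for (drive, category), n in counts.items():
        # A single 0x1/0x5d error is enough; the others need more than one.
        needed = 1 if category == "key1_5d" else 2
        if n >= needed:
            flagged.add(drive)
    return sorted(flagged)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as syslog:  # path to a saved copy of the syslog
        print("\n".join(drives_to_replace(syslog)))
```

The output is a starting point for the manual review, not a substitute
for it: any drive listed should be confirmed in the syslog before it is
replaced.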
II. PATCH INSTALL:
------------------
Download and install T3+ patch 112276 or patch 113247 per normal
patch procedures.
COMMENTS:
None
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@sdpsweb.EBay
--------------------------------------------------------------------------
Copyright (c) 1997-2003 Sun Microsystems, Inc.