Document fins/I0547-1


FIN #: I0547-1

SYNOPSIS: Intermittently, SCSI devices connected to a UDWIS card may

DATE: Jan/21/00

KEYWORDS: Intermittently, SCSI devices connected to a UDWIS card may


---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------  
                            FIELD INFORMATION NOTICE
                  (For Authorized Distribution by SunService)



SYNOPSIS: Intermittently, SCSI devices connected to a UDWIS card may
          not be usable after a reboot.  This can prevent a system
          from booting if the UDWIS card connected to the boot device
          is affected.

	               
TOP FIN/FCO REPORT: Yes 
 
PRODUCT_REFERENCE:  UDWIS fcode
 
PRODUCT CATEGORY:   Server / System Board;

PRODUCTS AFFECTED:  
  
Mkt_ID   Platform   Model   Description   Serial Number
------   --------   -----   -----------   -------------
Systems Affected
----------------

  -       A11        ALL    Ultra Enterprise 1              -
  -       A12        ALL    Ultra Enterprise 1E             -
  -       A14        ALL    Ultra Enterprise 2              -
  -       E3000      ALL    Ultra Enterprise 3000           -
  -       E3500      ALL    Ultra Enterprise 3500           -
  -       E4000      ALL    Ultra Enterprise 4000           -
  -       E4500      ALL    Ultra Enterprise 4500           -
  -       E5000      ALL    Ultra Enterprise 5000           -
  -       E5500      ALL    Ultra Enterprise 5500           -
  -       E6000      ALL    Ultra Enterprise 6000           -
  -       E6500      ALL    Ultra Enterprise 6500           -
  -       E10000     ALL    Ultra Enterprise 10000          -

(See Corrective Action)

X-Options Affected
------------------

  -       -          ALL    StorEdge A1000                  -
  -       -          ALL    Netra st A1000                  -
  -       -          ALL    StorEdge D1000                  -
  -       -          ALL    Netra st D1000                  -
  -       -          ALL    StorEdge A3500                  -
  -       -          ALL    StorEdge L280 tape library      -
  -       -          ALL    StorEdge L700 tape library      -
  -       -          ALL    StorEdge L1000 tape library     -
  -       -          ALL    StorEdge L1800 tape library     -
  -       -          ALL    StorEdge L3500 tape library     -
  -       -          ALL    StorEdge L11000 tape library    -


PART NUMBERS AFFECTED:

Part Number   Description                              Model
-----------   -----------                              -----
370-2443-01   Differential Ultra/Wide SCSI (UDWIS/S)     -

REFERENCES:

BugId: 4272400 4230719
Esc:   523110 522070 521024 522036 522925 523016 523175
FIN:   I0552-1


PROBLEM DESCRIPTION: 

Intermittently, SCSI devices connected to a UDWIS card with FCode earlier
than 1.28 may not be usable after a system boot or reboot.  This can
prevent a system from booting if the boot device is connected to an
affected UDWIS card.

Alternatively, if a non-boot UDWIS card is affected, the system may boot
but will not have access to any SCSI devices which are connected to
that affected UDWIS card.

These issues can occur irrespective of the type of SCSI device connected
to the UDWIS card and hence can affect Sun and third-party SCSI devices.

This problem is seen on both standalone and clustered systems.  Systems
with faster CPUs (such as the E10000) are more likely to be affected, so a
previously working system may start to exhibit UDWIS problems after its
CPUs are upgraded to ones with a higher clock speed.  Systems with a large
number of UDWIS cards are also more likely to see these problems, since
any one of the installed cards could be affected.

This problem is caused by defects in UDWIS card FCode versions earlier
than 1.28.  During boot, the card fails to send a SCSI bus reset, does not
hold the SCSI bus reset long enough to meet the SCSI specification, or
fails to initialize correctly.  As a result, the attached SCSI devices
fail to negotiate correctly with the host.


Example 1
---------

If the affected UDWIS card does not control the boot device, the system
can still boot.  In some of these cases, the 'sd' driver records a
corrupted SCSI inquiry string in the messages file during boot for devices
attached to the affected UDWIS card.  This example is from an A3x00:

	unix: sd2044 at QLGC,isp17: target 4 lun 0
	unix: sd2044 is /sbus@5d,0/QLGC,isp@1,10000/sd@4,0
	unix:      Vendor 'SoEG', product '00****7*********', 
	(unknown capacity)
	
Corrupted inquiry strings are also visible using format -> inquiry.
For example, with a Seagate ST39103LC 9GB drive, you would see:

	Vendor  S313CU90

Only every _other_ character of the expected string is returned (the
dropped characters are shown in parentheses):
S(T)3(9)1(0)3(L)C(S)U(N)9(.)0....
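
The every-other-character pattern can be reproduced with a short Python
sketch.  This is an illustration only: the helper name 'every_other' is
hypothetical, and the expected string "ST39103LCSUN9.0" is reconstructed
from the pattern shown above and may not be the complete inquiry data.

```python
def every_other(inquiry):
    """Return every other character of a string, starting with the first.

    This mimics the corruption pattern described in this FIN; it is not
    how the card or the driver actually processes inquiry data.
    """
    return inquiry[::2]


if __name__ == "__main__":
    # Assumed expected string, reconstructed from the example above.
    expected = "ST39103LCSUN9.0"
    print(every_other(expected))  # prints S313CU90
```

If the string reported by format -> inquiry matches the every-other-
character form of the string the drive normally reports, the UDWIS card
on that bus is a likely candidate for the problem described here.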


Example 2
---------

As an example of the possible effect of this problem, with an A3x00
connected to an affected UDWIS card, a controller path will be offline.
In a cluster, this also affects the running node, because the I/O load
migrates to a single controller.


Example 3
---------

Another symptom of the 'missing SCSI bus reset' problem in a cluster
environment can be recognized as follows:

When A3x00s are connected in a cluster and one node reboots, the other
(running) node should record the same number of SCSI bus resets as the
number of shared SCSI busses.

For example, if 8 A3x00s are dual-hosted, then the running node should
receive 16 resets like the following, when the other node is booting:

	unix: WARNING: /sbus@49,0/QLGC,isp@1,10000 (isp5):
	unix:      Received unexpected SCSI Reset

If fewer than 16 are received, then expect some A3x00 controllers to
be offline.
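
The reset count can be checked with a small Python sketch like the one
below.  It is a sketch under stated assumptions, not a supported tool: the
function names are hypothetical, the expected number of shared busses must
be supplied by the user, and the input should be limited to the messages
logged on the running node while the other node was booting.

```python
RESET_MSG = "Received unexpected SCSI Reset"


def count_scsi_resets(log_lines):
    """Count log lines that contain the unexpected-reset message."""
    return sum(1 for line in log_lines if RESET_MSG in line)


def missing_resets(log_lines, shared_buses):
    """Return how many expected resets were not logged (0 if none missing)."""
    return max(0, shared_buses - count_scsi_resets(log_lines))


if __name__ == "__main__":
    # Sample input taken from the message format shown in this FIN.
    sample = [
        "unix: WARNING: /sbus@49,0/QLGC,isp@1,10000 (isp5):",
        "unix:      Received unexpected SCSI Reset",
    ]
    # With 16 shared busses expected and one reset logged, 15 are missing.
    print(missing_resets(sample, shared_buses=16))  # prints 15
```

On a live system the lines would typically come from the messages file
(normally /var/adm/messages on Solaris).  A non-zero result suggests that
some A3x00 controllers may be offline.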


IMPLEMENTATION: 
 
         ---
        |   |   MANDATORY (Fully Pro-Active)
         ---    
         
  
         ---
        | X |   CONTROLLED PRO-ACTIVE (per Sun Geo Plan) 
         --- 
         
                                
         ---
        |   |   REACTIVE (As Required)
         ---



CORRECTIVE ACTION: 

Enterprise Customers and authorized Field Service Representatives may
avoid the problems described above by following the recommendations
below:

If the affected UDWIS card is connected to an A3x00 which is not the
boot device, then a simple workaround is to bring the failed controller
back online through RM6:

- Select recovery guru -> options -> manual recovery -> controller
  pairs.

With other SCSI devices, the only workaround is to perform a shutdown and
bringup on the affected E10k domain (a reboot will not correct the problem
in all cases), or a shutdown and power cycle on other systems (since,
again, a reboot will not correct the problem in all cases).

**********
This issue is being addressed by the forthcoming FCO A0163-1, which will
update the UDWIS card to FCode version 1.28.
**********

The recommendation is to evaluate customer configurations to determine if
the forthcoming FCO A0163-1 change will apply.  This problem can affect
any devices connected to UDWIS cards including D1000, A1000, A3x00, or
A7000 Sun Storage products, as well as SCSI-attached OEM storage,
especially (but not only) when connected to systems with fast CPUs, like
E10000s.

It is also strongly recommended that any mission-critical sites implement
FCO A0163-1 once it is released.  FCO A0163-1 is currently in a pending
state and has not yet been released.


COMMENTS: 
  
There are three ways to determine if a particular UDWIS card has the
affected (versions earlier than 1.28) FCode:

a) Physically inspect the card.

The part number sticker on the SBus connector will show '370-2443-01'
for FCode versions prior to 1.28.
   
   
b) Use the OBP '.properties' command.

Use the following sequence of commands at the 'ok' prompt:

	dev <device path to UDWIS card>
	.properties
	device-end
	
First, find the correct path to the UDWIS card you wish to check and use
that for the 'dev' command (see example below).

When '.properties' executes, examine the value of the property
"isp-fcode".  If it shows "1.28 99/11/08" then this is the 1.28 FCode,
which contains the fixes for the problems described in this FIN.  If it
shows any earlier version number, then the problems described in this FIN
may be experienced.

Here is an example from an Ultra 2 with a UDWIS card with the affected
(pre-1.28) FCode in SBus slot 1:

        ok> reset-all
        ok> dev /sbus@1f,0/QLGC,isp@1,10000
        ok> .properties
        scsi-initiator-id       00000007
        clock-frequency         03938700
        differential    
        isp-fcode               1.25 96/10/15 <-- FCode earlier than 1.28
        device_type             scsi 
        intr                    00000003 00000000
        interrupts              00000003
        wide            
        fast-20 
        reg                     00000001 00010000 00000450
        64-bit-clean 
        model                   QLGC,ISP1000U 
        name                    QLGC,isp
        ok> device-end
        ok>

This example shows a UDWIS card with FCode version 1.25.  Since this is
earlier than version 1.28, it could experience the problems described in
this FIN.


c) Use the Solaris 'prtconf -vp' command.

In the output from the 'prtconf -vp' command, examine the 'isp-fcode' value
as for the OBP example above.

Example from a system with a UDWIS card with the affected (pre-1.28) FCode:

[lots of other output ...]

  Node 0xf007aa94
            scsi-initiator-id:  00000007
            clock-frequency:  03938700
            differential:  00
            isp-fcode:  '1.25 96/10/15' <-- FCode earlier than 1.28
            device_type:  'scsi'
            intr:  00000003.00000000
            interrupts:  00000003
            wide:  00
            fast-20:  00
            reg:  00000001.00010000.00000450
            64-bit-clean:  00
            model:  'QLGC,ISP1000U'
            name:  'QLGC,isp'

[lots of other output ...]
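
On systems with many UDWIS cards, the 'isp-fcode' check can be automated
with a short Python sketch such as the following.  The function names are
hypothetical, and the code assumes the version always has the form
"major.minor" with an integer minor number, as in the examples above.  The
pattern accepts both the 'prtconf -vp' and the OBP '.properties' layouts
shown in this FIN.

```python
import re

# First FCode version that contains the fixes, per this FIN.
FIXED_VERSION = (1, 28)

# Matches "isp-fcode:  '1.25 96/10/15'" (prtconf -vp) and
# "isp-fcode               1.25 96/10/15" (OBP .properties).
_FCODE_RE = re.compile(r"isp-fcode:?\s+'?(\d+)\.(\d+)")


def fcode_versions(text):
    """Return (major, minor) for every isp-fcode property found in text."""
    return [(int(major), int(minor)) for major, minor in _FCODE_RE.findall(text)]


def affected_versions(text):
    """Return only the versions that are earlier than 1.28."""
    return [v for v in fcode_versions(text) if v < FIXED_VERSION]


if __name__ == "__main__":
    # Sample line copied from the prtconf -vp example above.
    sample = "            isp-fcode:  '1.25 96/10/15'"
    print(affected_versions(sample))  # prints [(1, 25)]
```

An empty list means no pre-1.28 FCode was found in the supplied output.
The sketch reports versions only; matching each version to a device path
still requires reading the surrounding 'prtconf -vp' output.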


--------------------------------------------------------------------------
Implementation Footnote:

i)   In case of MANDATORY FINs, Enterprise Services will attempt to    
     contact all affected customers to recommend implementation of 
     the FIN. 
   
ii)  For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical    
     support teams will recommend implementation of the FIN  (to their  
     respective accounts), at the convenience of the customer. 

iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the   
     need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network 
browser as follows:
 
SunWeb Access:
-------------- 
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/

* From there, select the appropriate link to query or browse the FIN and
  FCO Homepage collections.
 
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/

* From there, select the appropriate link to browse the FIN or FCO index.

Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist.  Edist can be 
  accessed internally at the following URL: http://edist.corp/.
  
* From there, follow the hyperlink path of "Enterprise Services
  Documentation" and click on "FIN & FCO attachments", then choose the
  appropriate folder, FIN or FCO.  This will display supporting
  directories/files for FINs or FCOs.
   
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------
                                                        



Copyright (c) 1997-2003 Sun Microsystems, Inc.