InfoDoc ID   Synopsis   Date
130   SCSI Transport Errors and how to identify them   8 Nov 2002

Status Issued

Description

SCSI Transport Errors and how to identify them:

Note: This is only a quick reference on SCSI errors. SCSI errors can be far more in-depth than what is covered here.

Four pieces of information help identify SCSI transport errors on a sun4u machine:

  1. The system type and the OS version it is running
  2. The latest /var/adm/messages
  3. The output of /usr/platform/sun4u/sbin/prtdiag -v
  4. The output of showrev -p

The /var/adm/messages file gives you hints about what is going on.

If the SCSI errors look like this:

Aug 3 14:02:57 asta unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
Aug 3 14:02:57 asta unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
Aug 3 14:02:57 asta unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Aug 3 14:02:57 asta unix: SCSI transport failed: reason 'reset': retrying command                                    

Then check the revision of the glm patch that is installed. If it is much lower than the latest revision, have the customer update the patch.
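The revision check can be done against saved showrev -p output. The following Python sketch assumes the usual "Patch: NNNNNN-RR Obsoletes: ..." line layout. The patch IDs 999999 and 888888 are made-up placeholders, not real patch numbers; the actual glm patch ID depends on the Solaris release and has to be looked up.

```python
import re

# Matches the leading "Patch: <base>-<rev>" field of a showrev -p line.
PATCH_RE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b")

def installed_revisions(showrev_output):
    """Return {patch_base_id: highest_installed_revision} from showrev -p text."""
    revisions = {}
    for line in showrev_output.splitlines():
        match = PATCH_RE.match(line.strip())
        if match:
            base, rev = match.group(1), int(match.group(2))
            revisions[base] = max(rev, revisions.get(base, -1))
    return revisions

def is_behind(showrev_output, base_id, latest_rev):
    """True if the patch is missing or installed below latest_rev."""
    installed = installed_revisions(showrev_output).get(base_id)
    return installed is None or installed < latest_rev

# Placeholder data: 999999 and 888888 are hypothetical patch IDs.
SAMPLE = """\
Patch: 999999-03 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWpd
Patch: 999999-07 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWpd
Patch: 888888-12 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
"""

if __name__ == "__main__":
    print(installed_revisions(SAMPLE))
    print(is_behind(SAMPLE, "999999", 12))
```

The latest revision passed to is_behind has to come from the current patch listing; the sketch only reports what is installed.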

If the messages look like this:

Mar 24 06:58:10 aurora unix: warning: /pci@6,4000/scsi@4 (glm4):
Mar 24 06:58:10 aurora unix: SCSI bus DATA IN phase parity error
Mar 24 06:58:10 aurora unix: warning: ID[SUNWpd.glm.parity_check.6010]
Mar 24 06:58:10 aurora unix: warning: /pci@6,4000/scsi@4 (glm4):
Mar 24 06:58:10 aurora unix: Target 0 reducing sync. transfer rate
Mar 24 06:58:10 aurora unix: warning: ID[SUNWpd.glm.sync_wide_backoff.6014]
Mar 24 06:58:10 aurora unix: warning: /pci@6,4000/scsi@4/sd@0,0 (sd60):
Mar 24 06:58:10 aurora unix: SCSI transport failed: reason 'tran_err': retrying command                                    

Check the termination, and check the cables for bent pins. SCSI bus DATA phase parity errors are usually caused by the cable and/or the termination. Also check for glm patches and disk firmware updates.

If the errors look like this:

Jun 6 19:16:34 oerpsv01 unix: ID[SUNWssa.socal.link.5010] socal1: port 0:Fibre Channel is OFFLINE
Jun 6 19:16:34 oerpsv01 unix: ID[SUNWssa.socal.link.6010] socal1: port 0:Fibre Channel Loop is ONLINE
Jun 6 19:17:49 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370ed7ff,0 (ssd16):
Jun 6 19:17:49 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:19:14 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd68,0 (ssd12):
Jun 6 19:19:14 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:20:44 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd69,0 (ssd13):
Jun 6 19:20:44 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command

This is a typical error message from an A5x00 array. You can see that the Fibre Channel loop on socal1 port 0 is going offline and online.

If the errors are on more than one disk, the first thing an engineer should check is the A5x00 patch matrix for the latest firmware for the array and the disks.
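To see quickly whether more than one disk is affected, the messages file can be scanned for the driver instance named on each WARNING line that is followed by a transport failure. This is a minimal Python sketch, assuming the failure line follows the WARNING line for the same device as in the excerpt above. The function name is an illustration only.

```python
import re

# A WARNING line ends with the driver instance in parentheses, e.g. "(ssd16):".
WARNING_RE = re.compile(r"WARNING: .*\((\w+)\):", re.IGNORECASE)
FAILED_RE = re.compile(r"SCSI transport failed: reason '(\w+)'")

def disks_with_transport_failures(lines):
    """Return the set of driver instances that logged a transport failure."""
    affected = set()
    current = None  # instance named by the most recent WARNING line
    for line in lines:
        warning = WARNING_RE.search(line)
        if warning:
            current = warning.group(1)
            continue
        if current and FAILED_RE.search(line):
            affected.add(current)
    return affected

SAMPLE = """\
Jun 6 19:17:49 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370ed7ff,0 (ssd16):
Jun 6 19:17:49 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:19:14 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd68,0 (ssd12):
Jun 6 19:19:14 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:20:44 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd69,0 (ssd13):
Jun 6 19:20:44 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command
"""

if __name__ == "__main__":
    disks = disks_with_transport_failures(SAMPLE.splitlines())
    print(sorted(disks), "multiple disks affected:", len(disks) > 1)
```

When several WARNING lines arrive back to back before the failure lines, this sketch only credits the last device named, so treat the count as a lower bound.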

If you see errors like this:

Aug 6 07:49:59 bureau3 unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@2,0 (sd2):
Aug 6 07:49:59 bureau3 unix: Error for Command: write(10) ErrorLevel: Fatal
Aug 6 07:49:59 bureau3 unix: Requested Block: 6429986 ErrorBlock: 6429986
Aug 6 07:49:59 bureau3 unix: Vendor: SEAGATE Serial Number: NG031399
Aug 6 07:49:59 bureau3 unix: Sense Key: Not Ready
Aug 6 07:49:59 bureau3 unix: ASC: 0x4 (<vendor unique code 0x4>), ASCQ: 0x1,FRU: 0x2
Aug 6 07:49:59 bureau3 unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@2,0 (sd2):
Aug 6 07:49:59 bureau3 unix: Error for Command: write ErrorLevel: Fatal                                    

*This tells you that the failing device is the disk at sd@2,0 (sd stands for SCSI disk), which is target 2, LUN 0 on the onboard controller.
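The device path on the WARNING line can be split mechanically. The Python sketch below assumes the last path component has the form driver@target,lun followed by the driver instance in parentheses; the field names in the returned dictionary are illustrative. For fibre channel ssd devices the first field is a WWN rather than a target number, so the values are kept as strings.

```python
import re

# Last path component looks like "sd@2,0"; the name in parentheses that
# follows is the driver instance, e.g. "sd2".
DEVICE_RE = re.compile(r"(/\S+)/(\w+)@(\w+),(\w+)\s*\((\w+)\):")

def parse_device(line):
    """Split a WARNING line's device path; return None if it names no disk."""
    match = DEVICE_RE.search(line)
    if not match:
        return None
    controller, driver, target, lun, instance = match.groups()
    return {
        "controller": controller,  # path of the parent controller
        "driver": driver,          # "sd" for SCSI disk
        "target": target,
        "lun": lun,
        "instance": instance,
    }

if __name__ == "__main__":
    line = ("Aug 6 07:49:59 bureau3 unix: WARNING: "
            "/pci@1f,0/pci@1/scsi@2/sd@2,0 (sd2):")
    print(parse_device(line))
```

A WARNING line that names only a controller, such as the glm0 lines in the first example, has no target,lun component and returns None.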

For A1000 and D1000 SCSI errors:

Oct 7 16:30:00 uasympatico unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd16):
Oct 7 16:30:00 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:00 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:01 uasympatico unix:
Oct 7 16:30:01 uasympatico unix:
Oct 7 16:30:11 uasympatico unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd12):
Oct 7 16:30:11 uasympatico unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd18):
Oct 7 16:30:11 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:11 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command                                    

*This tells you that the card having the problem is a QLogic differential SCSI card (QLGC,isp in the device path).

showrev -p tells you the revisions of the installed patches. The system type tells you whether it is a PCI, SCSI, or FC-AL type. The OS version tells you which patches are available for that system. prtdiag gives you a breakdown of the hardware.
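As a rough first pass over /var/adm/messages, the transport failures can be counted by reason string and paired with the first check suggested for the matching example in this document. The Python sketch below is a simplification: it keys on the reason string alone, whereas the examples above also depend on the driver and hardware involved. The lookup table and function names are illustrative.

```python
import re
from collections import Counter

REASON_RE = re.compile(r"SCSI transport failed: reason '(\w+)'")

# First check taken from the matching example in this document.
# Keyed on the reason string only, which is a simplification.
FIRST_CHECK = {
    "reset": "check the glm patch revision",
    "tran_err": "check termination and cables, then glm patches and disk firmware",
    "timeout": "check the A5x00 patch matrix for array and disk firmware",
    "incomplete": "check the QLogic differential card",
}

def summarize(lines):
    """Count transport failures per reason string."""
    counts = Counter()
    for line in lines:
        match = REASON_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def suggestions(lines):
    """Return (reason, count, first_check) tuples, most frequent first."""
    return [
        (reason, count, FIRST_CHECK.get(reason, "no rule in this sketch"))
        for reason, count in summarize(lines).most_common()
    ]

SAMPLE = [
    "Oct 7 16:30:00 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command",
    "Oct 7 16:30:00 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command",
    "Aug 3 14:02:57 asta unix: SCSI transport failed: reason 'reset': retrying command",
]

if __name__ == "__main__":
    for reason, count, check in suggestions(SAMPLE):
        print(reason, count, check)
```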

INTERNAL SUMMARY:

SUBMITTER: Karen Vergakes
APPLIES TO: Hardware, Hardware/Ultra Enterprise/Servers/Enterprise 6500, Hardware/Ultra Enterprise/Servers/Enterprise 6000, Hardware/Ultra Enterprise/Servers/Enterprise 5500, Hardware/Ultra Enterprise/Servers/Enterprise 5000, Hardware/Ultra Enterprise/Servers/Enterprise 4500, Hardware/Ultra Enterprise/Servers/Enterprise 4000, Hardware/Ultra Enterprise/Servers/Enterprise 3500, Hardware/Ultra Enterprise/Servers/Enterprise 3000, Hardware/Ultra Enterprise/Servers/Enterprise 450, Hardware/Ultra Enterprise/Servers/Enterprise 250, Hardware/Ultra Workstations/Ultra 80, Hardware/Ultra Workstations/Ultra 60, Hardware/Ultra Workstations/Ultra 30, Hardware/Ultra Workstations/Ultra 10, Hardware/Ultra Workstations/Ultra 5, Hardware/Ultra Workstations/Ultra 2, Hardware/Ultra Workstations/Ultra 1
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.