InfoDoc ID | Synopsis | Date |
130 | SCSI Transport Errors and how to identify them | 8 Nov 2002 |
Status | Issued |
Description |
SCSI Transport Errors and how to identify them:
Note: This is only a quick reference on SCSI errors. SCSI errors can be, and often are, more in-depth than what can be covered here.
There are 4 tools that can help identify SCSI transport errors on a sun4u machine.
The /var/adm/messages file will give you hints about what is going on.
If the SCSI errors look like this:
Aug 3 14:02:57 asta unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
Aug 3 14:02:57 asta unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
Aug 3 14:02:57 asta unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Aug 3 14:02:57 asta unix: SCSI transport failed: reason 'reset': retrying command
Then check the revision of the glm patch installed on the customer's system. If it is much lower than the latest revision, have the customer update the patch.
If the messages look like this:
Mar 24 06:58:10 aurora unix: warning: /pci@6,4000/scsi@4 (glm4):
Mar 24 06:58:10 aurora unix: SCSI bus DATA IN phase parity error
Mar 24 06:58:10 aurora unix: warning: ID[SUNWpd.glm.parity_check.6010]
Mar 24 06:58:10 aurora unix: warning: /pci@6,4000/scsi@4 (glm4):
Mar 24 06:58:10 aurora unix: Target 0 reducing sync. transfer rate
Mar 24 06:58:10 aurora unix: warning: ID[SUNWpd.glm.sync_wide_backoff.6014]
Mar 24 06:58:10 aurora unix: warning: /pci@6,4000/scsi@4/sd@0,0 (sd60):
Mar 24 06:58:10 aurora unix: SCSI transport failed: reason 'tran_err': retrying command
Check the termination, and check the cables for bent pins. With SCSI bus DATA phase parity errors, the cause is usually the cable and/or the termination. Also check for glm patches and for updated disk firmware.
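When a messages file contains many of these entries, it can help to tally the failure reasons before deciding where to look first. The following is a minimal Python sketch, not a Sun-supplied tool. The function name count_transport_failures and the SAMPLE_LINES list are made up for illustration, and the sample lines are copied from the excerpts in this document.

```python
import re
from collections import Counter

# Matches the reason in lines such as:
#   ... unix: SCSI transport failed: reason 'reset': retrying command
REASON_RE = re.compile(r"SCSI transport failed: reason '([^']+)'")


def count_transport_failures(lines):
    """Return a Counter mapping each failure reason to its number of occurrences."""
    counts = Counter()
    for line in lines:
        match = REASON_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input copied from the excerpts in this document.
SAMPLE_LINES = [
    "Aug 3 14:02:57 asta unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):",
    "Aug 3 14:02:57 asta unix: SCSI transport failed: reason 'reset': retrying command",
    "Mar 24 06:58:10 aurora unix: SCSI bus DATA IN phase parity error",
    "Mar 24 06:58:10 aurora unix: SCSI transport failed: reason 'tran_err': retrying command",
]

for reason, count in count_transport_failures(SAMPLE_LINES).most_common():
    print(f"{reason}: {count}")
```

To run the same tally against a live system, pass an open file object for /var/adm/messages instead of SAMPLE_LINES.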
If the errors look like this:
Jun 6 19:16:34 oerpsv01 unix: ID[SUNWssa.socal.link.5010] socal1: port 0:Fibre Channel is OFFLINE
Jun 6 19:16:34 oerpsv01 unix: ID[SUNWssa.socal.link.6010] socal1: port 0:Fibre Channel Loop is ONLINE
Jun 6 19:17:49 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370ed7ff,0 (ssd16):
Jun 6 19:17:49 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:19:14 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd68,0 (ssd12):
Jun 6 19:19:14 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:20:44 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd69,0 (ssd13):
Jun 6 19:20:44 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command
This is a typical error message from an A5x00 array. You can see the Fibre Channel loop on socal1 port 0 going offline and coming back online.
If the errors are on more than one disk, the first thing an engineer should check is the A5x00 patch matrix for the latest firmware for the array and disks.
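To check whether the timeouts are confined to one disk or spread across several, the device named on each WARNING line can be paired with the failure line that follows it. This is a rough Python sketch under that assumption (the failure line directly follows the WARNING line for the device, as in the excerpt above). The function name disks_with_failures is hypothetical, and the sample lines are taken from this document with the device paths written without embedded spaces.

```python
import re

# Device path and instance name, e.g. "WARNING: /sbus@.../ssd@w...,0 (ssd16):".
# Case-insensitive because some excerpts log "warning:" in lowercase.
WARNING_RE = re.compile(r"WARNING: (/[^\s(]+)\s*\((\w+)\):", re.IGNORECASE)
FAILURE_RE = re.compile(r"SCSI transport failed: reason '([^']+)'")


def disks_with_failures(lines, reason=None):
    """Map instance name (e.g. 'ssd16') to device path for failing disks.

    Assumes each failure line belongs to the most recent WARNING line.
    If several WARNING lines precede one failure, only the last is credited.
    """
    current = None
    failing = {}
    for line in lines:
        warning = WARNING_RE.search(line)
        if warning:
            current = (warning.group(2), warning.group(1))
            continue
        failure = FAILURE_RE.search(line)
        if failure and current and reason in (None, failure.group(1)):
            failing[current[0]] = current[1]
    return failing


SAMPLE_LINES = [
    "Jun 6 19:16:34 oerpsv01 unix: ID[SUNWssa.socal.link.5010] socal1: port 0:Fibre Channel is OFFLINE",
    "Jun 6 19:17:49 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370ed7ff,0 (ssd16):",
    "Jun 6 19:17:49 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command",
    "Jun 6 19:19:14 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd68,0 (ssd12):",
    "Jun 6 19:19:14 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command",
    "Jun 6 19:20:44 oerpsv01 unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd69,0 (ssd13):",
    "Jun 6 19:20:44 oerpsv01 unix: SCSI transport failed: reason 'timeout':retrying command",
]

found = disks_with_failures(SAMPLE_LINES, reason="timeout")
print(f"{len(found)} disk(s) reported timeouts: {', '.join(sorted(found))}")
```

More than one instance name in the result suggests a problem shared by the array or loop rather than a single failing disk.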
If you see errors like this:
Aug 6 07:49:59 bureau3 unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@2,0 (sd2):
Aug 6 07:49:59 bureau3 unix: Error for Command: write(10) ErrorLevel: Fatal
Aug 6 07:49:59 bureau3 unix: Requested Block: 6429986 ErrorBlock: 6429986
Aug 6 07:49:59 bureau3 unix: Vendor: SEAGATE Serial Number: NG031399
Aug 6 07:49:59 bureau3 unix: Sense Key: Not Ready
Aug 6 07:49:59 bureau3 unix: ASC: 0x4 (<vendor unique code 0x4>), ASCQ: 0x1,FRU: 0x2
Aug 6 07:49:59 bureau3 unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@2,0 (sd2):
Aug 6 07:49:59 bureau3 unix: Error for Command: write ErrorLevel: Fatal
*This tells you that the error is on the disk at sd@2,0 (sd stands for SCSI disk), which is target 2, LUN 0, on controller 0 (onboard).
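The device path on the WARNING line can be split mechanically into the controller path and the disk's unit address. The sketch below assumes the usual convention that the unit address after sd@ is the target followed by the LUN, both in hexadecimal. The function name parse_sd_path is made up for illustration, and the sketch handles only sd paths, not the WWN-based ssd paths seen on Fibre Channel arrays.

```python
import re

# Last path component of a parallel-SCSI disk, e.g. "sd@2,0".
# Assumption: the unit address is "<target>,<lun>" in hexadecimal.
SD_LEAF_RE = re.compile(r"sd@([0-9a-fA-F]+),([0-9a-fA-F]+)")


def parse_sd_path(path):
    """Split a device path such as '/pci@1f,0/pci@1/scsi@2/sd@2,0'.

    Returns a dict with the controller path, target, and LUN.
    Raises ValueError if the last component is not an sd unit address.
    """
    controller, _, leaf = path.rpartition("/")
    match = SD_LEAF_RE.fullmatch(leaf)
    if match is None:
        raise ValueError(f"not an sd device path: {path!r}")
    return {
        "controller": controller,
        "target": int(match.group(1), 16),
        "lun": int(match.group(2), 16),
    }


info = parse_sd_path("/pci@1f,0/pci@1/scsi@2/sd@2,0")
print(info["controller"], "target", info["target"], "lun", info["lun"])
```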
For A1000 and D1000 SCSI errors:
Oct 7 16:30:00 uasympatico unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd16):
Oct 7 16:30:00 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:00 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:01 uasympatico unix:
Oct 7 16:30:01 uasympatico unix:
Oct 7 16:30:11 uasympatico unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd12):
Oct 7 16:30:11 uasympatico unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd18):
Oct 7 16:30:11 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:11 uasympatico unix: SCSI transport failed: reason 'incomplete':retrying command
*This tells you that the problem is on a QLogic differential SCSI card (QLGC,isp in the device path).
The showrev -p command will tell you the revision of the installed patches. The system type will tell you whether it is a PCI, SCSI, or FC-AL type. The OS version will tell you what patches are available for that system. The prtdiag command will give you a breakdown of the hardware.
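Comparing an installed patch revision against the latest one can be done by scanning saved showrev -p output for its "Patch: NNNNNN-NN" entries. The following Python sketch assumes that line format. The patch IDs, the package name SUNWexample, and the function names installed_revisions and needs_update are placeholders invented for illustration, not real glm patch numbers.

```python
import re

# Start of a showrev -p entry, e.g. "Patch: 123456-07 Obsoletes: ...".
PATCH_RE = re.compile(r"^Patch:\s*(\d{6})-(\d{2})\b")


def installed_revisions(showrev_output):
    """Return {patch base ID: highest installed revision} from showrev -p text."""
    revisions = {}
    for line in showrev_output.splitlines():
        match = PATCH_RE.match(line.strip())
        if match:
            base, rev = match.group(1), int(match.group(2))
            revisions[base] = max(rev, revisions.get(base, -1))
    return revisions


def needs_update(revisions, base, latest_rev):
    """True if the patch is missing or installed below latest_rev."""
    installed = revisions.get(base)
    return installed is None or installed < latest_rev


# Placeholder output: the patch IDs and package name are hypothetical.
SAMPLE_OUTPUT = """\
Patch: 123456-03 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWexample
Patch: 123456-07 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWexample
Patch: 654321-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWexample
"""

revisions = installed_revisions(SAMPLE_OUTPUT)
print(needs_update(revisions, "123456", 9))
```

The latest revision passed to needs_update has to come from the current patch listing for the driver in question; the sketch does not look it up.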
INTERNAL SUMMARY:
SUBMITTER: Karen Vergakes
APPLIES TO: Hardware, Hardware/Ultra Enterprise/Servers/Enterprise 6500, Hardware/Ultra Enterprise/Servers/Enterprise 6000, Hardware/Ultra Enterprise/Servers/Enterprise 5500, Hardware/Ultra Enterprise/Servers/Enterprise 5000, Hardware/Ultra Enterprise/Servers/Enterprise 4500, Hardware/Ultra Enterprise/Servers/Enterprise 4000, Hardware/Ultra Enterprise/Servers/Enterprise 3500, Hardware/Ultra Enterprise/Servers/Enterprise 3000, Hardware/Ultra Enterprise/Servers/Enterprise 450, Hardware/Ultra Enterprise/Servers/Enterprise 250, Hardware/Ultra Workstations/Ultra 80, Hardware/Ultra Workstations/Ultra 60, Hardware/Ultra Workstations/Ultra 30, Hardware/Ultra Workstations/Ultra 10, Hardware/Ultra Workstations/Ultra 5, Hardware/Ultra Workstations/Ultra 2, Hardware/Ultra Workstations/Ultra 1
ATTACHMENTS: