[CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!

Sun Feb 8 03:53:51 UTC 2015
Jason Pyeron <jpyeron at pdinc.us>

NOTE: this is happening on CentOS 6 x86_64 (kernel 2.6.32-504.3.3.el6.x86_64), not CentOS 5.

Dell PowerEdge 2970, Seagate SATA drive, non-RAID.

I have a server that has been dying at random, leaving nothing in the logs.

I had a tail -f running over ssh for a week when this finally happened:

Feb  8 00:10:21 thirteen-230 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880057a0a080)
Feb  8 00:10:21 thirteen-230 kernel: sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 1a 17 a1 6f 00 00 01 00
Feb  8 00:10:51 thirteen-230 kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
Feb  8 00:10:51 thirteen-230 kernel: mptbase: ioc0: Initiating recovery
Feb  8 00:11:13 thirteen-230 kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff880057a0a080)
Write failed: Connection reset by peer
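
For reference, the numbers in the excerpt above can be pulled apart with a short script. This is only a sketch: the function names decode_write10 and gaps_seconds are made up for illustration, and the field offsets assume the standard 10-byte WRITE(10) layout (opcode 0x2a, 4-byte big-endian LBA at bytes 2-5, 2-byte transfer length at bytes 7-8).

```python
from datetime import datetime

# CDB bytes copied from the "[sda] CDB: Write(10)" log line.
CDB_HEX = "2a 00 1a 17 a1 6f 00 00 01 00"

# Timestamps of the abort attempt, the reset, and the abort SUCCESS line.
STAMPS = ["00:10:21", "00:10:51", "00:11:13"]


def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB (hypothetical helper, standard layout assumed)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting logical block
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


def gaps_seconds(stamps):
    """Seconds between consecutive HH:MM:SS stamps (same day assumed)."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(decode_write10(CDB_HEX))  # {'lba': 437756271, 'blocks': 1}
    print(gaps_seconds(STAMPS))     # [30, 22]
```

So the command that timed out was a single-block write at LBA 437756271, and the driver spent 30 seconds between the abort attempt and the reset, then another 22 seconds before reporting the abort as successful.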

After reading https://access.redhat.com/solutions/108273, I am increasing the SCSI logging level (shown below), but I am not confident in this wait-and-see approach.

sysctl -w dev.scsi.logging_level=98367
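
To see what the value 98367 actually turns on, it can be split into its 3-bit fields. This is a sketch under an assumption: the field order below is taken from my reading of the kernel's SCSI logging header (error, timeout, scan, mlqueue, mlcomplete, llqueue, llcomplete, hlqueue, hlcomplete, ioctl, lowest bits first) and should be checked against the running kernel's source. The function name decode_logging_level is made up for illustration.

```python
# Assumed field order, lowest bits first, 3 bits per field.
FIELDS = [
    "error", "timeout", "scan", "mlqueue", "mlcomplete",
    "llqueue", "llcomplete", "hlqueue", "hlcomplete", "ioctl",
]


def decode_logging_level(level):
    """Split a dev.scsi.logging_level value into per-facility levels (0-7)."""
    return {name: (level >> (3 * i)) & 0x7 for i, name in enumerate(FIELDS)}


if __name__ == "__main__":
    decoded = decode_logging_level(98367)
    # Print only the facilities that are switched on.
    for name, value in decoded.items():
        if value:
            print(f"{name}={value}")
```

Under that assumed layout, 98367 (0x1803F) decodes to error=7, timeout=7 and llqueue=3, with every other facility left at 0.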

I am also going to check smartctl output once I get onsite to power cycle the system.

Other posts I have read, but cannot act on yet:

* http://unix.stackexchange.com/questions/34173/mptscsih-ioc0-task-abort-success-rv-2002-causes-30-seconds-freezing
* https://bugzilla.kernel.org/show_bug.cgi?id=18652
* https://bugzilla.redhat.com/show_bug.cgi?id=483424
* https://bugzilla.kernel.org/show_bug.cgi?id=42765
* http://sourceforge.net/p/smartmontools/mailman/message/23849184/
* http://kb.softescu.ro/category/hardware/dell/

-Jason

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                      PD Inc. http://www.pdinc.us -
- Principal Consultant              10 West 24th Street #100    -
- +1 (443) 269-1555 x333            Baltimore, Maryland 21218   -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00.