NOTE: this is happening on CentOS 6 x86_64 (kernel 2.6.32-504.3.3.el6.x86_64), not CentOS 5.

Hardware: Dell PowerEdge 2970, Seagate SATA drive, non-RAID.

I have a server that has been dying randomly, leaving no logs. I had a tail -f running over ssh for a week when this happened:

    Feb 8 00:10:21 thirteen-230 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880057a0a080)
    Feb 8 00:10:21 thirteen-230 kernel: sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 1a 17 a1 6f 00 00 01 00
    Feb 8 00:10:51 thirteen-230 kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
    Feb 8 00:10:51 thirteen-230 kernel: mptbase: ioc0: Initiating recovery
    Feb 8 00:11:13 thirteen-230 kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff880057a0a080)
    Write failed: Connection reset by peer

After reading https://access.redhat.com/solutions/108273, I am increasing the SCSI logging level (shown below), but I am not confident in this wait-and-see approach.

    sysctl -w dev.scsi.logging_level=98367

I am also going to check the smartctl output once I get onsite to power cycle the system.

Other posts I have read but cannot act on yet:

* http://unix.stackexchange.com/questions/34173/mptscsih-ioc0-task-abort-success-rv-2002-causes-30-seconds-freezing
* https://bugzilla.kernel.org/show_bug.cgi?id=18652
* https://bugzilla.redhat.com/show_bug.cgi?id=483424
* https://bugzilla.kernel.org/show_bug.cgi?id=42765
* http://sourceforge.net/p/smartmontools/mailman/message/23849184/
* http://kb.softescu.ro/category/hardware/dell/

-Jason

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                  PD Inc. http://www.pdinc.us     -
- Principal Consultant          10 West 24th Street #100        -
- +1 (443) 269-1555 x333        Baltimore, Maryland 21218       -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00.
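
For reference, the CDB bytes in the kernel log can be decoded to see which block the aborted write was aimed at, which may help when comparing against smartctl output later. The following is a minimal sketch, assuming the standard 10-byte WRITE(10) layout (opcode 0x2A, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8). The function name decode_write10 is made up for illustration and is not part of any tool mentioned here.

```python
def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks). Assumes the standard 10-byte layout:
    byte 0 opcode, bytes 2-5 LBA, bytes 7-8 transfer length.
    """
    cdb = [int(b, 16) for b in cdb_hex.split()]
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    # LBA and transfer length are stored big-endian.
    lba = (cdb[2] << 24) | (cdb[3] << 16) | (cdb[4] << 8) | cdb[5]
    blocks = (cdb[7] << 8) | cdb[8]
    return lba, blocks


# CDB copied from the log line for sda.
lba, blocks = decode_write10("2a 00 1a 17 a1 6f 00 00 01 00")
print("LBA %d (0x%08x), %d block(s)" % (lba, lba, blocks))
```

For the CDB logged here this gives LBA 437756271 (0x1a17a16f) and a transfer length of 1 block. Whether that block is actually bad is a separate question; the abort could just as well come from the controller, cabling, or firmware.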