Re: [CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!

18 Feb 2015


      ...
-----Original Message-----
From: Chris Murphy
Sent: Tuesday, February 17, 2015 23:38
On Tue, Feb 17, 2015 at 7:34 PM, Jason Pyeron wrote:
...
...
-----Original Message-----
From: Chris Murphy
Sent: Tuesday, February 17, 2015 20:48
On Tue, Feb 17, 2015 at 7:54 AM, Jason Pyeron wrote:
...
...
I'd post the entire dmesg somewhere
http://client.pdinc.us/panic-341e97c30b5a4cb774942bae32d3f163.log
At least part of the problem happens before this log starts.
<snip/>
...
...
Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
...
Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
...
Feb 16 04:30:31 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 9224 seconds.
Doesn't seem related.
...
...
...
...
What do you get for
smartctl -x <dev>
http://client.pdinc.us/smartctl-2000e86b62db27169cc9307358ebf10e.log
...
...
OK no SMART extended test has been done, but also no pending bad or
relocated sectors, and no phy event errors either. So the write (10)
error seems isolated but it's still really suspicious, so I'd start
replacing hardware.
Dell tech is enroute. New system board and disk controller.
I'm curious what they replace.
Both, but the backplane is not on the replacement list.
...
...
...
...
I have replaced the drive (and reinstalled) already, the
panics still happen once every 30-40 hours.
The only thing that suggests it might not be hardware are
all the kvm related messages in the kp.
How so? Each of the results I find says these are to be ignored.
Well I found two older kernel bugs similar to this that suggested the
problem stopped happening when running kvm with 1vcpu, and in another
case when the VM was rebuilt 32-bit instead of 64-bit. But my ability
to read kernel call traces is very limited, I really don't know what's
going on.
I can say, we have about 20 of the identical systems, doing the same work. PE2970 running RHEL6/Centos6 and libvirtd
...
If it's a kernel bug though, you could maybe clobber it with a
substantially newer kernel. You might check out elrepo kernels. 2.6.32
is really old, granted the centos one you're running has a huge pile
of backports that makes it less "ancient" from a stability
perspective, but anything really new that's hard to backport likely
isn't in that kernel.
We should start looking at Centos7/RHEL7, ug systemd..... But these machines are ancient too.
...
While you're waiting for Dell you could try
either:
kernel-ml-3.18.6-1.el6.elrepo.x86_64.rpm
kernel-ml-3.19.0-1.el6.elrepo.x86_64.rpm
Unlikely, since I do not have a test plan. If I could reproduce the error on demand then it would be a valid experiment. Some of the systems are running RHEL6 which are under support, while the others are Centos6. The configs are kept as close as possible to each other.
Besides I am doing the migration right now to another host.
...
What's running in the VM?
Mostly RHEL6/Centos6 VMs. But there are some Windows systems too. This system was handling most of the CipherShed.org Jenkins CI farm. I can say the resources are oversubscribed by a factor of 15. But the system runs at below 0.10 at any random time.
Thanks for the thoughts on this.
-Jason
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                      PD Inc. http://www.pdinc.us -
- Principal Consultant              10 West 24th Street #100    -
- +1 (443) 269-1555 x333            Baltimore, Maryland 21218   -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00.
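
On the point above about not having a test plan: since the failure cannot be reproduced on demand, one cheap baseline is to measure how often the abort message from the subject line actually shows up in the saved logs, so that any later change (new hardware, newer kernel) can be compared against the "once every 30-40 hours" figure. The following is only a rough sketch. It assumes classic syslog lines of the form "Feb 16 04:30:30 host ...", and the function names, the year default and the 10 minute burst window are illustrative choices, not anything taken from the logs linked in this thread.

```python
from datetime import datetime, timedelta

# Message text taken from the subject line of this thread.
MARKER = "mptscsih: ioc0: attempting task abort!"


def abort_times(lines, year=2015):
    """Return the timestamps of syslog lines that contain MARKER.

    Classic syslog timestamps carry no year, so one is supplied
    by the caller (assumption: all lines fall in the same year).
    """
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        # First three whitespace-separated fields: month, day, time.
        stamp = " ".join(line.split()[:3])
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_hours(times, burst=timedelta(minutes=10)):
    """Return the gaps, in hours, between successive incidents.

    Events within `burst` of an incident's first event are counted
    as part of that incident (the window size is an arbitrary guess).
    """
    incidents = []
    for t in sorted(times):
        if not incidents or t - incidents[-1] > burst:
            incidents.append(t)
    return [(b - a).total_seconds() / 3600 for a, b in zip(incidents, incidents[1:])]
```

Feeding it an open log file, for example gaps_in_hours(abort_times(open(path))), gives a list of hours between incidents. It only counts occurrences and says nothing about the cause, but a clear change in the gaps after a swap would be some evidence either way.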
