Re: [CentOS] Intermittent problem, likely disk IO related - mptscsih: ioc0: attempting task abort!

18 Feb 2015


On Tue, Feb 17, 2015 at 7:34 PM, Jason Pyeron jpyeron@pdinc.us wrote:
...
...
-----Original Message-----
From: Chris Murphy
Sent: Tuesday, February 17, 2015 20:48
On Tue, Feb 17, 2015 at 7:54 AM, Jason Pyeron wrote:
...
...
I'd post the entire dmesg somewhere
http://client.pdinc.us/panic-341e97c30b5a4cb774942bae32d3f163.log
At least part of the problem happens before this log starts.
Feb 15 23:41:19 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
Feb 15 23:41:19 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
Feb 15 23:41:21 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8613 seconds.
Feb 16 02:04:54 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
Feb 16 02:04:54 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
Feb 16 02:04:55 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8735 seconds.
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
Feb 16 04:30:31 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 9224 seconds.
Doesn't seem related.
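
A minimal sketch for this kind of triage: pull only the kernel-tagged lines out of a syslog excerpt so that the periodic dhclient renewals don't drown them out. The helper name kernel_messages is hypothetical, and the parsing assumes the classic "Mon DD HH:MM:SS host tag: message" syslog layout seen in the excerpt above.

```python
# Sample lines copied from the log excerpt quoted in this thread.
SAMPLE = """\
Feb 16 02:04:55 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8735 seconds.
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
"""


def kernel_messages(log_text):
    """Return (timestamp, message) pairs for lines logged by the kernel.

    Assumes the traditional syslog layout: month, day, time, host, then
    "tag: message". Lines that do not fit that layout are skipped.
    """
    messages = []
    for line in log_text.splitlines():
        # Split off the first four fields; the remainder is "tag: message".
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[4].startswith("kernel:"):
            timestamp = " ".join(parts[:3])
            message = parts[4][len("kernel:"):].strip()
            messages.append((timestamp, message))
    return messages


if __name__ == "__main__":
    for timestamp, message in kernel_messages(SAMPLE):
        print(timestamp, message)
```

Run against the full /var/log/messages, this would leave only the kvm and driver messages to read through.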
...
...
...
...
What do you get for
smartctl -x <dev>
http://client.pdinc.us/smartctl-2000e86b62db27169cc9307358ebf10e.log
OK, no SMART extended test has been done, but there are also no pending
or reallocated bad sectors, and no phy event errors either. So the
WRITE(10) error seems isolated, but it's still really suspicious, so
I'd start replacing hardware.
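
As a rough sketch of the check described above, the following reads the raw values of the sector-health attributes from saved smartctl output. It assumes an ATA-style attribute table as printed by smartctl -x (columns ID#, ATTRIBUTE_NAME, FLAGS, VALUE, WORST, THRESH, FAIL, RAW_VALUE); the helper names and the sample rows in the usage note are illustrative and are not taken from the linked log.

```python
# Attributes whose non-zero raw value usually points at failing media.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def watched_raw_values(smartctl_text):
    """Map each watched attribute name to its integer raw value.

    Assumes the eight-column attribute table layout of `smartctl -x`;
    rows that do not match, or whose raw value is not a plain integer,
    are ignored.
    """
    values = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                values[fields[1]] = int(fields[7])
            except ValueError:
                continue
    return values


def suspicious(values):
    """Return the sorted names of watched attributes with a non-zero raw value."""
    return sorted(name for name, raw in values.items() if raw > 0)
```

For example, feeding it a made-up table in which Reallocated_Sector_Ct has a raw value of 0 and Current_Pending_Sector a raw value of 8 would make suspicious() return only the pending-sector attribute. An empty result, as in this thread, says nothing about the controller, cabling, or backplane.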
Dell tech is en route. New system board and disk controller.
I'm curious what they replace.
...
...
...
I have already replaced the drive (and reinstalled); the
panics still happen once every 30-40 hours.
The only thing that suggests it might not be hardware is all the
kvm-related messages in the kernel panic.
How so? Each of the results I find says these are to be ignored.
Well, I found two older kernel bugs similar to this. In one, the
problem stopped happening when running kvm with 1 vCPU, and in the
other, when the VM was rebuilt as 32-bit instead of 64-bit. But my
ability to read kernel call traces is very limited; I really don't
know what's going on.
If it's a kernel bug, though, you could maybe clobber it with a
substantially newer kernel. You might check out the ELRepo kernels.
2.6.32 is really old. Granted, the CentOS one you're running has a huge
pile of backports that makes it less "ancient" from a stability
perspective, but anything really new that's hard to backport likely
isn't in that kernel. While you're waiting for Dell you could try
either:
kernel-ml-3.18.6-1.el6.elrepo.x86_64.rpm
kernel-ml-3.19.0-1.el6.elrepo.x86_64.rpm
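
A small sketch for confirming which kernel a box actually booted after trying one of these: it parses a `uname -r` style release string and compares it to the 3.18.6 version named above. The helper name kernel_version and the example release string in the usage note are assumptions for illustration.

```python
import platform
import re


def kernel_version(release):
    """Parse the leading 'major.minor.patch' of a kernel release string.

    Returns a tuple of three ints, or None if the string does not start
    with three dot-separated numbers.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r` on Linux.
    running = kernel_version(platform.release())
    if running is None:
        print("could not parse kernel release:", platform.release())
    elif running < (3, 18, 6):
        print("still on an older kernel:", running)
    else:
        print("running kernel is at least 3.18.6:", running)
```

A release string such as "2.6.32-504.8.1.el6.x86_64" (a hypothetical example) parses to (2, 6, 32), which compares as older than (3, 18, 6).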
What's running in the VM?
-- 
Chris Murphy
