On Tue, Feb 17, 2015 at 7:34 PM, Jason Pyeron jpyeron@pdinc.us wrote:
-----Original Message----- From: Chris Murphy Sent: Tuesday, February 17, 2015 20:48
On Tue, Feb 17, 2015 at 7:54 AM, Jason Pyeron wrote:
I'd post the entire dmesg somewhere
http://client.pdinc.us/panic-341e97c30b5a4cb774942bae32d3f163.log
At least part of the problem happens before this log starts.
Feb 15 23:41:19 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
Feb 15 23:41:19 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
Feb 15 23:41:21 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8613 seconds.
Feb 16 02:04:54 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
Feb 16 02:04:54 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
Feb 16 02:04:55 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8735 seconds.
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
Feb 16 04:30:31 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 9224 seconds.
Doesn't seem related.
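If it helps to look at only the kernel messages from the period before the panic, something like the following could strip out the dhclient noise. This is a rough sketch, not a tested tool: it assumes the classic syslog line layout seen in the excerpt above (timestamp, hostname, tag with optional [pid], message), and the names LINE_RE and kernel_messages are made up for illustration.

```python
import re

# Assumed layout: "Feb 16 02:46:09 host tag[pid]: message" (pid is optional).
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"  # timestamp
    r"(\S+)\s"                                   # hostname
    r"([^\s:\[]+)(?:\[(\d+)\])?:\s"              # tag and optional [pid]
    r"(.*)$"                                     # message text
)


def kernel_messages(lines):
    """Return (timestamp, message) pairs for lines tagged 'kernel'.

    Lines that do not match the assumed layout are skipped silently.
    """
    result = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(3) == "kernel":
            result.append((match.group(1), match.group(5)))
    return result
```

Feeding it the lines of the saved log (for example via open(path).readlines(), with whatever path the log was saved to) would leave just the kvm/kernel entries to compare against the time of each panic.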
What do you get for smartctl -x <dev>?
http://client.pdinc.us/smartctl-2000e86b62db27169cc9307358ebf10e.log
OK, no SMART extended self-test has been run, but there are also no pending, bad, or reallocated sectors, and no phy event errors either. So the WRITE(10) error seems isolated, but it's still really suspicious, so I'd start replacing hardware.
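For repeating that check after the hardware swap, here is a minimal sketch of pulling the sector counters out of smartctl output. It assumes an ATA-style attribute table, where each row starts with a numeric ID followed by the attribute name and ends with the raw value. The attribute names are the usual smartmontools ones, but not every drive reports all of them. The function names are hypothetical.

```python
import subprocess

# Attributes that usually indicate failing media; not every drive reports all.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def sector_counters(smartctl_text):
    """Map watched attribute names to their raw values.

    Assumes rows look like "<id> <name> ... <raw>" with an integer raw
    value as the last field; rows that do not fit are ignored.
    """
    counters = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[-1]
            if raw.isdigit():
                counters[fields[1]] = int(raw)
    return counters


def looks_clean(counters):
    """True only if at least one watched counter was found and all are zero."""
    return bool(counters) and all(value == 0 for value in counters.values())


def read_smartctl(dev):
    """Run 'smartctl -x <dev>' and return its text output.

    smartctl's exit status is a bitmask, so a nonzero code is not treated
    as a failure here.
    """
    done = subprocess.run(["smartctl", "-x", dev], capture_output=True, text=True)
    return done.stdout
```

Calling looks_clean(sector_counters(read_smartctl(dev))) as root gives a quick yes/no, though an extended self-test (smartctl -t long) is still the more thorough check.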
Dell tech is en route. New system board and disk controller.
I'm curious what they replace.
I have replaced the drive (and reinstalled) already; the
panics still happen once every 30-40 hours.
The only thing that suggests it might not be hardware is all the kvm-related messages in the kernel panic.
How so? Every result I find says these can be ignored.
Well, I found two older kernel bugs similar to this. In one, the problem stopped happening when running kvm with 1 vCPU; in the other, it stopped when the VM was rebuilt as 32-bit instead of 64-bit. But my ability to read kernel call traces is very limited, and I really don't know what's going on.
If it's a kernel bug, though, you could maybe clobber it with a substantially newer kernel. You might check out the ELRepo kernels. 2.6.32 is really old. Granted, the CentOS one you're running has a huge pile of backports that makes it less "ancient" from a stability perspective, but anything really new that's hard to backport likely isn't in that kernel. While you're waiting for Dell you could try either:
kernel-ml-3.18.6-1.el6.elrepo.x86_64.rpm
kernel-ml-3.19.0-1.el6.elrepo.x86_64.rpm
What's running in the VM?