On Tue, Feb 17, 2015 at 7:34 PM, Jason Pyeron <jpyeron at pdinc.us> wrote:
>> -----Original Message-----
>> From: Chris Murphy
>> Sent: Tuesday, February 17, 2015 20:48
>>
>> On Tue, Feb 17, 2015 at 7:54 AM, Jason Pyeron wrote:
>> >> I'd post the entire dmesg somewhere
>> >
>> > http://client.pdinc.us/panic-341e97c30b5a4cb774942bae32d3f163.log
>>
>> At least part of the problem happens before this log starts.
>
> Feb 15 23:41:19 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
> Feb 15 23:41:19 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
> Feb 15 23:41:21 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8613 seconds.
> Feb 16 02:04:54 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
> Feb 16 02:04:54 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
> Feb 16 02:04:55 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 8735 seconds.
> Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
> Feb 16 02:46:09 thirteen-230 kernel: kvm: 1994: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
> Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffffd8f0
> Feb 16 03:53:39 thirteen-230 kernel: kvm: 2161: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
> Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPREQUEST on br0 to 192.168.5.58 port 67 (xid=0x48d081b6)
> Feb 16 04:30:30 thirteen-230 dhclient[1272]: DHCPACK from 192.168.5.58 (xid=0x48d081b6)
> Feb 16 04:30:31 thirteen-230 dhclient[1272]: bound to 192.168.13.230 -- renewal in 9224 seconds.

Doesn't seem related.

>
>>
>> >> What do you get for
>> >> smartctl -x <dev>
>> >
>> > http://client.pdinc.us/smartctl-2000e86b62db27169cc9307358ebf10e.log
>>
>> OK no smart extended test has been done, but also no pending bad or
>> relocated sectors, and no phy event errors either. So the write (10)
>> error seems isolated but it's still really suspicious, so I'd start
>> replacing hardware.
>
> Dell tech is enroute. New system board and disk controller.

I'm curious what they replace.

>
>> >>
>> > I have replaced the drive (and reinstalled) already, the
>> panics still happen once ever 30-40 hours.
>>
>> The only thing that suggests it might not be hardware are all the kvm
>> related messages in the kp.
>
> How so, each of the results I find say these are to be ignored.

Well I found two older kernel bugs similar to this that suggested the
problem stopped happening when running kvm with 1vcpu, and in another
case when the VM was rebuilt 32-bit instead of 64-bit. But my ability
to read kernel call traces is very limited, I really don't know what's
going on.

If it's a kernel bug though, you could maybe clobber it with a
substantially newer kernel. You might check out elrepo kernels. 2.6.32
is really old, granted the centos one you're running has a huge pile
of backports that makes it less "ancient" from a stability
perspective, but anything really new that's hard to backport likely
isn't in that kernel. While you're waiting for Dell you could try
either:

kernel-ml-3.18.6-1.el6.elrepo.x86_64.rpm
kernel-ml-3.19.0-1.el6.elrepo.x86_64.rpm

What's running in the VM?

--
Chris Murphy