[CentOS-virt] KVM instance keep crashing
emsearcy at gmail.com
Thu Oct 14 13:27:44 EDT 2010
On Oct 14, 2010, at 1:38 AM, Poh Yong Hwang wrote:
> I have one KVM instance (CentOS 5) that keeps crashing, and I see the following in the message log:
> Oct 14 16:24:48 localhost kernel: psmouse.c: Explorer Mouse at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.
> Oct 14 16:24:49 localhost kernel: BUG: soft lockup - CPU#0 stuck for 12s! [ntpd:2363]
> Oct 14 16:24:49 localhost kernel: CPU 0:
> Oct 14 16:24:49 localhost kernel: Modules linked in: backupdriver(PU) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc talpa_pedevice(U) dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy virtio_balloon virtio_pci ide_cd i2c_piix4 virtio_ring 8139too cdrom 8139cp pcspkr i2c_core virtio mii serio_raw dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Oct 14 16:24:49 localhost kernel: Pid: 2363, comm: ntpd Tainted: P 2.6.18-194.3.1.el5 #1
> After that, the instance becomes very sluggish and unresponsive. Please advise what the issue could be.
I'm no expert on kernel stuff, but I thought I'd throw in a couple of points of clarification on your request, since the above isn't clear to me.
Is the above from /var/log/messages on the guest or the host?
Is it always an "ntpd" process on the CPU#0 stuck/soft lockup line? Does the soft lockup always occur after a psmouse.c warning? (Even so, the psmouse.c warning could be a symptom of the CPU being stuck, not the cause...)
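Answering those two questions by eye gets tedious once the log is long. As a minimal sketch, assuming the syslog lines look like the ones quoted above, a short Python script can tally soft lockups per CPU and process and count how many came right after a psmouse.c line. The script, its function names (`summarize`, `report`) and the default path are hypothetical, not an existing tool.

```python
"""Summarize soft-lockup reports in a syslog file (hypothetical helper script)."""

import re
import sys
from collections import Counter

# Matches the soft-lockup line format seen in the log above, e.g.
#   kernel: BUG: soft lockup - CPU#0 stuck for 12s! [ntpd:2363]
# Assumption: other kernel versions may word this message differently.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for \d+s! \[([^:\]]+):(\d+)\]"
)


def summarize(lines):
    """Count lockups per (CPU, process) and how many came right after a psmouse.c line.

    Returns (counts, after_psmouse, total).
    """
    counts = Counter()
    after_psmouse = 0
    total = 0
    previous = ""
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            cpu, comm, _pid = match.groups()
            counts[(f"CPU#{cpu}", comm)] += 1
            total += 1
            # Only the immediately preceding line is checked.
            if "psmouse.c" in previous:
                after_psmouse += 1
        previous = line
    return counts, after_psmouse, total


def report(path):
    """Print the summary for one log file, or a warning if it cannot be read."""
    try:
        with open(path, errors="replace") as f:
            counts, after_psmouse, total = summarize(f)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return
    for (cpu, comm), n in counts.most_common():
        print(f"{n:5d}  {cpu}  {comm}")
    print(f"{after_psmouse} of {total} soft lockups came right after a psmouse.c line")


if __name__ == "__main__":
    # Default to the CentOS syslog; pass another path (e.g. the host's log) to compare.
    report(sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages")
```

Running it against both the guest's and the host's log would show whether ntpd is always the process named and whether the psmouse.c warning reliably comes first. Since it only looks one line back, a warning a few lines earlier would not be counted.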
What type of hardware is this? Since the log says "Tainted" (and I'm assuming that refers to the kernel, as I have no idea how a userland process like ntpd could be "tainted"!), you have a binary-distributed kernel module loaded, and you should probably try with it unloaded to see if the issue goes away. It could also be a machine check error, but I think that's less likely. To double-check, run the following on both the host and the guest:

    cat /proc/sys/kernel/tainted
The value printed is a bitmask of ORed flags, which can be checked against the list in the "tainted" section of http://www.kernel.org/doc/Documentation/sysctl/kernel.txt
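Splitting that bitmask by hand is error-prone, so here is a minimal Python sketch that breaks the value into individual flags and also pulls the flagged modules out of an oops "Modules linked in:" line. The function names are hypothetical, and only the lowest six bits from kernel.txt are named: the full list differs between kernel versions (and RHEL/CentOS kernels may add bits of their own), so anything reported as unknown should be checked against the documentation for the kernel actually running.

```python
"""Decode a kernel taint value and list flagged modules (hypothetical helper script)."""

import re
import sys

# Low taint bits as listed in the "tainted" section of
# Documentation/sysctl/kernel.txt. Assumption: newer kernels define more
# bits and distribution kernels may add their own, so any bit not listed
# here is reported as unknown rather than guessed.
TAINT_FLAGS = {
    0: ("P", "proprietary (non-GPL) module loaded"),
    1: ("F", "module was force-loaded"),
    2: ("S", "unsafe SMP processors"),
    3: ("R", "module was force-unloaded"),
    4: ("M", "machine check error occurred"),
    5: ("B", "bad page discovered"),
}

# Matches entries such as "backupdriver(PU)" in an oops "Modules linked in:" line.
FLAGGED_MODULE_RE = re.compile(r"([\w-]+)\(([A-Z]+)\)")


def decode_taint(value):
    """Return one "LETTER: description" string per bit set in value."""
    if value < 0:
        raise ValueError("taint value must be non-negative")
    reasons = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            letter, text = TAINT_FLAGS.get(bit, ("?", f"unknown bit {bit}"))
            reasons.append(f"{letter}: {text}")
        bit += 1
    return reasons


def flagged_modules(modules_line):
    """Return (module, flags) pairs for modules printed with taint flags."""
    return FLAGGED_MODULE_RE.findall(modules_line)


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the running kernel's taint value (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    # Decode a value given on the command line, otherwise the running kernel's.
    try:
        value = int(sys.argv[1]) if len(sys.argv) > 1 else read_taint()
    except (OSError, ValueError) as exc:
        print(f"cannot get taint value: {exc}", file=sys.stderr)
    else:
        if value == 0:
            print("kernel is not tainted")
        for reason in decode_taint(value):
            print(reason)
```

For example, `decode_taint(17)` returns the P and M entries, and calling `flagged_modules` on the module line from the log above returns `backupdriver` with flags `PU` and `talpa_pedevice` with `U`. The P on backupdriver marks it as the non-GPL module, so it would be the first one to try unloading.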