[CentOS-virt] BUG: soft lockup detected on CPU#?

Mon Jan 21 19:11:19 UTC 2008
Eli Stair <estair at ilm.com>

My un-authoritative answer:  I've been tracking this bug (or several with the 
same symptoms) for going on a couple of years.  It's ridiculously common and 
apparently well known to the Xen/XenSource guys, judging by the number of 
reports/bugs posted, but I haven't seen any mention of it actually being 
resolved.  Unfortunately I see the same issue cropping up after VM moves.  In 
my case it occurs /every/ time there is a VM migration, once per processor in 
the VM, and it doesn't matter whether there is any IO on the Dom0 or DomU.  
Occasionally VMs die during a migration and have to be manually 
destroyed/restarted.
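
For anyone wanting to check the once-per-processor pattern against their own 
logs, here is a rough Python sketch that tallies the soft lockup messages per 
CPU.  It assumes the console or dmesg output has been saved as text and that 
each message carries a numeric CPU id after "CPU#"; the function name 
count_soft_lockups is just something I made up for illustration.

```python
import re
from collections import Counter

# Assumption: the message looks like the one quoted in this thread,
# with a numeric CPU id following "CPU#".
SOFT_LOCKUP = re.compile(r"BUG: soft lockup detected on CPU#(\d+)")


def count_soft_lockups(lines):
    """Return a Counter mapping CPU number -> count of soft lockup messages."""
    counts = Counter()
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Feed it the lines of a DomU console log captured right after a migration.  If 
the pattern I describe holds, every VCPU should show up with a count of one 
per migration.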

I do see evidence of significant instability, however (not implying it is 
related to the above softlockup issues): both in VM moves migrating from a Xeon 
(5345) to an Opteron Dom0, and in high-utilization DomUs, which are just plain 
flaky and reboot/die semi-frequently even when never moved from the Dom0 they 
started on.

For me, it currently means running only low-priority non-production services in 
a VM, and not shelling out for RHEL5 support for the project (contrary to what 
I planned) since it's not being addressed.  I'd be curious if this is being 
addressed in the Xen 3.2 release for RHEL5*...

Cheers,

/eli



Brett Worth wrote:
> Hello All.
> 
> I've just started looking into Xen and have a test environment in place.  I'm
> seeing an annoying problem that I thought worthy of a post.
> 
> Config:
> 
> I have 2 x HP DL585 servers, each with 4 dual-core Opterons (non-vmx) and 16GB
> RAM, configured as Xen servers.  These run CentOS 5.1 with the latest updates
> applied.  Both systems attach to an iSCSI target, which is an HP DL385 running
> ietd and serving SAN-based storage.
> 
> I have a test VM running CentOS 5.1 also updated.
> 
> Problem:
> 
> If I run the VM on a single server everything is OK.  If I do a migrate of the
> VM to the other server I start getting random "BUG: soft lockup detected on
> CPU#?" messages on the VM console.  The messages seem to happen with IO but not
> every time.  A reboot of the VM on the new server will stop these messages.
> 
> I've also left the VM running overnight a couple of times and when I do I find
> that any external sessions (ssh) are hung in the morning but the console
> session is not.  New ssh sessions can be started and seem to work.
> 
> After much googling it looks like the kernel messages can occur if dom0 is
> very busy but mine is not.
> 
> Any suggestions?
> 
> Regards
> 
> Brett Worth