[CentOS-devel] RE: [CentOS-virt] BUG: soft lockup detected on CPU#?

Wed Jan 23 13:41:56 UTC 2008
Ross S. W. Walker <rwalker at medallion.com>

Thanks Johnny, and sorry for the top post (blackberry).

I downloaded the src rpm and found their patches all in one patch file called xen.patch (I did an ls -lt and picked the files with the latest timestamps). There may also be kernel config changes, as several config files were touched, but I couldn't get hold of the original 8.1.8 src rpm to diff them.
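Once both src rpms are unpacked, the config comparison could be done along these lines. This is only a rough sketch: the function names are made up for illustration, and it assumes the usual kernel .config layout ("CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" comments). It takes the text of two config files and reports each option whose value differs.

```python
import re


def parse_kernel_config(text):
    """Map CONFIG_* option names to their values from kernel .config text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options are written as comments: "# CONFIG_FOO is not set"
        match = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if match:
            options[match.group(1)] = "n"
            continue
        # Enabled options are written as "CONFIG_FOO=value"
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def diff_kernel_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    A value of None means the option is absent from that config.
    """
    old = parse_kernel_config(old_text)
    new = parse_kernel_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

Reading the original and the patched config file into strings and passing both to diff_kernel_configs would list only the options that were touched, which should be easier to review than a raw diff of the two files.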

I would be happy to help in getting the parts needed so they can be rolled up into a single patch to apply to the current plus kernel. Just let me know what you need.

I wonder if anybody at XenSource notified upstream of the fixes?


----- Original Message -----
From: centos-devel-bounces at centos.org <centos-devel-bounces at centos.org>
To: The CentOS developers mailing list. <centos-devel at centos.org>
Sent: Wed Jan 23 07:37:04 2008
Subject: Re: [CentOS-devel] RE: [CentOS-virt] BUG: soft lockup detected on CPU#?

Ross S. W. Walker wrote:
> Ross S. W. Walker wrote:
>> Brett Worth wrote:
>>> Hello All.
>>> I've just started looking into Xen and have a test 
>>> environment in place.  I'm seeing an
>>> annoying problem that I thought worthy of a post.
>>> Config:
>>> I have 2 x HP DL585 servers each with 4 Dual core Opterons 
>>> (non-vmx) and 16GB RAM
>>> configured as Xen servers.  These run CentOS 5.1 with the 
>>> latest updates applied.  These
>>> systems both attach to an iSCSI target which is an HP DL385 
>>> running ietd and serving SAN
>>> based storage.
>>> I have a test VM running CentOS 5.1 also updated.
>>> Problem:
>>> If I run the VM on a single server everything is OK.  If I do 
>>> a migrate of the VM to the
>>> other server I start getting random "BUG: soft lockup 
>>> detected on CPU#?" messages on the
>>> VM console.  The messages seem to happen with IO but not 
>>> every time.  A reboot of the VM
>>> on the new server will stop these messages.
>>> I've also left the VM running overnight a couple of times and 
>>> when I do I find that any
>>> external sessions (ssh) are hung in the morning but the 
>>> console session is not.  New ssh
>>> sessions can be started and seem to work.
>>> After much googling it looks like the kernel messages can 
>>> occur if dom0 is very busy but
>>> mine is not.
>>> Any suggestions?
>> The soft lockup is technically not a BUG.
>> You will see these errors if an IRQ takes more than 10 seconds
>> to respond.
>> In your case I would take a look at your iSCSI setup and the
>> time it takes to migrate the VM from one node to another along
>> with SCSI reserve/release setup on the iSCSI target.
>> I also have been using the Xen 3.2 RPMs off xen.org on CentOS
>> 5.1 with good results. The VM migration may run smoother and
>> quicker in Xen 3.2, but in doing so you take Xen off the
>> reservation; if you're OK with that, it may fix your issues.
> After seeing this same issue on my Xen 3.2 install, but with NO
> migration or iSCSI happening I decided it is probably NOT iSCSI's
> fault, so I decided to research it a little more and this is what
> I found:
> http://docs.xensource.com/XenServer/4.0.1/guest/ch04s08.html#rhel5_limitations
> XenSource does provide a repo of CentOS 5 kernels that have been
> patched to fix this though:
> http://updates.xensource.com/XenServer/4.0.1/centos5x/
> But these seem to be woefully out of date.
> I wonder if a kind soul would add the fix to the centosplus kernel
> with XenSource's patch so those rogue Xen users could benefit from
> this fix until upstream decides to include it.
> I suppose the centosplus patch would need to be flagged interim in
> case it needs to be removed when upstream has their own fix.


Thanks for researching this.

I can probably add this to the next centosplus kernels, though I usually 
do not like to add patches ... and I will need to grab their kernels and 
work out what is patched and try to roll it into our kernels.

-- Johnny Hughes

