Ross S. W. Walker wrote:
>
> Brett Worth wrote:
> >
> > Hello All.
> >
> > I've just started looking into Xen and have a test environment in
> > place. I'm seeing an annoying problem that I thought worthy of a
> > post.
> >
> > Config:
> >
> > I have 2 x HP DL585 servers each with 4 Dual core Opterons
> > (non-vmx) and 16GB RAM configured as Xen servers. These run
> > CentOS 5.1 with the latest updates applied. These systems both
> > attach to an iSCSI target which is an HP DL385 running ietd and
> > serving SAN based storage.
> >
> > I have a test VM running CentOS 5.1 also updated.
> >
> > Problem:
> >
> > If I run the VM on a single server everything is OK. If I do a
> > migrate of the VM to the other server I start getting random
> > "BUG: soft lockup detected on CPU#?" messages on the VM console.
> > The messages seem to happen with IO but not every time. A reboot
> > of the VM on the new server will stop these messages.
> >
> > I've also left the VM running overnight a couple of times and
> > when I do I find that any external sessions (ssh) are hung in the
> > morning but the console session is not. New ssh sessions can be
> > started and seem to work.
> >
> > After much googling it looks like the kernel messages can occur
> > if dom0 is very busy but mine is not.
> >
> > Any suggestions?
>
> The soft lockup is technically not a BUG.
>
> You will see these errors if an IRQ takes more than 10 seconds to
> respond.
>
> In your case I would take a look at your iSCSI setup and the time
> it takes to migrate the VM from one node to another along with
> SCSI reserve/release setup on the iSCSI target.
>
> I also have been using the Xen 3.2 RPMs off xen.org on CentOS 5.1
> with good results, the VM migration may run smoother and quicker
> in Xen 3.2, but in doing so you take Xen off the reservation, if
> you're OK with that it may fix your issues.
After seeing this same issue on my Xen 3.2 install, but with NO
migration or iSCSI involved, I decided it is probably NOT iSCSI's
fault. I researched it a little more, and this is what I found:

http://docs.xensource.com/XenServer/4.0.1/guest/ch04s08.html#rhel5_limitations

XenSource does provide a repo of CentOS 5 kernels that have been
patched to fix this:

http://updates.xensource.com/XenServer/4.0.1/centos5x/

But these seem to be woefully out of date.

I wonder if a kind soul would add XenSource's patch to the
centosplus kernel, so that those rogue Xen users could benefit from
the fix until upstream decides to include it. I suppose the
centosplus patch would need to be flagged as interim, in case it
needs to be removed when upstream ships its own fix.

-Ross