[CentOS-virt] Fedora 12 domU will not boot kernel

Sun Feb 21 23:10:35 UTC 2010
Norman Gaywood <ngaywood at une.edu.au>

On Sat, Feb 20, 2010 at 05:09:31PM +0200, Pasi Kärkkäinen wrote:
> On Mon, Feb 01, 2010 at 10:20:26AM +0200, Pasi Kärkkäinen wrote:
> > On Sat, Jan 30, 2010 at 10:57:02AM +1100, Norman Gaywood wrote:
> > > On Sat, Jan 09, 2010 at 03:45:15PM +0200, Pasi Kärkkäinen wrote:
> > 
> > > I'm running 2.6.18-164.11.1.el5xen and the fedora kernels for F11 and F12
> > > are not very stable for me.  Dom0 is a Centos 5.4 2.6.18-164.9.1.el5xen
> > > kernel.
> > > 
> > > These fedora kernels seem to lockup (processes get stuck in D state)
> > > whenever put under any load:
> > > 
> > > kernel-2.6.30.10-105.fc11.x86_64
> > > kernel-2.6.31.6-166.fc12.x86_64
> > > kernel-2.6.31.9-174.fc12.x86_64

> > Do you have some easy-to-reproduce test/script so I could try it on my system?

No, unfortunately. It's very difficult to pin down the workload that causes
this, and the test systems I've tried don't seem to trigger it. The lockup
does not even seem to be related to heavy load. We might have several
LTSP users logged in and a little disk activity, and suddenly it will trip
into the D state lockup. This can happen after many hours (up to 2 days)
of running or, in our worst case, after 2 minutes.
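Since the hang is hard to reproduce on demand, one option is to log D state tasks continuously so the onset can be timestamped and lined up with load. Below is a minimal sketch, assuming a Linux guest with /proc mounted and Python 3.9+. The helper names (parse_state, d_state_tasks) and the 10 second poll interval are hypothetical choices for illustration, not part of any existing tool.

```python
import os
import time


def parse_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The comm field (2nd) is wrapped in parentheses and may itself contain
    spaces or ')', so split after the *last* ')' rather than on whitespace.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def d_state_tasks(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process currently in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
        if parse_state(line) == "D":
            comm = line[line.index("(") + 1 : line.rindex(")")]
            found.append((int(entry), comm))
    return found


if __name__ == "__main__":
    # Hypothetical 10 s poll: print a timestamp plus any D state tasks.
    while True:
        stuck = d_state_tasks()
        if stuck:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), stuck, flush=True)
        time.sleep(10)
```

Short D state periods are normal during disk I/O, so only tasks that stay in D across many consecutive polls point at the lockup. At that point the kernel's SysRq 'w' (echo w > /proc/sysrq-trigger, as root) dumps the stacks of blocked tasks to the kernel log, which would be useful to attach to the bug reports.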

> > > Here are some bugzilla entries (one posted by me) that point to this
> > > problem:
> > > 
> > > kernel 2.6.31 processes lock up in D state
> > > https://bugzilla.redhat.com/show_bug.cgi?id=550724
> > > 
> > > FC12 2.6.31.9-174.fc12.x86_64 hangs under heavy disk I/O
> > > https://bugzilla.redhat.com/show_bug.cgi?id=551552

This one also seems to be the same issue, judging by the end of the
report:

F12 Xen DomU unstable (2.6.31)
https://bugzilla.redhat.com/show_bug.cgi?id=526627

> I haven't seen these D state problems, but I found this CONFIG_HIGHPTE xen_set_pte() bug/race
> that causes 32bit PAE guest crashes:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=566932
> 
> Were your guests 32bit? 

No, they are 64bit.

Googling around turns up other people who appear to have had similar
issues, but most threads die out with no resolution. Here is one on the
KVM mailing list (kvm@vger.kernel.org) that looks similar:

http://www.mail-archive.com/kvm@vger.kernel.org/msg23039.html


-- 
Norman Gaywood, Computer Systems Officer
University of New England, Armidale, NSW 2351, Australia

ngaywood at une.edu.au            Phone: +61 (0)2 6773 3337
http://mcs.une.edu.au/~norm    Fax:   +61 (0)2 6773 3312

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html