[CentOS-virt] CentOS 6.3 kvm-qemu problem with Debian guest

Zoltan Frombach zoltan at frombach.com
Wed Oct 24 17:47:05 EDT 2012


Hello,

I have CentOS 6.3 installed on a server with dual Xeon CPUs.

Motherboard info:
http://www.supermicro.com/products/motherboard/Xeon/C600/X9DRT-HF.cfm

CPU info (we have two of these):
Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
http://ark.intel.com/products/64594/Intel-Xeon-Processor-E5-2620-15M-Cache-2_00-GHz-7_20-GTs-Intel-QPI

Generally, qemu-kvm works, but my Debian guest OS (Debian "squeeze" with
the 2.6.32 Linux kernel) won't boot if I assign more than 4 vcpus to the
guest. The kernel starts to load, then freezes right where it should
load the Linux agpgart module, and just hangs there. I can hit Enter in
the console and the cursor moves down, but the only other thing I can
do is force off the guest VM.

If I change the VM's config to just 4 vcpus, it boots most of the time,
but sometimes it hangs with 4 vcpus, too. When it hangs, it always hangs
at the same point, so the behavior is consistent. Changing the amount of
RAM also has an effect: with 6GB of RAM or less assigned to the VM, it
most likely boots fine. With 8GB of RAM, however, it boots about 50% of
the time and hangs the other 50%. My Debian guest VM has *never* booted
with 6 or more vcpus. Again, with 4 vcpus it may or may not boot, but
with 8GB of RAM it is more likely to hang at boot regardless of the vcpu
count. Between these tries I change nothing except the number of vcpus
or the amount of RAM.
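For reference, these are the only two settings varied between attempts.
A minimal sketch of the 6GB / 4-vcpu combination, assuming the same
unit='KiB' convention and the same cpuset as the full domain XML further
down (6GB = 6 * 1024 * 1024 KiB = 6291456 KiB, versus 8388608 KiB for
8GB):

```xml
<!-- Sketch only: the 6GB / 4-vcpu variant of Debian-Hosting.
     <memory> and <currentMemory> are kept equal, as in the full config. -->
<memory unit='KiB'>6291456</memory>
<currentMemory unit='KiB'>6291456</currentMemory>
<!-- Assumed unchanged between tries: the same pinning to 4 host threads
     (6, 7, 18, 19), all in NUMA cell 1 per the capabilities output -->
<vcpu placement='static' cpuset='6-7,18-19'>4</vcpu>
```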

I have tried a newer Debian kernel from Debian backports (a 3.2.x.x
Linux kernel), but it did not help; exactly the same thing happens. I
also have other Debian VM images (some backed by raw image files, some
by LVM block devices), and I can reproduce the problem with all of them.

My problem somewhat resembles this one:
http://forum.parallels.com/showthread.php?t=4882
except that I'm not using Parallels at all; I am trying to use qemu-kvm
under CentOS 6.3.
But this part seems to apply to my case: "I did experience some relief 
... by changing the amount of available RAM, but that has no longer 
prevented the intel-agp kernel panic recently."

I've tried blacklisting the agpgart module inside my guest OS, to no
avail. Here is what I added to the end of
/etc/modprobe.d/blacklist.conf:
blacklist agpgart
blacklist intel-agp

However, when the VM does manage to boot, I still see this entry in
dmesg (snippet):

Oct 25 23:27:01 infoglobal kernel: [    0.864798] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Oct 25 23:27:01 infoglobal kernel: [    0.865650] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
Oct 25 23:27:01 infoglobal kernel: [    0.866538] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Oct 25 23:27:01 infoglobal kernel: [    0.867484] acpiphp: Slot [1] registered
Oct 25 23:27:01 infoglobal kernel: [    0.868250] acpiphp: Slot [2] registered
Oct 25 23:27:01 infoglobal kernel: [    0.868995] acpiphp: Slot [3] registered
Oct 25 23:27:01 infoglobal kernel: [    0.869750] acpiphp: Slot [4] registered
Oct 25 23:27:01 infoglobal kernel: [    0.870489] acpiphp: Slot [5] registered
Oct 25 23:27:01 infoglobal kernel: [    0.871233] acpiphp: Slot [6] registered
Oct 25 23:27:01 infoglobal kernel: [    0.871959] acpiphp: Slot [7] registered
Oct 25 23:27:01 infoglobal kernel: [    0.872735] acpiphp: Slot [8] registered
Oct 25 23:27:01 infoglobal kernel: [    0.873472] acpiphp: Slot [9] registered
Oct 25 23:27:01 infoglobal kernel: [    0.874222] acpiphp: Slot [10] registered
Oct 25 23:27:01 infoglobal kernel: [    0.874958] acpiphp: Slot [11] registered
Oct 25 23:27:01 infoglobal kernel: [    0.875705] acpiphp: Slot [12] registered
Oct 25 23:27:01 infoglobal kernel: [    0.876494] acpiphp: Slot [13] registered
Oct 25 23:27:01 infoglobal kernel: [    0.877237] acpiphp: Slot [14] registered
Oct 25 23:27:01 infoglobal kernel: [    0.877987] acpiphp: Slot [15] registered
Oct 25 23:27:01 infoglobal kernel: [    0.878749] acpiphp: Slot [16] registered
Oct 25 23:27:01 infoglobal kernel: [    0.879517] acpiphp: Slot [17] registered
Oct 25 23:27:01 infoglobal kernel: [    0.880299] acpiphp: Slot [18] registered
Oct 25 23:27:01 infoglobal kernel: [    0.881042] acpiphp: Slot [19] registered
Oct 25 23:27:01 infoglobal kernel: [    0.881791] acpiphp: Slot [20] registered
Oct 25 23:27:01 infoglobal kernel: [    0.882546] acpiphp: Slot [21] registered
Oct 25 23:27:01 infoglobal kernel: [    0.883292] acpiphp: Slot [22] registered
Oct 25 23:27:01 infoglobal kernel: [    0.884045] acpiphp: Slot [23] registered
Oct 25 23:27:01 infoglobal kernel: [    0.884796] acpiphp: Slot [24] registered
Oct 25 23:27:01 infoglobal kernel: [    0.885545] acpiphp: Slot [25] registered
Oct 25 23:27:01 infoglobal kernel: [    0.886280] acpiphp: Slot [26] registered
Oct 25 23:27:01 infoglobal kernel: [    0.887005] acpiphp: Slot [27] registered
Oct 25 23:27:01 infoglobal kernel: [    0.887758] acpiphp: Slot [28] registered
Oct 25 23:27:01 infoglobal kernel: [    0.888539] acpiphp: Slot [29] registered
Oct 25 23:27:01 infoglobal kernel: [    0.889300] acpiphp: Slot [30] registered
Oct 25 23:27:01 infoglobal kernel: [    0.890037] acpiphp: Slot [31] registered
Oct 25 23:27:01 infoglobal kernel: [    0.891857] ERST: Table is not found!
Oct 25 23:27:01 infoglobal kernel: [    0.892590] GHES: HEST is not enabled!
Oct 25 23:27:01 infoglobal kernel: [    0.893438] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
Oct 25 23:27:01 infoglobal kernel: [    0.916078] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Oct 25 23:27:01 infoglobal kernel: [    1.013996] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Oct 25 23:27:01 infoglobal kernel: [    1.016342] Linux agpgart interface v0.103

Please note the last line above: Linux agpgart interface v0.103

When the VM hangs during boot, the last line printed to the console is
the one just before the agpgart line:
Oct 25 23:27:01 infoglobal kernel: [    1.013996] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A

This is why I suspect that this problem has something to do with
agpgart...

Anyway, here is some info about the host system:

# virsh capabilities
<capabilities>

   <host>
     <uuid>00020003-0004-0005-0006-000700080009</uuid>
     <cpu>
       <arch>x86_64</arch>
       <model>SandyBridge</model>
       <vendor>Intel</vendor>
       <topology sockets='1' cores='6' threads='2'/>
       <feature name='pdpe1gb'/>
       <feature name='osxsave'/>
       <feature name='tsc-deadline'/>
       <feature name='dca'/>
       <feature name='pdcm'/>
       <feature name='xtpr'/>
       <feature name='tm2'/>
       <feature name='est'/>
       <feature name='smx'/>
       <feature name='vmx'/>
       <feature name='ds_cpl'/>
       <feature name='monitor'/>
       <feature name='dtes64'/>
       <feature name='pbe'/>
       <feature name='tm'/>
       <feature name='ht'/>
       <feature name='ss'/>
       <feature name='acpi'/>
       <feature name='ds'/>
       <feature name='vme'/>
     </cpu>
     <power_management>
       <suspend_disk/>
     </power_management>
     <migration_features>
       <live/>
       <uri_transports>
         <uri_transport>tcp</uri_transport>
       </uri_transports>
     </migration_features>
     <topology>
       <cells num='2'>
         <cell id='0'>
           <cpus num='12'>
             <cpu id='0'/>
             <cpu id='1'/>
             <cpu id='2'/>
             <cpu id='3'/>
             <cpu id='4'/>
             <cpu id='5'/>
             <cpu id='12'/>
             <cpu id='13'/>
             <cpu id='14'/>
             <cpu id='15'/>
             <cpu id='16'/>
             <cpu id='17'/>
           </cpus>
         </cell>
         <cell id='1'>
           <cpus num='12'>
             <cpu id='6'/>
             <cpu id='7'/>
             <cpu id='8'/>
             <cpu id='9'/>
             <cpu id='10'/>
             <cpu id='11'/>
             <cpu id='18'/>
             <cpu id='19'/>
             <cpu id='20'/>
             <cpu id='21'/>
             <cpu id='22'/>
             <cpu id='23'/>
           </cpus>
         </cell>
       </cells>
     </topology>
   </host>

   <guest>
     <os_type>hvm</os_type>
     <arch name='i686'>
       <wordsize>32</wordsize>
       <emulator>/usr/libexec/qemu-kvm</emulator>
       <machine>rhel6.3.0</machine>
       <machine canonical='rhel6.3.0'>pc</machine>
       <machine>rhel6.2.0</machine>
       <machine>rhel6.1.0</machine>
       <machine>rhel6.0.0</machine>
       <machine>rhel5.5.0</machine>
       <machine>rhel5.4.4</machine>
       <machine>rhel5.4.0</machine>
       <domain type='qemu'>
       </domain>
       <domain type='kvm'>
         <emulator>/usr/libexec/qemu-kvm</emulator>
       </domain>
     </arch>
     <features>
       <cpuselection/>
       <deviceboot/>
       <pae/>
       <nonpae/>
       <acpi default='on' toggle='yes'/>
       <apic default='on' toggle='no'/>
     </features>
   </guest>

   <guest>
     <os_type>hvm</os_type>
     <arch name='x86_64'>
       <wordsize>64</wordsize>
       <emulator>/usr/libexec/qemu-kvm</emulator>
       <machine>rhel6.3.0</machine>
       <machine canonical='rhel6.3.0'>pc</machine>
       <machine>rhel6.2.0</machine>
       <machine>rhel6.1.0</machine>
       <machine>rhel6.0.0</machine>
       <machine>rhel5.5.0</machine>
       <machine>rhel5.4.4</machine>
       <machine>rhel5.4.0</machine>
       <domain type='qemu'>
       </domain>
       <domain type='kvm'>
         <emulator>/usr/libexec/qemu-kvm</emulator>
       </domain>
     </arch>
     <features>
       <cpuselection/>
       <deviceboot/>
       <acpi default='on' toggle='yes'/>
       <apic default='on' toggle='no'/>
     </features>
   </guest>

</capabilities>

And the entire XML config file of the Debian VM:

# cat Debian-Hosting.xml
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
   virsh edit Debian-Hosting
or other application using the libvirt API.
-->

<domain type='kvm'>
   <name>Debian-Hosting</name>
   <uuid>b641a879-35c4-f5c5-9b9d-1a76aa843389</uuid>
   <memory unit='KiB'>8388608</memory>
   <currentMemory unit='KiB'>8388608</currentMemory>
   <vcpu placement='static' cpuset='6-7,18-19'>4</vcpu>
   <os>
     <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
     <boot dev='hd'/>
     <bootmenu enable='no'/>
   </os>
   <features>
     <acpi/>
     <apic/>
     <pae/>
   </features>
   <cpu mode='custom' match='exact'>
     <model fallback='allow'>SandyBridge</model>
     <vendor>Intel</vendor>
     <feature policy='require' name='vme'/>
     <feature policy='require' name='tm2'/>
     <feature policy='require' name='est'/>
     <feature policy='require' name='vmx'/>
     <feature policy='require' name='osxsave'/>
     <feature policy='require' name='smx'/>
     <feature policy='require' name='ss'/>
     <feature policy='require' name='ds'/>
     <feature policy='require' name='tsc-deadline'/>
     <feature policy='require' name='dtes64'/>
     <feature policy='require' name='ht'/>
     <feature policy='require' name='dca'/>
     <feature policy='require' name='pbe'/>
     <feature policy='require' name='tm'/>
     <feature policy='require' name='pdcm'/>
     <feature policy='require' name='pdpe1gb'/>
     <feature policy='require' name='ds_cpl'/>
     <feature policy='require' name='xtpr'/>
     <feature policy='require' name='acpi'/>
     <feature policy='require' name='monitor'/>
   </cpu>
   <clock offset='utc'/>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>
   <on_crash>restart</on_crash>
   <devices>
     <emulator>/usr/libexec/qemu-kvm</emulator>
     <disk type='block' device='disk'>
       <driver name='qemu' type='raw' cache='none'/>
       <source dev='/dev/mapper/vg0_host-lv0_3_debianhost'/>
       <target dev='vda' bus='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
     </disk>
     <controller type='ide' index='0'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
     </controller>
     <controller type='usb' index='0'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
     </controller>
     <interface type='bridge'>
       <mac address='00:16:36:e2:20:ea'/>
       <source bridge='br0'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
     </interface>
     <serial type='pty'>
       <target port='0'/>
     </serial>
     <console type='pty'>
       <target type='serial' port='0'/>
     </console>
     <input type='mouse' bus='ps2'/>
     <graphics type='vnc' port='-1' autoport='yes'/>
     <video>
       <model type='cirrus' vram='9216' heads='1'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
     </video>
     <memballoon model='virtio'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
     </memballoon>
   </devices>
</domain>

As configured above, with 8GB of RAM and just 4 vcpus, it did boot, on
the second attempt. If I changed the amount of RAM to 6GB, it would most
likely boot almost every time. But if I added 2 or 4 more vcpus, it
would never boot. It always gets stuck at the same point, right where
the kernel should print "Linux agpgart interface v0.103".
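One hedged idea for a next test, not a confirmed fix: the existing
cpuset='6-7,18-19' covers only 4 host logical CPUs, so if only the vcpu
count is raised, 6 or 8 vcpus end up sharing those 4 threads. The
capabilities output shows NUMA cell 1 holding CPUs 6-11 and 18-23, so a
hypothetical 6-vcpu trial could widen the cpuset while staying inside
that cell. As a separate assumption worth isolating, the <cpu> element
could be reduced to the SandyBridge model alone, dropping the list of
host features copied in with policy='require' (monitor, vmx,
tsc-deadline and so on). Changing one element at a time would show
which, if either, matters.

```xml
<!-- Hypothetical test variant, edited via: virsh edit Debian-Hosting -->
<!-- Assumption: 6 vcpus allowed on any host CPU in NUMA cell 1 (6-11,18-23) -->
<vcpu placement='static' cpuset='6-11,18-23'>6</vcpu>

<!-- Assumption: same guest CPU model and vendor, minus the explicit
     <feature policy='require'> list copied from the host capabilities -->
<cpu mode='custom' match='exact'>
  <model fallback='allow'>SandyBridge</model>
  <vendor>Intel</vendor>
</cpu>
```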

Any help with this would be highly appreciated!

Thank you,

Zoltan

