Hello,
I have CentOS 6.3 installed on a server with dual Xeon CPUs.
Motherboard info: http://www.supermicro.com/products/motherboard/Xeon/C600/X9DRT-HF.cfm
CPU info (we have two of these): Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz http://ark.intel.com/products/64594/Intel-Xeon-Processor-E5-2620-15M-Cache-2...
Generally kvm-qemu works. However, my Debian guest OS (Debian "squeeze" with the 2.6.32 Linux kernel) won't boot up if I assign more than 4 vcpus to the guest. The kernel starts to load, then freezes right where it should load the Linux agpgart module. It just hangs there. I can hit Enter in the console and the cursor moves down, but that is all I can do apart from forcing the guest VM off.
If I change the VM's config to have just 4 vcpus, it boots up most of the time, though it sometimes hangs with 4 vcpus, too. When it hangs, it always hangs at the same point, so it is consistent. Changing the amount of RAM of the VM also has an effect: with 6GB or less RAM assigned to the VM it most likely boots up fine. With 8GB RAM, however, it boots up about 50% of the time and hangs the other 50% of the time. My Debian guest VM *never* booted up with 6 or more vcpus. Again, with 4 vcpus it may or may not boot up, but with 8GB RAM it is more likely to hang at bootup regardless of the vcpu count. I don't change anything else between these tries, just the number of vcpus or the amount of RAM.
I have tried a newer Debian kernel installed from Debian backports (3.2.x.x Linux kernel), but it did not help; exactly the same thing happens. I also have other Debian VM images (some raw image file based, some LVM block device based) and I can reproduce the problem with all of them.
My problem somewhat resembles this one: http://forum.parallels.com/showthread.php?t=4882 except that I'm not using Parallels at all; I am trying to use kvm-qemu under CentOS 6.3. But this part seems to apply to my case: "I did experience some relief ... by changing the amount of available RAM, but that has no longer prevented the intel-agp kernel panic recently."
I've tried to blacklist the agpgart module inside my guest OS, to no avail. Here is what I've added to the end of /etc/modprobe.d/blacklist.conf:
blacklist agpgart
blacklist intel-agp
However, when the VM does boot up, I see these entries in dmesg (snippet):
Oct 25 23:27:01 infoglobal kernel: [ 0.864798] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Oct 25 23:27:01 infoglobal kernel: [ 0.865650] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
Oct 25 23:27:01 infoglobal kernel: [ 0.866538] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Oct 25 23:27:01 infoglobal kernel: [ 0.867484] acpiphp: Slot [1] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.868250] acpiphp: Slot [2] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.868995] acpiphp: Slot [3] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.869750] acpiphp: Slot [4] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.870489] acpiphp: Slot [5] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.871233] acpiphp: Slot [6] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.871959] acpiphp: Slot [7] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.872735] acpiphp: Slot [8] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.873472] acpiphp: Slot [9] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.874222] acpiphp: Slot [10] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.874958] acpiphp: Slot [11] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.875705] acpiphp: Slot [12] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.876494] acpiphp: Slot [13] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.877237] acpiphp: Slot [14] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.877987] acpiphp: Slot [15] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.878749] acpiphp: Slot [16] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.879517] acpiphp: Slot [17] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.880299] acpiphp: Slot [18] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.881042] acpiphp: Slot [19] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.881791] acpiphp: Slot [20] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.882546] acpiphp: Slot [21] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.883292] acpiphp: Slot [22] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.884045] acpiphp: Slot [23] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.884796] acpiphp: Slot [24] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.885545] acpiphp: Slot [25] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.886280] acpiphp: Slot [26] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.887005] acpiphp: Slot [27] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.887758] acpiphp: Slot [28] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.888539] acpiphp: Slot [29] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.889300] acpiphp: Slot [30] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.890037] acpiphp: Slot [31] registered
Oct 25 23:27:01 infoglobal kernel: [ 0.891857] ERST: Table is not found!
Oct 25 23:27:01 infoglobal kernel: [ 0.892590] GHES: HEST is not enabled!
Oct 25 23:27:01 infoglobal kernel: [ 0.893438] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
Oct 25 23:27:01 infoglobal kernel: [ 0.916078] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Oct 25 23:27:01 infoglobal kernel: [ 1.013996] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Oct 25 23:27:01 infoglobal kernel: [ 1.016342] Linux agpgart interface v0.103
Please notice the last line above: Linux agpgart interface v0.103
When the VM hangs during boot, the last line printed to the console is the one just above it:
Oct 25 23:27:01 infoglobal kernel: [ 1.013996] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
This is why I suspect that this problem has something to do with agpgart ...
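For anyone who wants to compare a hung boot against a good one, here is a minimal sketch of how the last kernel message can be pulled out of a captured log. It assumes the log has been saved as plain text and that kernel lines carry the usual "[ seconds.micros]" timestamp; the function name `last_kernel_message` is only illustrative.

```python
import re

# Matches the "[ seconds.micros] message" part of a kernel log line,
# with or without the leading syslog prefix ("Oct 25 ... kernel:").
KERNEL_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s(.*)")


def last_kernel_message(log_text):
    """Return (timestamp, message) of the last kernel line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = KERNEL_LINE.search(line)
        if match:
            last = (float(match.group(1)), match.group(2))
    return last


# Two lines copied from the snippet above; on a hung boot the log stops here.
HUNG_BOOT = """\
Oct 25 23:27:01 infoglobal kernel: [ 0.916078] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Oct 25 23:27:01 infoglobal kernel: [ 1.013996] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
"""

if __name__ == "__main__":
    print(last_kernel_message(HUNG_BOOT))
```

Running the same function over the log of a successful boot and a hung boot shows which message is the first one missing.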
Anyway, here is some info about the host system:
# virsh capabilities
<capabilities>

  <host>
    <uuid>00020003-0004-0005-0006-000700080009</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='6' threads='2'/>
      <feature name='pdpe1gb'/>
      <feature name='osxsave'/>
      <feature name='tsc-deadline'/>
      <feature name='dca'/>
      <feature name='pdcm'/>
      <feature name='xtpr'/>
      <feature name='tm2'/>
      <feature name='est'/>
      <feature name='smx'/>
      <feature name='vmx'/>
      <feature name='ds_cpl'/>
      <feature name='monitor'/>
      <feature name='dtes64'/>
      <feature name='pbe'/>
      <feature name='tm'/>
      <feature name='ht'/>
      <feature name='ss'/>
      <feature name='acpi'/>
      <feature name='ds'/>
      <feature name='vme'/>
    </cpu>
    <power_management>
      <suspend_disk/>
    </power_management>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='12'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='12'/>
            <cpu id='13'/>
            <cpu id='14'/>
            <cpu id='15'/>
            <cpu id='16'/>
            <cpu id='17'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='12'>
            <cpu id='6'/>
            <cpu id='7'/>
            <cpu id='8'/>
            <cpu id='9'/>
            <cpu id='10'/>
            <cpu id='11'/>
            <cpu id='18'/>
            <cpu id='19'/>
            <cpu id='20'/>
            <cpu id='21'/>
            <cpu id='22'/>
            <cpu id='23'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>

  <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine>rhel6.3.0</machine>
      <machine canonical='rhel6.3.0'>pc</machine>
      <machine>rhel6.2.0</machine>
      <machine>rhel6.1.0</machine>
      <machine>rhel6.0.0</machine>
      <machine>rhel5.5.0</machine>
      <machine>rhel5.4.4</machine>
      <machine>rhel5.4.0</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <pae/>
      <nonpae/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>

  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine>rhel6.3.0</machine>
      <machine canonical='rhel6.3.0'>pc</machine>
      <machine>rhel6.2.0</machine>
      <machine>rhel6.1.0</machine>
      <machine>rhel6.0.0</machine>
      <machine>rhel5.5.0</machine>
      <machine>rhel5.4.4</machine>
      <machine>rhel5.4.0</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>

</capabilities>
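The part of this output that matters for CPU pinning is the NUMA cell layout under the host topology. As a rough sketch (assuming the output of `virsh capabilities` is available as a string; the helper name `numa_cells` and the trimmed `SAMPLE` are made up for illustration), the cell-to-CPU mapping can be read out like this:

```python
import xml.etree.ElementTree as ET


def numa_cells(capabilities_xml):
    """Return {cell_id: [cpu_id, ...]} from `virsh capabilities` XML text."""
    root = ET.fromstring(capabilities_xml)
    cells = {}
    # The NUMA topology is a direct child of <host>, not the <topology> inside <cpu>.
    for cell in root.findall("./host/topology/cells/cell"):
        cpu_ids = [int(cpu.get("id")) for cpu in cell.findall("./cpus/cpu")]
        cells[int(cell.get("id"))] = cpu_ids
    return cells


# Trimmed sample with the same shape as the real output (two CPUs per cell only).
SAMPLE = """
<capabilities>
  <host>
    <cpu>
      <topology sockets='1' cores='6' threads='2'/>
    </cpu>
    <topology>
      <cells num='2'>
        <cell id='0'><cpus num='2'><cpu id='0'/><cpu id='12'/></cpus></cell>
        <cell id='1'><cpus num='2'><cpu id='6'/><cpu id='18'/></cpus></cell>
      </cells>
    </topology>
  </host>
</capabilities>
"""

if __name__ == "__main__":
    for cell_id, cpus in sorted(numa_cells(SAMPLE).items()):
        print("cell", cell_id, "->", cpus)
```

On the real output this would list CPUs 0-5 and 12-17 for cell 0, and CPUs 6-11 and 18-23 for cell 1.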
And the entire XML config file of the Debian VM:
# cat Debian-Hosting.xml
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh edit Debian-Hosting
or other application using the libvirt API.
-->

<domain type='kvm'>
  <name>Debian-Hosting</name>
  <uuid>b641a879-35c4-f5c5-9b9d-1a76aa843389</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static' cpuset='6-7,18-19'>4</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
    <boot dev='hd'/>
    <bootmenu enable='no'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>SandyBridge</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='smx'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='dtes64'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='dca'/>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='monitor'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/mapper/vg0_host-lv0_3_debianhost'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:16:36:e2:20:ea'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>
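To show what the cpuset='6-7,18-19' pinning in this config means relative to the host's NUMA layout, here is a small sketch. The helper `expand_cpuset` is hypothetical and only handles plain ids and ranges (not libvirt's '^N' exclusion syntax); the `CELL1` set is copied by hand from the capabilities output.

```python
def expand_cpuset(spec):
    """Expand a cpuset string such as '6-7,18-19' into a sorted list of CPU ids.

    Only single ids and inclusive ranges are supported in this sketch.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# CPU ids of NUMA cell 1, as reported by `virsh capabilities` on this host.
CELL1 = {6, 7, 8, 9, 10, 11, 18, 19, 20, 21, 22, 23}

if __name__ == "__main__":
    pinned = expand_cpuset("6-7,18-19")
    print(pinned)                # the host CPUs the 4 vcpus may run on
    print(set(pinned) <= CELL1)  # whether they all sit in NUMA cell 1
```

So the 4 vcpus are confined to four host CPUs that all belong to the second NUMA cell.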
As configured above, with 8GB RAM and just 4 vcpus, it just booted up, on the second attempt... If I changed the amount of RAM to 6GB, it would most likely boot up almost every time. But if I added 2 or 4 more vcpus, it would never boot up, for sure. It always gets stuck at the same point, right where the "Linux agpgart interface v0.103" line should be displayed by the kernel.
Any help with this would be highly appreciated!
Thank you,
Zoltan