I would try running dmesg from cron once per hour and then analyzing the captured logs.

________________________________
From: centos-virt-bounces at centos.org <centos-virt-bounces at centos.org> on behalf of Laurentiu Soica <laurentiu at soica.ro>
Sent: Monday, August 15, 2016 2:15 AM
To: Discussion about the virtualization on CentOS
Subject: Re: [CentOS-virt] Nested KVM issue

The CPUs are Xeon E5-2670 v2, Ivy Bridge microarchitecture.
I could try lowering the vCPUs, but I doubt it would help. Please note that the same VMs run just fine (with a load of 2 out of an acceptable 36 on the compute node) for about 3 days after a restart.

On Mon, 15 Aug 2016 at 08:42, Boris Derzhavets <bderzhavets at hotmail.com> wrote:

I would attempt to decrease the number of vCPUs allocated to the cloud VMs. Say, try going from 4 to 2. My guess is that there are not enough vCPUs left to run the OS itself. I also guess the CPU model is older than Haswell. Please confirm (or not) if possible. In my experience, since Haswell was launched, Intel Xeons based on this core (or later cores) behave much better than Sandy Bridge or Ivy Bridge based ones.

Boris.

________________________________
From: centos-virt-bounces at centos.org <centos-virt-bounces at centos.org> on behalf of Laurentiu Soica <laurentiu at soica.ro>
Sent: Monday, August 15, 2016 1:15 AM
To: Discussion about the virtualization on CentOS
Subject: Re: [CentOS-virt] Nested KVM issue

Hello Boris,

1. So, about three days after a reboot (this has happened several times already) the compute node reports high CPU usage. It has 36 vCPUs and it reports a load higher than 40. Usually the load is about 2 or 3. The VMs' qemu-kvm processes report 100% CPU usage (for a VM with 4 CPUs it reports almost 400%, for one with 1 CPU almost 100%). The VMs are no longer accessible through SSH.

2.
The baremetal has 2 CPUs, each with 10 cores and HT activated, so it reports 40 CPUs. It has 128 GB RAM, of which 100 GB are for the compute node. I have 15 VMs running inside the compute node. Together they use 40 vCPUs and 92 GB RAM. There are no swap devices installed on the compute node, so the reported SwapTotal is 0 KB. I'll check whether the memory on the compute node gets exhausted as soon as the problem reproduces again (in about 2 days), but for now there are more than 80 GB available.

Note that a reboot of the compute node doesn't fix the problem. Only a shutdown of the compute node followed by a virsh start works.

Thanks,
Laurentiu

On Sun, 14 Aug 2016 at 23:27, Boris Derzhavets <bderzhavets at hotmail.com> wrote:

The reports posted look good to me. The config should provide the best available performance for the cloud VM (L2) on the Compute Node.

1. Please remind me what goes wrong from your standpoint.
2. Which CPU is installed on the Compute Node, and how much RAM?

Actually, my concern is the number of cloud VMs versus the number of CPU cores (not threads).
Please check the `top` report regarding the swap area size.

Thanks.
Boris.

________________________________
From: centos-virt-bounces at centos.org <centos-virt-bounces at centos.org> on behalf of Laurentiu Soica <laurentiu at soica.ro>
Sent: Sunday, August 14, 2016 3:06 PM
To: Discussion about the virtualization on CentOS
Subject: Re: [CentOS-virt] Nested KVM issue

Hello,

1.
<domain type='kvm' id='6'>
  <name>baremetalbrbm_1</name>
  <uuid>534e9b54-5e4c-4acb-adcf-793f841551a7</uuid>
  <memory unit='KiB'>104857600</memory>
  <currentMemory unit='KiB'>104857600</currentMemory>
  <vcpu placement='static'>36</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
    <boot dev='hd'/>
    <bootmenu enable='no'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='host-passthrough'/>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe'/>
      <source file='/var/lib/libvirt/images/baremetalbrbm_1.qcow2'/>
      <backingStore/>
      <target dev='sda' bus='sata'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='sata0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:f1:15:20:c5:46'/>
      <source network='brbm' bridge='brbm'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='654ad04f-fa0a-41dd-9d30-b84e702462fe'/>
      </virtualport>
      <target dev='vnet5'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:d3:c9:24'/>
      <source bridge='br57'/>
      <target dev='vnet6'/>
      <model type='rtl8139'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/3'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/3'>
      <source path='/dev/pts/3'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5903' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>

2.
[root@overcloud-novacompute-0 ~]# lsmod | grep kvm
kvm_intel             162153  70
kvm                   525409  1 kvm_intel
[root@overcloud-novacompute-0 ~]# cat /etc/nova/nova.conf | grep virt_type|grep -v '^#'
virt_type=kvm
[root@overcloud-novacompute-0 ~]# cat /etc/nova/nova.conf | grep cpu_mode|grep -v '^#'
cpu_mode=host-passthrough

Thanks,
Laurentiu

On Sun, 14 Aug 2016 at 21:44, Boris Derzhavets <bderzhavets at hotmail.com> wrote:

________________________________
From: centos-virt-bounces at centos.org <centos-virt-bounces at centos.org> on behalf of Laurentiu Soica <laurentiu at soica.ro>
Sent: Sunday, August 14, 2016 10:17 AM
To: Discussion about the virtualization on CentOS
Subject: Re: [CentOS-virt] Nested KVM issue

More details on the subject:
I suppose it is a nested KVM issue because it appeared after I enabled the nested KVM feature. Without it, however, the second-level VMs are unusable in terms of performance.
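Because nested KVM is the main suspect, each layer of the nesting setup can be checked programmatically before comparing a healthy run with a degraded one. The following is a minimal Python sketch, not something posted in the thread. It assumes that the L0 host exposes the kvm_intel `nested` module parameter at /sys/module/kvm_intel/parameters/nested, that the L1 domain XML is the output of `virsh dumpxml`, and that nova.conf keeps `virt_type` and `cpu_mode` in its `[libvirt]` section. The function names are hypothetical.

```python
"""Nested-KVM sanity checks (hypothetical helpers, not from the thread)."""
import configparser
import xml.etree.ElementTree as ET


def nested_enabled(param_text: str) -> bool:
    """Interpret /sys/module/kvm_intel/parameters/nested read on the L0 host.

    Older kernels print "Y"/"N"; newer ones may print "1"/"0".
    """
    return param_text.strip() in ("Y", "1")


def domain_cpu_summary(domain_xml: str) -> dict:
    """Extract vCPU count, memory and CPU mode from `virsh dumpxml` output."""
    root = ET.fromstring(domain_xml)
    memory = root.find("memory")
    unit = memory.get("unit", "KiB")  # libvirt's default unit is KiB
    if unit != "KiB":
        raise ValueError("expected KiB, got {!r}".format(unit))
    cpu = root.find("cpu")
    return {
        "vcpus": int(root.findtext("vcpu")),
        "memory_gib": int(memory.text) / (1024 * 1024),
        # host-passthrough hands the host CPU model (including the vmx flag
        # when nesting is enabled on L0) to the L1 guest.
        "cpu_mode": cpu.get("mode") if cpu is not None else None,
    }


def nova_libvirt_settings(conf_text: str) -> dict:
    """Read virt_type and cpu_mode from the [libvirt] section of nova.conf."""
    # interpolation=None: values may legitimately contain '%'.
    # strict=False: tolerate repeated options in hand-edited files.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        return {"virt_type": None, "cpu_mode": None}
    section = parser["libvirt"]
    return {key: section.get(key) for key in ("virt_type", "cpu_mode")}
```

Fed the dumpxml posted in this thread, domain_cpu_summary() would report 36 vCPUs, 100.0 GiB and host-passthrough, which matches the 100 GB and 36 vCPUs given for the compute node. Parsing nova.conf with configparser, rather than grepping, avoids picking up commented-out or out-of-section lines.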
I am using CentOS 7 with:
kernel: 3.10.0-327.22.2.el7.x86_64
qemu-kvm: 1.5.3-105.el7_2.4
libvirt: 1.2.17-13.el7_2.5
on both the baremetal and the compute VM.

Please post:
1) # virsh dumpxml VM-L1 (where VM-L1 is the L1-level VM on which you expect nested KVM to appear)
2) Log in to VM-L1 and run:
   # lsmod | grep kvm
3) I need outputs from VM-L1 (in case it is the Compute Node):
   # cat /etc/nova/nova.conf | grep virt_type
   # cat /etc/nova/nova.conf | grep cpu_mode

Boris.

The only workaround now is to shut down the compute VM and start it again from the baremetal with virsh start. A simple restart of the compute node doesn't help. It looks like the qemu-kvm process corresponding to the compute VM is the problem.

Laurentiu

On Sun, 14 Aug 2016 at 00:19, Laurentiu Soica <laurentiu at soica.ro> wrote:

Hello,

I have an OpenStack setup in a virtual environment on CentOS 7. The baremetal has nested KVM enabled and 1 compute node as a VM. Inside the compute node I have multiple VMs running. About every 3 days the VMs become inaccessible and the compute node reports high CPU usage. The qemu-kvm process for each VM inside the compute node reports full CPU usage.

Please help me with some hints to debug this issue.

Thanks,
Laurentiu

_______________________________________________
CentOS-virt mailing list
CentOS-virt at centos.org
https://lists.centos.org/mailman/listinfo/centos-virt
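The capacity figures in this thread reduce to a little arithmetic. Two sockets × 10 cores × 2 threads gives the 40 logical CPUs the baremetal reports. The 15 L2 VMs total 40 vCPUs, which is about 1.11:1 against the compute node's 36 vCPUs, or 2:1 against the 20 physical cores Boris asked about. The Python sketch below encodes those ratios. It also includes a hypothetical hourly snapshot (1-minute load, MemAvailable, SwapTotal and the tail of dmesg) in the spirit of the cron suggestion at the top of the thread. The 90% "pegged" margin and the "load above vCPU count" saturation rule are assumptions, not thresholds from the thread, and all function names are hypothetical.

```python
"""Capacity arithmetic and an hourly snapshot for the compute node (sketch)."""
import subprocess
import time


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    return sockets * cores_per_socket * threads_per_core


def overcommit(guest_vcpus: int, host_cpus: int) -> float:
    """Ratio of vCPUs handed to guests versus CPUs available to run them."""
    return guest_vcpus / host_cpus


def qemu_pegged(cpu_percent: float, vcpus: int, margin: float = 0.9) -> bool:
    """True when a qemu-kvm process is near 100% per vCPU (assumed 90% margin)."""
    return cpu_percent >= vcpus * 100 * margin


def looks_saturated(load_1m: float, vcpus: int) -> bool:
    """Assumed rule: a 1-minute load above the vCPU count means saturation."""
    return load_1m > vcpus


def parse_loadavg(text: str) -> float:
    # /proc/loadavg: "<1m> <5m> <15m> <running>/<total> <last pid>"
    return float(text.split()[0])


def parse_meminfo(text: str) -> dict:
    # /proc/meminfo lines look like "SwapTotal:       0 kB" (values in kB).
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def snapshot(vcpus: int, dmesg_lines: int = 20) -> str:
    """One log entry: load, memory headroom and the last lines of dmesg.

    Reading the kernel ring buffer usually requires root.
    """
    with open("/proc/loadavg") as f:
        load_1m = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    try:
        out = subprocess.run(["dmesg"], stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT,
                             universal_newlines=True).stdout
    except OSError as exc:  # dmesg not installed or not executable
        out = "dmesg failed: {}".format(exc)
    lines = out.splitlines()
    tail = lines[-dmesg_lines:] if dmesg_lines > 0 else []
    header = "{} load1={:.2f} vcpus={} saturated={} MemAvailable={} kB SwapTotal={} kB".format(
        time.strftime("%Y-%m-%d %H:%M:%S"), load_1m, vcpus,
        looks_saturated(load_1m, vcpus),
        mem.get("MemAvailable"), mem.get("SwapTotal"))
    return "\n".join([header] + tail)
```

With the thread's numbers, logical_cpus(2, 10, 2) is 40 and overcommit(40, 20) is 2.0. Calling snapshot(36) once an hour from cron, via a small wrapper script of your own, would leave a timeline showing whether load, memory or kernel messages change in the hours before the VMs become unreachable.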