This is a problem I've had on and off under CentOS5 and CentOS6, with both Xen and KVM. Currently it happens consistently with KVM on 6.5, i.e. with every kernel update. I *think* it generally worked fine with the 6.4 kernels.
There are 7 VMs running on a 6.5, x86_64, 8GB RAM host, each with 512MB RAM and using the e1000 NIC. I picked this specific NIC because the default does not allow reliable monitoring through SNMP (IIRC). The host has two bonded NICs with br0 running on top.
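For what it's worth, the guests are attached to the bridge more or less like this in the domain XML (the MAC below is taken from one of the VMs; the rest is the usual e1000-on-br0 definition):

<interface type='bridge'>
  <mac address='00:16:3e:52:e3:0b'/>
  <source bridge='br0'/>
  <model type='e1000'/>
</interface>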
When the host reboots, the VMs will generally hang while bringing up the virtual NIC, and I need to go through several iterations of destroy/create for each VM to get them running. They always hang here (copy & paste from the console):
...
Welcome to CentOS
Starting udev: udev: starting version 147
piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
e1000: Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
e1000 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
e1000 0000:00:03.0: eth0: (PCI:33MHz:32-bit) 00:16:3e:52:e3:0b
e1000 0000:00:03.0: eth0: Intel(R) PRO/1000 Network Connection
Any suggestions on where to start looking?
NetworkManager and system-config-network do not really handle pair bonding very well, so you've obviously set it up by hand. This is the point where getting a paid RHEL license for your KVM server gets you direct access to their support team.
In particular, post your bridge settings. I think they should be set to "failover", not to the other, more complex load-balanced settings, to avoid confusing your switches and possibly your KVM clients.
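Concretely, something along these lines in the bond's options is what I have in mind, i.e. mode 1 (active-backup); the monitoring value is only a sketch, adjust it to your hardware:

BONDING_OPTS="mode=1 miimon=100"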
Nico Kadel-Garcia writes:
NetworkManager and system-config-network do not really handle pair bonding very well, so you've obviously set it up by hand. This is the point where getting a paid RHEL license for your KVM server gets you direct access to their support team.
My servers don't use NM. Cf. other discussions on the main centos list :)
In particular, post your bridge settings. I think they should be set to "failover", not to the other, more complex and load balanced settings, to avoid confusing your switches and possibly KVM clients.
This? Or is there more information available?
# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.00215e4d349b       no              bond0
                                                        vnet0
                                                        vnet1
                                                        vnet2
                                                        vnet3
                                                        vnet4
                                                        vnet5
                                                        vnet6
virbr0          8000.525400825a69       yes             virbr0-nic
#
For KVM virtual machines, I always use virtio-type NICs, for performance. I have never had any problems under CentOS 6.4 and 6.5.
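Only the model line in the guest's interface definition changes, roughly like this (edit with virsh edit and restart the guest; the guest needs the virtio_net driver, which the stock CentOS 6 kernels include):

<model type='virtio'/>    <!-- instead of <model type='e1000'/> -->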
I published notes some time back about pair bonding for CentOS, applicable to Scientific Linux as well, at https://wikis.uit.tufts.edu/confluence/display/TUSKpub/Configure+Pair+Bondin...
Show us your /etc/sysconfig/network-scripts/ifcfg-br0, if you would. I particularly want to see your "BONDING_OPTS".
Nico Kadel-Garcia writes:
I published notes some time back about pair bonding for CentOS, applicable to Scientific Linux as well, at https://wikis.uit.tufts.edu/confluence/display/TUSKpub/Configure+Pair+Bondin...
Show us your /etc/sysconfig/network-scripts/ifcfg-br0, if you would. I particularly want to see your "BONDING_OPTS".
That's in ifcfg-bond0. I use mode=0 (balance-rr), whereas your docs show mode=1 (active-backup).
Will change that and see what the next reboot brings.
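Concretely, that means flipping the mode in BONDING_OPTS (the rest of the options stay as they are) and checking the active mode after the reboot:

# /etc/sysconfig/network-scripts/ifcfg-bond0
BONDING_OPTS="mode=1 ..."    # was mode=0

# after reboot, should report active-backup
grep "Bonding Mode" /proc/net/bonding/bond0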
On Thu, Mar 27, 2014 at 8:14 AM, Lars Hecking lhecking@users.sourceforge.net wrote:
Nico Kadel-Garcia writes:
I published notes some time back about pair bonding for CentOS, applicable to Scientific Linux as well, at https://wikis.uit.tufts.edu/confluence/display/TUSKpub/Configure+Pair+Bondin...
Show us your /etc/sysconfig/network-scripts/ifcfg-br0, if you would. I particularly want to see your "BONDING_OPTS".
That's in ifcfg-bond0. I use mode=0 (balance-rr), whereas your docs show mode=1 (active-backup).
Balance-rr is *NOT* your friend. Many upstream switches will have serious problems with it.
Will change that and see what the next reboot brings.
Please do.
Lars Hecking writes:
Nico Kadel-Garcia writes:
I published notes some time back about pair bonding for CentOS, applicable to Scientific Linux as well, at https://wikis.uit.tufts.edu/confluence/display/TUSKpub/Configure+Pair+Bondin...
Show us your /etc/sysconfig/network-scripts/ifcfg-br0, if you would. I particularly want to see your "BONDING_OPTS".
That's in ifcfg-bond0. I use mode=0 (balance-rr), whereas your docs show mode=1 (active-backup).
Will change that and see what the next reboot brings.
Did that, and out of seven, four VMs came up straight away. Three did not, and some took more than one destroy/create cycle to come up with a working NIC.
Have you tried other virtual network cards, and/or PV network (netback for Xen or virtio for KVM)?
That would help you isolate whether the problem was in the e1000 emulation (which I suspect is shared between KVM and Xen) or in the host network configuration (which, it sounds like, is non-trivial).
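One low-impact way to test might be to hot-plug an extra NIC with a different model into a single guest instead of reconfiguring it, something like (guest name and bridge assumed):

virsh attach-interface <guest> bridge br0 --model virtio --live

and virsh detach-interface to remove it again afterwards.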
-George
Have you tried other virtual network cards, and/or PV network (netback for Xen or virtio for KVM)?
Under CentOS5/Xen, I was always using the default NIC, and I distinctly remember I wasn't the only one in my team experiencing this problem. For CentOS6/kvm, I switched from the default to the e1000 at some point, but can't remember when. It was definitely working before and after I switched; so yes, this might have to do with the e1000.
That would help you isolate whether the problem was in the e1000 emulation (which I suspect is shared between KVM and Xen) or in the host network configuration (which, it sounds like, is non-trivial).
As this is a production system, I'd have to set up a clone and experiment with different options. I have an identical host machine, so no problem in that regard, but this will take some time I don't currently have.