I set up bonding in active-backup mode (mode 1) with ARP monitoring on a server. It looked OK, but pulling the cable on the active link did not trigger a failover.
After experimenting manually with modprobe, ifconfig, ifenslave, etc., I stumbled on a solution: enslave the eth1 device before eth0, and all is good.
Why this should matter is a puzzle - I could not find anything in bonding.txt or on the web about it.
I had to change ifup-eth to fix the problem.
Any ideas on why the enslavement order matters, or a better solution to work around it?
The rest of this post is details.
To fix this, I had to patch /etc/sysconfig/network-scripts/ifup-eth to reverse the order when it is updating the sysfs slaves list. A 1-line change, from:
for device in $(LANG=C egrep -l "^[[:space:]]*MASTER="?${DEVICE}"?" /etc/sysconfig/network-scripts/ifcfg-*) ; do
to:
for device in $(LANG=C egrep -l "^[[:space:]]*MASTER="?${DEVICE}"?" /etc/sysconfig/network-scripts/ifcfg-* | sort -r) ; do
With that change, everything works perfectly.
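To see what the added `sort -r` does to the enslavement order, the loop's file list can be reproduced in a scratch directory. This is only a minimal sketch, not the ifup-eth code itself: the function name `list_slave_configs` and the temporary directory are made up for illustration, and the pattern is a simplified `MASTER=` match.

```shell
#!/bin/sh
# Hypothetical helper: print the ifcfg files in directory $1 that name $2
# as MASTER, reverse-sorted (the order the patched loop would walk them).
list_slave_configs() {
    dir=$1
    master=$2
    LANG=C grep -El "^[[:space:]]*MASTER=\"?${master}\"?" "$dir"/ifcfg-* | sort -r
}

# Scratch copies standing in for /etc/sysconfig/network-scripts.
demo_dir=$(mktemp -d)
printf 'DEVICE=bond0\n' > "$demo_dir/ifcfg-bond0"
printf 'DEVICE=eth0\nMASTER=bond0\nSLAVE=yes\n' > "$demo_dir/ifcfg-eth0"
printf 'DEVICE=eth1\nMASTER=bond0\nSLAVE=yes\n' > "$demo_dir/ifcfg-eth1"

# Lists ifcfg-eth1 before ifcfg-eth0; without "sort -r" the glob order
# would put ifcfg-eth0 first.
list_slave_configs "$demo_dir" bond0
```

The scratch directory is left in place so the output can be inspected; remove it with `rm -rf "$demo_dir"` when done.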
I don't like this solution (changing standard system files) but it seems like the best one for now, and should not break anything.
Maybe it's the particular network devices. The platform is a VIA M850 running CentOS 5.7 64-bit, with the original content from the DVD (no yum update done).
Eth0 is the onboard device, using an updated VIA Velocity driver (velocityget 1.42 instead of default via-velocity):
05:00.0 Ethernet controller: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter (rev 82)
Eth1 is a Linksys (Cisco) USB300M USB-Ethernet dongle, using asix driver:
Bus 001 Device 005: ID 0b95:7720 ASIX Electronics Corp. AX88772
Modprobe.conf:
alias eth0 velocityget
With the network service enabled (NetworkManager disabled), this is my setup:
ifcfg-bond0:
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.6.0.90
NETMASK=255.255.255.0
GATEWAY=10.6.0.1
BONDING_OPTS="mode=active-backup arp_interval=300 primary=eth0 arp_ip_target=+10.6.0.1 arp_ip_target=+10.6.0.2"
ifcfg-eth0:
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.6.0.90
NETMASK=255.255.255.0
GATEWAY=10.6.0.1
HWADDR=00:1F:F2:03:FA:45
MASTER=bond0
SLAVE=yes
ifcfg-eth1:
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.6.1.90
NETMASK=255.255.255.0
GATEWAY=10.6.0.1
MASTER=bond0
SLAVE=yes
HWADDR=58:6d:8f:3d:8d:4f
whitivery co55-sy1t@dea.spamcon.org wrote:
Is there a better group to post this to?
On 10/13/11, whitivery co55-sy1t@dea.spamcon.org wrote:
Eth0 is the onboard device, using an updated VIA Velocity driver (velocityget 1.42 instead of default via-velocity):
05:00.0 Ethernet controller: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter (rev 82)
Eth1 is a Linksys (Cisco) USB300M USB-Ethernet dongle, using asix driver:
Have you tried adding another Ethernet adapter? I ask because I was reading the bonding doc, and towards the end there is this part:
As discussed in the options section, above, some drivers do not support the netif_carrier_on/_off link state tracking system. With use_carrier enabled, bonding will always see these links as up, regardless of their actual state.
So it might be a driver issue, i.e. the VIA driver is not reporting the link down correctly.
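One way to check this suggestion independently of bonding is to read the carrier state the kernel reports for each interface while plugging and unplugging the cable. The following is a rough sketch, assuming the running kernel exposes a `carrier` file under /sys/class/net; the function name `carrier_state` and its optional second argument are hypothetical, added only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Hypothetical helper: report the carrier state exposed for an interface.
# $1 = interface name, $2 = sysfs root (assumed default: /sys/class/net).
carrier_state() {
    iface=$1
    root=${2:-/sys/class/net}
    file="$root/$iface/carrier"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    # The read itself can fail (for example when the interface is
    # administratively down), so any non-0/1 result maps to "unknown".
    case "$(cat "$file" 2>/dev/null)" in
        1) echo "up" ;;
        0) echo "down" ;;
        *) echo "unknown" ;;
    esac
}

for nic in eth0 eth1; do
    echo "$nic: $(carrier_state "$nic")"
done
```

If an interface keeps reporting "up" after its cable is pulled, that would point at the driver rather than at the bonding configuration.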
whitivery co55-sy1t@dea.spamcon.org wrote:
Thank you for the reply, but I don't think that this is the issue. Otherwise bonding failover wouldn't work at all. When enslaved in order eth1 eth0, bonding and link detection work properly - with eth0 set as primary, I pull the eth0 cable, it switches to eth1; plug eth0 back in, it switches back to it; pull the eth1 cable, it knows there's no fallback. So the link detection seems fine.
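A failover test like this can be followed from the bonding status file rather than by watching traffic. The sketch below assumes the status file is /proc/net/bonding/bond0 and that it contains a "Currently Active Slave:" line, as it does for active-backup bonds; the function name `active_slave` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the currently active slave from a bonding
# status file. $1 = status file (assumed default: /proc/net/bonding/bond0).
active_slave() {
    status_file=${1:-/proc/net/bonding/bond0}
    sed -n 's/^Currently Active Slave: //p' "$status_file"
}

# Only query the live file if the bond actually exists on this machine.
if [ -r /proc/net/bonding/bond0 ]; then
    echo "active slave: $(active_slave)"
fi
```

Running it before and after pulling a cable shows whether the bond switched from eth0 to eth1 and back.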
Emmanuel Noobadmin centos.admin@gmail.com wrote:
Ok, that does eliminate eth0 link detection as the source of the problem. I think you might have to ask on another mailing list. The kernel list seems like the right place, but I'm not 100% certain.