On 07/08/2010 05:08 PM, Christopher Chan wrote:
Hmmm ... which bond mode are you using?
Why, mode 4 of course.
Ouch. Never used that mode.
<snip> mode=4 (802.3ad) IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification.
Pre-requisites:
1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link aggregation. Most switches will require some type of configuration to enable 802.3ad mode.
</snip>
So I gather the bonding on the CentOS box is cooperating with the switches in some non-trivial fashion.
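One way to see how the bond and the switches are cooperating is to check which aggregator each slave has joined, since in mode 4 only slaves in the active aggregator carry traffic. The following Python sketch is a hypothetical helper (the function name and the sample text are made up for illustration) that parses the contents of /proc/net/bonding/bond0 and lists the slaves sitting outside the active aggregator. It assumes the field labels printed by the Linux bonding driver, whose exact layout varies between kernel versions.

```python
# Hypothetical helper: list bond slaves that are NOT members of the active
# 802.3ad aggregator, given the text of /proc/net/bonding/bondN.
# Assumes the bonding driver's labels "Active Aggregator Info:",
# "Slave Interface:" and "Aggregator ID:"; layout differs between kernels.


def slaves_outside_active_aggregator(text):
    active_id = None       # Aggregator ID listed under "Active Aggregator Info"
    slaves = {}            # slave name -> its Aggregator ID (None if absent)
    current = None         # slave block currently being read, if any
    in_active_info = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Active Aggregator Info:"):
            in_active_info = True
        elif line.startswith("Slave Interface:"):
            in_active_info = False
            current = line.split(":", 1)[1].strip()
            slaves[current] = None
        elif line.startswith("Aggregator ID:"):
            agg_id = int(line.split(":", 1)[1])
            if in_active_info:
                # First Aggregator ID after the header is the active one.
                active_id = agg_id
                in_active_info = False
            elif current is not None:
                slaves[current] = agg_id
    if active_id is None:
        raise ValueError("no 802.3ad active aggregator info found")
    return sorted(name for name, agg in slaves.items() if agg != active_id)


# Made-up sample output for illustration only (not from the poster's box).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

802.3ad info
LACP rate: slow
Active Aggregator Info:
\tAggregator ID: 1
\tNumber of ports: 3

Slave Interface: eth0
MII Status: up
Aggregator ID: 1

Slave Interface: eth1
MII Status: down
Aggregator ID: 2
"""

if __name__ == "__main__":
    print(slaves_outside_active_aggregator(SAMPLE))  # ['eth1']
```

On the box itself you would feed it the real file, e.g. `open("/proc/net/bonding/bond0").read()`. A slave in a different aggregator often points at a switch port that is not configured into the same LACP group, or at a speed/duplex mismatch.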
Too bad none of the defaults use the subnet assigned to the school, or 192.168.0.0/16 (no, not my idea; inherited).
That is a big network. It might make sense in a school, though. How many nodes are on it? Any chance a <ahem> staff member plugged an unauthorised piece of hardware in somewhere?
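To put "a big network" in numbers, a quick check with Python's standard ipaddress module. The helper name is made up for illustration; the arithmetic is just 2^(32 - prefix length), minus the network and broadcast addresses.

```python
import ipaddress


def subnet_size(cidr):
    """Return (total addresses, usable host addresses) for an IPv4 prefix."""
    net = ipaddress.ip_network(cidr)
    total = net.num_addresses
    # Network and broadcast addresses are not assignable, except on /31 and /32.
    usable = total - 2 if net.prefixlen < 31 else total
    return total, usable


if __name__ == "__main__":
    print(subnet_size("192.168.0.0/16"))  # (65536, 65534)
    print(subnet_size("192.168.1.0/24"))  # (256, 254)
```

If the /16 is one flat broadcast domain, every ARP broadcast reaches every one of those potential hosts.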
If it was working and then suddenly stopped, something must have changed. I gather you have some configuration and change management system in place? Backups of conf files?
Hahaha, that was the best part. It just stopped, and stayed that way even after rebooting the box and the switches. For some reason it only started working again when I ran tcpdump.
tcpdump is probably putting your interface into promiscuous mode, which is triggering something; perhaps ARP packets.
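The promiscuous-mode theory can be checked directly. On Linux the interface flags are exposed as a hex value in /sys/class/net/<iface>/flags, and IFF_PROMISC is bit 0x100 (from <linux/if.h>). A minimal Python sketch, assuming the kernel reports the flag there (this may depend on the kernel version and on how promiscuous mode was requested); the function names are hypothetical.

```python
from pathlib import Path

IFF_PROMISC = 0x100  # value from <linux/if.h>


def is_promiscuous(flags_hex):
    """flags_hex is the text of /sys/class/net/<iface>/flags, e.g. '0x1103'."""
    return bool(int(flags_hex.strip(), 16) & IFF_PROMISC)


def interface_is_promiscuous(iface):
    """Read the sysfs flags for iface (Linux only) and test IFF_PROMISC."""
    return is_promiscuous(Path(f"/sys/class/net/{iface}/flags").read_text())


if __name__ == "__main__":
    print(is_promiscuous("0x1103"))  # True
    print(is_promiscuous("0x1003"))  # False
```

Checking bond0 and each slave before and during a capture would show whether the flag actually changes. tcpdump's -p option tells it not to put the interface into promiscuous mode, so comparing a capture with and without -p (or setting the flag by hand with `ip link set dev bond0 promisc on`) would help confirm or rule out the theory.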
I think something (perhaps obscure) has changed, you may just not be aware of it. Comparing your event timeline against your configuration change management systems may help.
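If the conf files are backed up, diffing a known-good copy against the live one is a cheap first step in that comparison. A small sketch using Python's standard difflib; the helper name and the ifcfg-style file contents are hypothetical, not the poster's actual configuration.

```python
import difflib


def config_diff(old_text, new_text, old_name="backup", new_name="current"):
    """Return a unified diff between two versions of a config file ('' if identical)."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    backup = 'DEVICE=bond0\nBONDING_OPTS="mode=4 miimon=100"\n'
    current = 'DEVICE=bond0\nBONDING_OPTS="mode=1 miimon=100"\n'
    print(config_diff(backup, current), end="")
```

In practice you would read both versions from disk, for example a backup of /etc/sysconfig/network-scripts/ifcfg-bond0 and the live copy, and do the same for saved switch configurations.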
But another colleague did find this in the iLO report:
You're the only admin, but you have a colleague with access to an iLO report? That puts a big question mark over a previous assertion :-)
Repaired Network 07/06/2010 12:35 07/06/2010 12:00 2 Network Adapters Redundancy Reduced (Slot 10, Port 3)
Repaired Network 07/06/2010 12:35 07/06/2010 12:00 2 Network Adapters Redundancy Reduced (Slot 10, Port 4)
Repaired Network 07/06/2010 12:35 07/06/2010 12:00 2 Network Adapters Redundancy Reduced (Slot 10, Port 1)
Repaired Network 07/06/2010 12:01 07/06/2010 12:00 1 Network Adapter Link Down (Slot 10, Port 2)
Time to ask the HP chap what this is all about.
Looks like the bonding failover process is doing what it should.
A bit more info on your setup might help.
1. What is the purpose of the box with the fat network?
2. Are all 4 interfaces being used?
3. Are they plugged into the same switch?
4. You've got at least 2 networks, plus 2 VLANs, plus a public Internet connection to this box?
K