On 07/08/2010 05:08 PM, Christopher Chan wrote:
>> Hmmm ... which bond mode are you using?
>
> Why mode 4 of course.

Ouch. Never used that mode.

<snip>
mode=4 (802.3ad)
    IEEE 802.3ad Dynamic link aggregation. Creates aggregation
    groups that share the same speed and duplex settings.
    Utilizes all slaves in the active aggregator according to
    the 802.3ad specification.

    Pre-requisites:
    1. Ethtool support in the base drivers for retrieving the
       speed and duplex of each slave.
    2. A switch that supports IEEE 802.3ad Dynamic link
       aggregation. Most switches will require some type of
       configuration to enable 802.3ad mode.
</snip>

So I gather the bonding on the CentOS box is cooperating with the
switches in some non-trivial fashion.

> Too bad there are no defaults that use the subnet assigned to the school
> or the 192.168.0.0/16 (no, not my idea - inherited)

That is a big network. Might make sense in a school though. How many
nodes on it? Any chance a <ahem> staff member plugged an unauthorised
piece of hardware in somewhere?

>> If it was working, then suddenly stops, then something must have
>> changed. I gather you have some configuration and change management
>> system in place? Backups of conf files?
>
> Hahaha, that was the best part. It just stopped. And stayed that way too
> after a reboot, reboot of switches and only started working again when I
> ran tcpdump for some reason.

tcpdump is probably putting your interface into promiscuous mode, which
is triggering something. Perhaps ARP packets.

I think something (perhaps obscure) has changed; you may just not be
aware of it. Comparing your event timeline against your configuration
change management systems may help.

> But another colleague did find this in the iLo report:

You're the only admin but you have a colleague with access to an iLo report?
That puts a big question mark over a previous assertion :-)

> Repaired Network 07/06/2010 12:35 07/06/2010 12:00 2 Network Adapters
> Redundancy Reduced (Slot 10, Port 3)
>
> Repaired Network 07/06/2010 12:35 07/06/2010 12:00 2 Network Adapters
> Redundancy Reduced (Slot 10, Port 4)
>
> Repaired Network 07/06/2010 12:35 07/06/2010 12:00 2 Network Adapters
> Redundancy Reduced (Slot 10, Port 1)
>
> Repaired Network 07/06/2010 12:01 07/06/2010 12:00 1 Network Adapter
> Link Down (Slot 10, Port 2)
>
> Time to ask the HP chap what this is all about.

Looks like the bonding failover process is doing what it should.

A bit more info on your setup might help.

1. What is the purpose of the box with the fat network?
2. Are all 4 interfaces being used?
3. Are they plugged into the same switch?
4. You've got at least 2 networks, plus 2 vlans, plus a public internet
   connection to this box?

K
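
P.S. To compare what the bonding driver thinks against the iLo log, a
minimal Python sketch along these lines might help. It summarises the
bonding status file per slave. Assumptions on my part: the bond is named
bond0 (adjust to suit), and the status file uses the usual "Key: value"
layout with "Bonding Mode", "Slave Interface", "MII Status" and "Link
Failure Count" lines. The function names are made up for illustration.

```python
# Sketch only: summarise a Linux bonding status file.
# ASSUMPTION: the file uses "Key: value" lines, with one
# "Slave Interface: <name>" line starting each per-slave section.

SLAVE_KEYS = ("MII Status", "Link Failure Count", "Aggregator ID")


def parse_bonding_status(text):
    """Return {'mode': str or None, 'slaves': [dict, ...]} from status text."""
    status = {"mode": None, "slaves": []}
    current = None  # the slave section currently being filled in, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or section title without a colon
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Slave Interface":
            current = {"name": value}
            status["slaves"].append(current)
        elif current is not None and key in SLAVE_KEYS:
            # Values are kept as strings exactly as the file reports them.
            current[key] = value
    return status


def print_summary(path="/proc/net/bonding/bond0"):
    """Print one line per slave. The default path assumes a bond named bond0."""
    with open(path) as handle:
        status = parse_bonding_status(handle.read())
    print("mode:", status["mode"])
    for slave in status["slaves"]:
        print(slave["name"],
              "link:", slave.get("MII Status", "?"),
              "failures:", slave.get("Link Failure Count", "?"))
```

Calling print_summary() on the CentOS box and checking the per-slave
failure counts against the 07/06 entries in the iLo report should show
whether the two agree on which port dropped.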