[CentOS] Networking just stopped working

Thu Jul 8 09:09:09 UTC 2010
Kahlil Hodgson <kahlil.hodgson at dealmax.com.au>

On 07/08/2010 05:08 PM, Christopher Chan wrote:
>> Hmmm ... which bond mode are you using?
> 
> Why mode 4 of course.

Ouch.  Never used that mode.

<snip>
mode=4 (802.3ad)
IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that
share the same speed and duplex settings. Utilizes all slaves in the
active aggregator according to the 802.3ad specification.

	Pre-requisites:
	1. Ethtool support in the base drivers for retrieving
	the speed and duplex of each slave.
	2. A switch that supports IEEE 802.3ad Dynamic link
	aggregation.
	Most switches will require some type of configuration
	to enable 802.3ad mode.
</snip>

So I gather the bonding on the CentOS box is cooperating with the
switches in some non-trivial fashion.
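
For what it's worth, the bonding driver reports its own view of the mode and of each slave under /proc/net/bonding/. Below is a minimal sketch for pulling that out, assuming the usual "Bonding Mode:", "Slave Interface:" and "MII Status:" lines are present. The function names and the bond0 default are my own assumptions, not anything from your setup.

```python
def parse_bonding_status(text):
    """Return (bond_mode, {slave_name: mii_status}) from bonding status text."""
    mode = None
    slaves = {}
    current = None  # name of the slave section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Bonding Mode:"):
            mode = line.split(":", 1)[1].strip()
        elif line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:") and current is not None:
            # The first "MII Status:" line (before any slave) belongs to the
            # bond itself and is skipped; later ones belong to the slave.
            slaves[current] = line.split(":", 1)[1].strip()
    return mode, slaves


def read_bonding_status(bond="bond0"):
    """Read the status file for one bond; 'bond0' is an assumed name."""
    with open("/proc/net/bonding/%s" % bond) as f:
        return parse_bonding_status(f.read())
```

Calling read_bonding_status() on the box should show whether the driver thinks it is in 802.3ad mode and which slaves it considers up, which you can then compare with what the switch reports.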

> Too bad there are no defaults that use the subnet assigned to the school 
> or the 192.168.0.0/16 (no, not my idea - inherited)

That is a big network.  Might make sense in a school though.  How many
nodes on it?  Any chance a <ahem> staff member plugged an unauthorised
piece of hardware in somewhere?
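
To put a number on "big", here is a small sketch using Python's standard ipaddress module. The helper name is made up for illustration, and "usable" just subtracts the network and broadcast addresses.

```python
import ipaddress


def describe_network(cidr):
    """Return (netmask, total addresses, usable host addresses) for a CIDR."""
    net = ipaddress.ip_network(cidr)
    total = net.num_addresses
    usable = max(total - 2, 0)  # minus network and broadcast addresses
    return str(net.netmask), total, usable


print(describe_network("192.168.0.0/16"))
```

For 192.168.0.0/16 that is a 255.255.0.0 mask with room for 65534 hosts in a single broadcast domain, unless it is split up with VLANs.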

>> If it was working, then suddenly stops, then something must have
>> changed.  I gather you have some configuration and change management
>> system in place?  Backups of conf files?
> 
> Hahaha, that was the best part. It just stopped. And stayed that way too 
> after a reboot, reboot of switches and only started working again when I 
> ran tcpdump for some reason.

tcpdump is probably putting your interface into promiscuous mode, which
is triggering something, perhaps ARP traffic.
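
If you want to check the promiscuous mode theory, a rough sketch follows. It assumes the kernel exposes the interface flags as a hex string in /sys/class/net/<ifname>/flags and that the promiscuous bit shows up there while tcpdump is running; I have not verified that on your kernel. IFF_PROMISC is 0x100 in <linux/if.h>; the function names are mine.

```python
IFF_PROMISC = 0x100  # promiscuous-mode bit, as defined in <linux/if.h>


def is_promiscuous(flags_text):
    """Interpret a hex flags string such as '0x1103' and test IFF_PROMISC."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def interface_is_promiscuous(ifname):
    """Read the flags for one interface from sysfs (path is an assumption)."""
    with open("/sys/class/net/%s/flags" % ifname) as f:
        return is_promiscuous(f.read())
```

Running interface_is_promiscuous() against the bond and each slave, with and without tcpdump running, would at least tell you whether that is the only thing tcpdump changes.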

I think something (perhaps obscure) has changed; you may just not be
aware of it.  Comparing your event timeline against your configuration
change management system may help.

> But another colleague did find this in the iLo report:

You're the only admin but you have a colleague with access to an iLO
report?  That puts a big question mark over a previous assertion :-)

> Repaired Network 07/06/2010 12:35 07/06/2010 12:00 2 Network Adapters 
> Redundancy Reduced (Slot 10, Port 3)
> 
> Repaired Network 07/06/2010 12:35 07/06/2010 12:00 2 Network Adapters 
> Redundancy Reduced (Slot 10, Port 4)
> 
> Repaired Network 07/06/2010 12:35 07/06/2010 12:00 2 Network Adapters 
> Redundancy Reduced (Slot 10, Port 1)
> 
> Repaired Network 07/06/2010 12:01 07/06/2010 12:00 1 Network Adapter 
> Link Down (Slot 10, Port 2)
> 
> Time to ask the HP chap what this is all about.

Looks like the bonding failover process is doing what it should.

A bit more info on your setup might help.

1. What is the purpose of the box with the fat network?
2. Are all 4 interfaces being used?
3. Are they plugged into the same switch?
4. You've got at least 2 networks, plus 2 VLANs, plus a public internet
connection to this box?

K