Network bond - one port goes down from time to time - Discuss

28 Mar 2016


Hi,
maybe someone has an idea:
We have three Supermicro servers with two 10Gb ports each, connected to 1Gb ports on a Cisco switch stack. All ports are set to auto speed.
I configured an LACP bond on both sides for all servers, initially under Citrix XenServer.
On one server, eth0 goes down from time to time … sometimes within minutes, on other days it stays up for a few hours.
The other two servers are fine; their bonds have been up for 24 days(!) now without any problem.
Recently I installed CentOS 7.2 on the server in question and - bam - eth0 still goes down from time to time …
I checked the patch cables, tried another switch port channel, reconfigured the ports, and reinstalled the OS. Same behavior.
And: we got a replacement server. Same behavior … :)
Currently the Cisco tech guys don't see a problem on the switch (which has been up for 3 years now with 10+ servers connected … no problems so far), and from the Citrix side I haven't gotten many more hints.
In the logs I just see "NIC Link is Down" … "NIC Link is Up". It is always eth0.
Question:
Any ideas? One suggestion was to disable all power-saving features in the server BIOS. I have not done that yet.
Is there any way to set a higher debug level for that NIC, the kernel, or whatever else, to get some feedback on the server OS side about why the port goes down?
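In the meantime, here is a minimal sketch of how the flapping could at least be counted per port on the CentOS install. It assumes the Linux bonding driver is in use and that the bond is named bond0 (an assumption; adjust the path to the real bond name). It reads the driver's status file under /proc/net/bonding/ and reports MII status, speed, duplex, and link failure count for each slave. It will not say why the link drops, but it shows how often each port has failed and what speed was negotiated.

```python
from pathlib import Path

# Assumed bond name; change "bond0" to match the actual bond device.
BOND_STATUS_FILE = Path("/proc/net/bonding/bond0")

# Fields of interest in each "Slave Interface" section of the status file.
FIELDS = {
    "MII Status": "mii_status",
    "Speed": "speed",
    "Duplex": "duplex",
    "Link Failure Count": "link_failures",
}


def parse_bond_status(text):
    """Return {slave_name: {field: value}} parsed from bonding status text."""
    slaves = {}
    current = None  # dict for the slave section currently being read
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None and key in FIELDS:
            name = FIELDS[key]
            # Store the failure count as an integer, everything else as text.
            current[name] = int(value) if name == "link_failures" else value
    return slaves


if __name__ == "__main__":
    if BOND_STATUS_FILE.exists():
        status = parse_bond_status(BOND_STATUS_FILE.read_text())
        for slave, info in sorted(status.items()):
            print(slave, info)
    else:
        print(f"{BOND_STATUS_FILE} not found; is the bond name correct?")
```

Run periodically (for example from cron) and compared against the switch log timestamps, this would show whether only eth0 accumulates link failures and whether its negotiated speed or duplex differs from eth1.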
Regards, and thanks for any hints! Götz