On 01/26/2017 09:35 AM, Johnny Hughes wrote:
> On 01/26/2017 09:32 AM, Johnny Hughes wrote:
>> On 01/25/2017 11:49 AM, Kevin Stange wrote:
>>> On 01/24/2017 11:16 AM, Kevin Stange wrote:
>>>> On 01/24/2017 09:10 AM, Konrad Rzeszutek Wilk wrote:
>>>>> On Tue, Jan 24, 2017 at 09:29:39PM +0800, -=X.L.O.R.D=- wrote:
>>>>>> Kevin Stange,
>>>>>> It can be either kernel or update the NIC driver or firmware of the NIC
>>>>>> card. Hope that helps!
>>>>>>
>>>>>> Xlord
>>>>>> -----Original Message-----
>>>>>> From: CentOS-virt [mailto:centos-virt-bounces at centos.org] On Behalf Of Kevin
>>>>>> Stange
>>>>>> Sent: Tuesday, January 24, 2017 1:04 AM
>>>>>> To: centos-virt at centos.org
>>>>>> Subject: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 /
>>>>>> Linux 3.18
>>>>>>
>>>> <snip>
>>>>>>
>>>>>> Has anyone experienced similar issues with this configuration, and if so,
>>>>>> does anyone have tips on how to resolve the issues?
>>>>>
>>>>> Honestly I would email Intel and see if they can help. This looks like
>>>>> the NIC decides something is wrong, throws off a PCIe error and
>>>>> then resets itself.
>>>>
>>>> This happens for several different NICs. Is there a good contact at
>>>> Intel for this kind of thing, or should I just try to reach them through
>>>> their web site?
>>>>
>>>>> It could also be an error in the Linux stack which would "eat" an
>>>>> interrupt when migrating interrupts (which was fixed
>>>>> upstream, see below). Are you running irqbalance? Could you try
>>>>> turning it off?
>>>>
>>>> irqbalance is enabled on these servers. I'll try disabling it.
>>>
>>> I had stopped irqbalance yesterday afternoon, but had a hypervisor's
>>> NICs fail anyway in early morning this morning, so I'm pretty sure this
>>> is not the right tree to bark up.
>>>
>>
>> Here is a set of drivers/firmware from Intel for those NICs:
>>
>> https://downloadcenter.intel.com/download/15817/Intel-Network-Adapter-Driver-for-PCI-E-Gigabit-Network-Connections-under-Linux-
>>
>> I will see if I can get a CentOS-6 build of the latest version of that
>> from our older SRPM:
>>
>> http://vault.centos.org/6.7/xen4/Source/SPackages/e1000e-2.5.4-3.10.68.2.el6.centos.alt.src.rpm
>>
>> I am currently very busy with several c5, c6, c7 updates and the i686
>> altarch c7 tree .. but I have this on my list. In the meantime, maybe
>> someone else could also see if those drivers help you (or you could try
>> to compile / install it).
>>
>> Do you have another machine that you can use to see if you can duplicate
>> the issue NOT running the xen.gz hypervisor boot, but just the straight
>> kernel?

I can't actually reproduce this problem reliably. It happens randomly
when the servers are up and running anywhere between a few hours and a
month or more, and I haven't been able to isolate any specific way to
cause it to happen. As a result I can't really test different solutions
on different servers to see what helps. I was hoping other people were
seeing it so that I could get some direction. If I can reproduce it, it
won't take me very long to identify what the cause is. Right now if I
do upgrade the drivers on the systems I won't really know if it's fixed
until I don't see another issue for several months.

> Actually .. I think this is the driver for you:
>
> https://downloadcenter.intel.com/download/13663
>
> And this explains how to make it work:
>
> http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000005767.html

The different combinations of NICs overlap both the e1000e and igb
drivers, but the most egregious issues have been with the igb ones.
I'll try to give this a shot and report back if I still see issues with
a server after doing so, but it might be a week or two before I find
out.

--
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
kevin at steadfast.net | www.steadfast.net