On 01/24/2017 09:10 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 24, 2017 at 09:29:39PM +0800, -=X.L.O.R.D=- wrote:
>> Kevin Stange,
>> It can be either kernel or update the NIC driver or firmware of the NIC
>> card. Hope that helps!
>>
>> Xlord
>> -----Original Message-----
>> From: CentOS-virt [mailto:centos-virt-bounces at centos.org] On Behalf Of Kevin
>> Stange
>> Sent: Tuesday, January 24, 2017 1:04 AM
>> To: centos-virt at centos.org
>> Subject: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 /
>> Linux 3.18
>> <snip>
>>
>> Has anyone experienced similar issues with this configuration, and if so,
>> does anyone have tips on how to resolve the issues?
>
> Honestly I would email Intel and see if they can help. This looks like
> the NIC decides something is wrong, throws off a PCIe error and
> then resets itself.

This happens for several different NICs. Is there a good contact at
Intel for this kind of thing, or should I just try to reach them through
their web site?

> It could also be an error in the Linux stack which would "eat" an
> interrupt when migrating interrupts (which was fixed
> upstream, see below). Are you running irqbalance? Could you try
> turning it off?

irqbalance is enabled on these servers. I'll try disabling it.

> Did you have these issues with an earlier kernel?

The last kernel these boxes ran was 2.6.18-412.el5xen under CentOS 5,
and they were very stable. However, the differences between 2.6.18 and
3.18 are immense, especially with features like ASPM and other power
management code. We've run into ASPM issues on systems before when going
from CentOS 5 to the CentOS 6 kernel 2.6.32, but not on this particular
hardware, which is why my first thought was to look at ASPM.

They've all been upgraded to CentOS 6 and are running the virt SIG
kernel kernel-3.18.44-20.el6.x86_64. I haven't run any previous versions
of 3.18 or tried any other kernels.

It surprises me that we would have all these issues if there isn't a
more widespread problem, considering the hardware is fairly mainstream
and covers a lot of NIC chips.

-- 
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
kevin at steadfast.net | www.steadfast.net