[CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 / Linux 3.18

Thu Jan 26 15:32:04 UTC 2017
Johnny Hughes <johnny at centos.org>

On 01/25/2017 11:49 AM, Kevin Stange wrote:
> On 01/24/2017 11:16 AM, Kevin Stange wrote:
>> On 01/24/2017 09:10 AM, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Jan 24, 2017 at 09:29:39PM +0800, -=X.L.O.R.D=- wrote:
>>>> Kevin Stange,
>>>> It could be the kernel, or you may need to update the driver or the
>>>> firmware of the NIC. Hope that helps!
>>>>
>>>> Xlord
>>>> -----Original Message-----
>>>> From: CentOS-virt [mailto:centos-virt-bounces at centos.org] On Behalf Of Kevin
>>>> Stange
>>>> Sent: Tuesday, January 24, 2017 1:04 AM
>>>> To: centos-virt at centos.org
>>>> Subject: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 /
>>>> Linux 3.18
>>>>
>> <snip>
>>>>
>>>> Has anyone experienced similar issues with this configuration, and if so,
>>>> does anyone have tips on how to resolve the issues?
>>>
>>> Honestly, I would email Intel and see if they can help. This looks like
>>> the NIC decides something is wrong, throws a PCIe error, and
>>> then resets itself.
>>
>> This happens for several different NICs.  Is there a good contact at
>> Intel for this kind of thing, or should I just try to reach them through
>> their web site?
>>
>>> It could also be an error in the Linux stack which would "eat" an
>>> interrupt when migrating interrupts (which was fixed
>>> upstream, see below). Are you running irqbalance? Could you try
>>> turning it off?
>>
>> irqbalance is enabled on these servers.  I'll try disabling it.
> 
> I stopped irqbalance yesterday afternoon, but a hypervisor's NICs
> failed anyway early this morning, so I'm pretty sure this is not the
> right tree to bark up.
> 

Here is a set of drivers/firmware from Intel for those NICs:

https://downloadcenter.intel.com/download/15817/Intel-Network-Adapter-Driver-for-PCI-E-Gigabit-Network-Connections-under-Linux-

I will see if I can get a CentOS-6 build of the latest version of that
from our older SRPM:

http://vault.centos.org/6.7/xen4/Source/SPackages/e1000e-2.5.4-3.10.68.2.el6.centos.alt.src.rpm

I am currently very busy with several c5, c6, c7 updates and the i686
altarch c7 tree .. but I have this on my list.  In the meantime, maybe
someone else could also see if those drivers help you (or you could try
to compile / install it).
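
Before and after swapping drivers, it may help to record which e1000e
version the kernel is actually using. The following is only a rough
sketch, assuming Python 3.7+ and the ethtool utility are available on
the host; the function names and the "eth0" interface name are
illustrative, not anything from the packages above.

```python
import subprocess


def parse_ethtool_info(text):
    """Turn `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a PCI
        # address (0000:00:19.0) are kept whole.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(iface):
    """Run `ethtool -i <iface>` and return the parsed fields.

    Raises subprocess.CalledProcessError if ethtool fails, for
    example when the interface does not exist.
    """
    result = subprocess.run(
        ["ethtool", "-i", iface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ethtool_info(result.stdout)
```

Calling driver_info("eth0") and looking at the "driver", "version" and
"firmware-version" keys should show whether the newly built module is
the one that got loaded.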

Do you have another machine that you can use to see if you can duplicate
the issue when booting just the straight kernel, NOT the xen.gz
hypervisor?
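
When running that comparison, it is worth confirming which way the test
box actually booted. A minimal sketch, assuming the kernel exposes
/sys/hypervisor/type (it normally reads "xen" under Xen and is absent
on a plain boot); the helper names are made up for illustration.

```python
from pathlib import Path


def hypervisor_type(path="/sys/hypervisor/type"):
    """Return the hypervisor name reported by sysfs, or None.

    None means the file is missing, unreadable or empty, which is
    what a bare-metal boot is expected to look like.
    """
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None
    return value or None


def describe_boot(hv):
    """Map the sysfs value to a short human-readable summary."""
    if hv is None:
        return "no hypervisor reported (straight kernel boot)"
    if hv == "xen":
        return "running under the Xen hypervisor"
    return "running under hypervisor: " + hv
```

Printing describe_boot(hypervisor_type()) next to the NIC error logs
makes it easier to tell afterwards which boot a given failure came
from.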

Thanks,
Johnny Hughes
