[CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 / Linux 3.18

Wed Jan 25 17:49:23 UTC 2017
Kevin Stange <kevin at steadfast.net>

On 01/24/2017 11:16 AM, Kevin Stange wrote:
> On 01/24/2017 09:10 AM, Konrad Rzeszutek Wilk wrote:
>> On Tue, Jan 24, 2017 at 09:29:39PM +0800, -=X.L.O.R.D=- wrote:
>>> Kevin Stange,
>>> It could be either the kernel, or the NIC driver or the NIC's firmware
>>> may need updating. Hope that helps!
>>>
>>> Xlord
>>> -----Original Message-----
>>> From: CentOS-virt [mailto:centos-virt-bounces at centos.org] On Behalf Of Kevin
>>> Stange
>>> Sent: Tuesday, January 24, 2017 1:04 AM
>>> To: centos-virt at centos.org
>>> Subject: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 /
>>> Linux 3.18
>>>
> <snip>
>>>
>>> Has anyone experienced similar issues with this configuration, and if so,
>>> does anyone have tips on how to resolve the issues?
>>
>> Honestly I would email Intel and see if they can help. This looks like
>> the NIC decides something is wrong, throws a PCIe error and
>> then resets itself.
> 
> This happens for several different NICs.  Is there a good contact at
> Intel for this kind of thing, or should I just try to reach them through
> their web site?
> 
>> It could also be an error in the Linux stack which would "eat" an
>> interrupt when migrating interrupts (which was fixed
>> upstream, see below). Are you running irqbalance? Could you try
>> turning it off?
> 
> irqbalance is enabled on these servers.  I'll try disabling it.

I stopped irqbalance yesterday afternoon, but a hypervisor's NICs
failed anyway early this morning, so I'm pretty sure this is not the
right tree to bark up.

-- 
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
kevin at steadfast.net | www.steadfast.net