[CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 / Linux 3.18

Wed Mar 29 05:25:57 UTC 2017
Adi Pircalabu <adi at ddns.com.au>

On 28-03-2017 8:12, Kevin Stange wrote:
> On 03/27/2017 04:03 PM, Kevin Stange wrote:
>> On 03/25/2017 02:35 PM, Sarah Newman wrote:
>>> On 03/16/2017 04:22 PM, Kevin Stange wrote:
>>> 
>>>>> I still can't rest assured the NIC issue is fixed, but no 4.4 or 4.9
>>>>> server has yet had a NIC issue, with some being up almost a full month.
>>>>> It looks promising! (I'm knocking on all the wood everywhere, though.)
>>>> 
>>>> I'm ready to call this conclusive.  The problems I was having across the
>>>> board seemed to be caused by something seriously broken in 3.18.  Most
>>>> of my servers are now on 4.9.13 or newer and everything has been working
>>>> very well.
>>>> 
>>>> I'm not going to post any further updates unless something breaks.
>>>> Thanks to everyone that provided tips and suggestions along the way.
>>>> 
>>> 
>>> Do you mind sharing what hardware you have been running the 4.9 kernel
>>> on, other than "Supermicro X9DRT, Dual Xeon E5-2650, 2x I350, 2x 82571EB"
>>> and "Supermicro X9DRD-iF/LF, Dual Xeon E5-2630, 2x I350, 2x 82575EB", if
>>> any? Are you using any SATA/SAS controllers?
>> 
>> We have no expansion cards installed except for the dual-port gigabit
>> NICs.  We're using the onboard SATA controller for only the local Dom0
>> OS, and iSCSI and NFS for managing storage for VMs and images.
>> 
> 
> We've got some other motherboards as well; I think this list is exhaustive:
> 
> Supermicro X8DT3
> Supermicro X8DT6
> Supermicro X9DRD-iF/LF
> Supermicro X9DRT
> Supermicro X9SCL/X9SCM
> 
> These are -F variants which means they include a BMC chip with a
> separate NIC.  A few of the X8DT3 are the LN4 variant, which has 4
> onboard NICs and therefore we did not use an expansion NIC.

FYI,
Here's one of our machines, which just crapped itself earlier today 
without being subjected to any significant load. Before the crash it was 
running kernel-3.18.44-20.el6.x86_64, now it's on 
kernel-4.9.13-22.el6.x86_64:
- Dell PowerEdge R620
- 2 x Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, 6 cores each
- Dual Intel 10-Gigabit X540-AT2 (rev 01)
Before the crash, both em interfaces (members of bond1, which connects to 
the storage network) had tx & rx checksumming turned off.
xen_commandline: dom0_mem=1536M,max:2048M dom0_max_vcpus=1 dom0_vcpus_pin cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all
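
For reference, a minimal sketch of how tx & rx checksumming might be turned off on the bond members with ethtool. The interface names em1 and em2 are assumptions (only "both em interfaces" is stated above), and the IFACES and DRY_RUN variables are hypothetical helpers for this example. By default it only prints the commands:

```shell
#!/bin/sh
# Sketch only: em1/em2 are assumed names for the two bond1 members.
IFACES="${IFACES:-em1 em2}"
# DRY_RUN=1 prints the commands; set DRY_RUN=0 (as root) to apply them.
DRY_RUN="${DRY_RUN:-1}"

disable_csum() {
    for ifc in $IFACES; do
        # ethtool -K changes offload settings; rx/tx are the checksum offloads.
        cmd="ethtool -K $ifc rx off tx off"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

disable_csum
```

The current state can be checked afterwards with "ethtool -k em1" (lowercase k), which lists the offload settings. A change made this way does not survive a reboot unless it is also added to the interface configuration.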

---
Adi Pircalabu