> Are there other kernel options that might be useful to try?

pci=nomsi

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173/comments/13

On 27 January 2017 at 18:21, Kevin Stange <kevin at steadfast.net> wrote:

> On 01/27/2017 06:08 AM, Karel Hendrych wrote:
> > Have you tried to eliminate all power management features all over?
>
> I've been trying to find and disable all power management features, but
> I've had relatively little luck with that solving the problems. Stabbing
> in the dark, I've tried different ACPI settings, including completely
> disabling it, disabling CPU frequency scaling, and setting pcie_aspm=off
> on the kernel command line. Are there other kernel options that might
> be useful to try?
>
> > Are the devices connected to the same network infrastructure?
>
> There are two onboard NICs and two NICs on a dual-port card in each
> server. All devices connect to a Cisco switch pair in VSS, and the links
> are paired in LACP.
>
> > There has to be something common.
>
> The NICs having issues are running a native VLAN, a tagged VLAN, iSCSI
> and NFS traffic, as well as some basic management stuff over SSH, and
> they are configured with an MTU of 9000 on the native VLAN. It's a lot
> of features, but I can't really turn them off and still have enough
> load on the NICs to reproduce the issue. Several of these servers were
> installed and burned in for 3 months without ever having an issue, but
> suddenly collapsed when I tried to bring 20 or so real-world VMs up on
> them.
>
> The other NICs in the system that are connected don't exhibit issues and
> run only VM network interfaces. They are also in LACP and running VLAN
> tags, but with a normal 1500 MTU.
>
> So far it seems to correlate with NICs on the expansion cards, but that
> may be a coincidence: these cards are also the ones carrying the storage
> and management traffic.
> I'm trying to swap some of this load to the onboard NICs to see if the
> issues migrate over with it, or if they stay with the expansion cards.
>
> If the issue exists on both NIC types, then it rules out the specific
> NIC chipset as the culprit. It could point to the driver, but upgrading
> it to a newer version did not help and actually appeared to make
> everything worse. This issue might actually have more to do with the
> PCIe bridge than the NICs, but these are still different motherboards
> with different PCIe bridges (5520 vs. C600) experiencing the same issues.
>
> > I've been using Intel NICs with Xen/CentOS for ages with no issues.
>
> I figured that must be so. Everyone uses Intel NICs. If this were a
> common issue, it would probably be causing a lot of people a lot of
> trouble.
>
> --
> Kevin Stange
> Chief Technology Officer
> Steadfast | Managed Infrastructure, Datacenter and Cloud Services
> 800 S Wells, Suite 190 | Chicago, IL 60607
> 312.602.2689 X203 | Fax: 312.602.2688
> kevin at steadfast.net | www.steadfast.net
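With several boot-time options in play (pcie_aspm=off already tried, pci=nomsi suggested above), it may be worth confirming after each reboot that the running kernel actually received them. Below is a minimal Python sketch. It assumes a Linux host where /proc/cmdline holds the booted command line. The helper names parse_cmdline, has_option and read_running_cmdline are hypothetical and do not come from any existing tool.

```python
import shlex


def parse_cmdline(cmdline):
    """Map each kernel parameter name to the list of values it was given.

    Bare flags (e.g. "quiet") get an empty-string value. A parameter may
    appear more than once, so every occurrence is kept.
    """
    params = {}
    for token in shlex.split(cmdline):
        name, _, value = token.partition("=")
        params.setdefault(name, []).append(value)
    return params


def has_option(params, name, option):
    """True if `name=` was passed with `option` in its comma-separated value.

    Example: "pci=nomsi,noaer" satisfies has_option(params, "pci", "nomsi").
    """
    return any(option in value.split(",") for value in params.get(name, []))


def read_running_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

For example, `has_option(parse_cmdline(read_running_cmdline()), "pci", "nomsi")` should return True once pci=nomsi is in effect. On a Xen host, running this in dom0 reports the dom0 Linux kernel's parameters. Options passed to the hypervisor itself form a separate command line.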