[CentOS] Network hangs after several hours (Centos 6 recently upgraded kernel/glibc)

Fri Feb 19 13:07:35 UTC 2016

> Date: Friday, February 19, 2016 12:47:54 +0000
> From: Ian B <ibrierley at gmail.com>
> 
> On Fri, Feb 19, 2016 at 12:33 PM, Richard wrote:
> 
>> > Date: Friday, February 19, 2016 11:08:48 +0000
>> > From: Ian B <ibrierley at gmail.com>
>> > 
>> > On Fri, Feb 19, 2016 at 10:56 AM, Ian B <ibrierley at gmail.com>
>> > wrote:
>> > 
>> >> Hi all,
>> >> 
>> >> We have a development server on which we have just updated the
>> >> kernel & glibc following recent recommendations. It's been stable
>> >> previously for a few years with only scheduled reboots.
>> >> 
>> >> It's running
>> >> Centos 6.6(final)
>> >> 2.6.32-573.18.1.el6.x86_64
>> >> GNU libc 2.12
>> >> 
>> >> Upgraded via YUM, rebooted, all fine for several hours, and
>> >> then the network seemed to hang. Not much happening as it's a
>> >> dev server we are testing before moving to production.
>> >> 
>> >> Googling, I see there is some history of the e1000e driver
>> >> having issues, and I'm wondering if it could be related.
>> >> 
>> >> Does anyone have any thoughts on where to go with it, as I'm
>> >> assuming it will hang again later.
>> >> 
>> >> Thanks, Ian
>> >> 
>> >> Feb 18 05:04:36 kernel: WARNING: at net/sched/sch_generic.c:261
>> >> dev_watchdog+0x26d/0x280() (Not tainted)
>> >> Feb 18 05:04:36 kernel: Hardware name: X9SCL/X9SCM
>> >> Feb 18 05:04:36 kernel: NETDEV WATCHDOG: eth0 (e1000e):
>> >> transmit queue 0 timed out
>> >> Feb 18 05:04:36 kernel: Modules linked in: ip6t_REJECT
>> >> nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
>> >> ip6table_filter ip6_tables ipv6 ext4 jbd2 e1000e serio_raw
>> >> i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support shpchp ext3
>> >> jbd mbcache raid1 sd_mod crc_t10dif ahci dm_mirror
>> >> dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
>> >> Feb 18 05:04:36 kernel: Pid: 0, comm: swapper Not tainted
>> >> 2.6.32-220.4.2.el6.x86_64 #1
>> >> Feb 18 05:04:36 kernel: Call Trace:
>> >> Feb 18 05:04:36 kernel: <IRQ>  [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0
>> >> Feb 18 05:04:36 kernel: [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50
>> >> Feb 18 05:04:36 kernel: [<ffffffff8144a4fd>] ? dev_watchdog+0x26d/0x280
>> >> Feb 18 05:04:36 kernel: [<ffffffff8108b3fd>] ? insert_work+0x6d/0xb0
>> >> Feb 18 05:04:36 kernel: [<ffffffff8144a290>] ? dev_watchdog+0x0/0x280
>> >> Feb 18 05:04:36 kernel: [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340
>> >> Feb 18 05:04:36 kernel: [<ffffffff810a0a10>] ? tick_sched_timer+0x0/0xc0
>> >> Feb 18 05:04:36 kernel: [<ffffffff8102ad6d>] ? lapic_next_event+0x1d/0x30
>> >> Feb 18 05:04:36 kernel: [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
>> >> Feb 18 05:04:36 kernel: [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250
>> >> Feb 18 05:04:36 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
>> >> Feb 18 05:04:36 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
>> >> Feb 18 05:04:36 kernel: [<ffffffff81071de5>] ? irq_exit+0x85/0x90
>> >> Feb 18 05:04:36 kernel: [<ffffffff814f4d70>] ? smp_apic_timer_interrupt+0x70/0x9b
>> >> Feb 18 05:04:36 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
>> >> Feb 18 05:04:36 kernel: <EOI>  [<ffffffff812c49de>] ? intel_idle+0xde/0x170
>> >> Feb 18 05:04:36 kernel: [<ffffffff812c49c1>] ? intel_idle+0xc1/0x170
>> >> Feb 18 05:04:36 kernel: [<ffffffff813f9ef7>] ? cpuidle_idle_call+0xa7/0x140
>> >> Feb 18 05:04:36 kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
>> >> Feb 18 05:04:36 kernel: [<ffffffff814d40ca>] ? rest_init+0x7a/0x80
>> >> Feb 18 05:04:36 kernel: [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430
>> >> Feb 18 05:04:36 kernel: [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
>> >> Feb 18 05:04:36 kernel: [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
>> >> Feb 18 05:04:36 kernel: ---[ end trace 21915186e9d87b29 ]---
>> >> 
>> >> modinfo e1000e | grep version
>> >> version:        3.2.5-k
>> >> srcversion:     8CCA78B3C15DE6229299348
>> >> vermagic:       2.6.32-573.18.1.el6.x86_64 SMP mod_unload
>> >> modversions
>> >> 
>> >> 
>> >> 00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor
>> >> Family DRAM Controller (rev 09)
>> >> 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series
>> >> Chipset Family USB Enhanced Host Controller #2 (rev 05)
>> >> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series
>> >> Chipset Family PCI Express Root Port 1 (rev b5)
>> >> 00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series
>> >> Chipset Family PCI Express Root Port 5 (rev b5)
>> >> 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series
>> >> Chipset Family USB Enhanced Host Controller #1 (rev 05)
>> >> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
>> >> 00:1f.0 ISA bridge: Intel Corporation C202 Chipset Family LPC
>> >> Controller (rev 05)
>> >> 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series
>> >> Chipset Family SATA AHCI Controller (rev 05)
>> >> 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset
>> >> Family SMBus Controller (rev 05)
>> >> 02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit
>> >> Network Connection
>> >> 03:03.0 VGA compatible controller: Matrox Electronics Systems
>> >> Ltd. MGA G200eW WPCM450 (rev 0a)
>> >> 
>> 
>> > Just noticed that the trace shows an old kernel, so I don't
>> > think grub was automatically selecting the latest kernel. Just
>> > wondering what process updates the default to be the latest
>> > kernel, and whether the problem could be grub selecting an
>> > older kernel while other packages were updated?
>> > 
>> 
>> If your machine is "running Centos 6.6(final)", but you've
>> installed the new kernel and glibc, that implies that you are
>> selectively applying updates. The 6.7 point release came out last
>> fall. In addition to the security implications of not fully
>> updating the system, you may have missed packages that are
>> impacting networking.
>> 
>> You may want to do a full update of the system and then see how
>> it acts -- it's hard to debug a system that may have mis-matched
>> pieces.
>> 
>> To see which kernel your grub is set to load by default, look at
>> the grub.conf -- the "default=" line (normally "0") indicates
>> which of the listed kernels will be selected.
>> 
>> If the "default" value isn't "0", and/or the newest kernel isn't
>> the first entry, then you have something mucking with things.
>> Check your /etc/sysconfig/kernel file for starters.
>> 
>> 
> Thanks Richard,
> 
> We currently do all security updates at short notice (as opposed to
> everything), via a script. I've amended the grub config and
> rebooted to make sure it will reboot into the correct kernel now,
> and yes, /etc/sysconfig/kernel was different to production servers.
> We may try updating all packages if it continues to be unstable,
> since it's a dev server meant for testing.
> 
> Thanks again,
> 
> Ian

As Johnny Hughes pointed out last fall:

<https://lists.centos.org/pipermail/centos/2015-December/156697.html>

selective updating like that is not supported by CentOS or RHEL.
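
For anyone wanting to script the grub.conf check quoted above, here is
a rough Python sketch that reads GRUB legacy config text and reports
which kernel the "default=" line points at. The sample config is made
up for illustration (only the two kernel versions come from this
thread), the function name default_kernel is hypothetical, and it
assumes each "title" stanza has a plain "kernel /vmlinuz-<version>"
line and a numeric default.

```python
import re

# Hypothetical grub.conf, NOT Ian's actual file. Only the two kernel
# versions are taken from the thread; everything else is made up.
SAMPLE_GRUB_CONF = """\
default=1
timeout=5
title CentOS (2.6.32-573.18.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-573.18.1.el6.x86_64 ro root=/dev/md1
        initrd /initramfs-2.6.32-573.18.1.el6.x86_64.img
title CentOS (2.6.32-220.4.2.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-220.4.2.el6.x86_64 ro root=/dev/md1
        initrd /initramfs-2.6.32-220.4.2.el6.x86_64.img
"""


def default_kernel(grub_conf_text):
    """Return (default_index, kernel_version) for GRUB legacy config text.

    kernel_version is None if the default stanza has no
    vmlinuz-<version> kernel line. Only a numeric default is handled.
    """
    default_index = 0  # GRUB legacy uses entry 0 when no default is set
    versions = []      # one slot per "title" stanza, in file order
    for raw_line in grub_conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"default[\s=]+(\d+)$", line)
        if match:
            default_index = int(match.group(1))
        elif re.match(r"title\b", line):
            versions.append(None)
        elif versions and versions[-1] is None:
            # Record the first vmlinuz kernel line of the current stanza.
            match = re.match(r"kernel\s+\S*vmlinuz-(\S+)", line)
            if match:
                versions[-1] = match.group(1)
    if default_index >= len(versions):
        raise ValueError("default=%d but only %d title stanzas found"
                         % (default_index, len(versions)))
    return default_index, versions[default_index]


index, version = default_kernel(SAMPLE_GRUB_CONF)
print("default=%d boots kernel %s" % (index, version))
```

With the sample above the default entry is the older 220.4.2 kernel
even though the newer one is listed first, which is the kind of
mismatch described in this thread. On a real CentOS 6 box one would
presumably pass in the contents of /boot/grub/grub.conf and compare
the result with the output of "uname -r".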

[please don't top post.]