[CentOS] Network hangs after several hours (Centos 6 recently upgraded kernel/glibc)

Thanks Richard,

We currently apply only security updates at short notice (rather than
all updates), via a script. I've amended the grub config and rebooted to
make sure it now boots into the correct kernel, and yes,
/etc/sysconfig/kernel was different from the one on our production
servers. If it continues to be unstable we may try updating all
packages, since it's only a dev server for testing.
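
As a rough illustration of the check involved (a sketch only, not the
script we actually use): the helper below reads a legacy GRUB config and
prints the kernel version that the "default=" entry would boot. The
function name grub_default_kernel is made up for this example, the path
/boot/grub/grub.conf is assumed to be the usual CentOS 6 location, and
"default=saved" is not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel version that the default entry
# in a legacy GRUB grub.conf would boot.
# $1: path to grub.conf (assumed /boot/grub/grub.conf on CentOS 6)
grub_default_kernel() {
    awk '
        BEGIN { def = 0; idx = -1 }
        # "default=N" picks the Nth "title" stanza, counting from 0
        /^[ \t]*default[ \t]*=/ { split($0, kv, "="); def = kv[2] + 0 }
        # each "title" line starts a new boot entry
        /^[ \t]*title[ \t]/ { idx++ }
        # first "kernel" line of the selected entry: keep only the version
        /^[ \t]*kernel[ \t]/ && idx == def && !seen {
            sub(/^.*vmlinuz-/, "", $2); print $2; seen = 1
        }
    ' "$1"
}

# Example use (commented out so that sourcing this file has no side effects):
# want=$(grub_default_kernel /boot/grub/grub.conf)
# [ "$want" = "$(uname -r)" ] || echo "default entry: $want, running: $(uname -r)"
```

If the version printed differs from `uname -r` after a reboot, or is not
the newest installed kernel, that would match the mismatch seen in the
trace below.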

Thanks again,

Ian

On Fri, Feb 19, 2016 at 12:33 PM, Richard <
lists-centos at listmail.innovate.net> wrote:

>
>
> > Date: Friday, February 19, 2016 11:08:48 +0000
> > From: Ian B <ibrierley at gmail.com>
> >
> > On Fri, Feb 19, 2016 at 10:56 AM, Ian B <ibrierley at gmail.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> We have a development server on which we have just tried updating
> >> the kernel & glibc after recent recommendations. It's been stable
> >> previously for a few years with only scheduled reboots.
> >>
> >> It's running
> >> Centos 6.6(final)
> >> 2.6.32-573.18.1.el6.x86_64
> >> GNU libc 2.12
> >>
> >> Upgraded via YUM, rebooted, all fine for several hours, and then
> >> the network seemed to hang. Not much is happening on it, as it's a
> >> dev server we are testing before moving to production.
> >>
> >> Googling, I see there is some history of the e1000e driver having
> >> issues, and I'm wondering if it could be related.
> >>
> >> Does anyone have any thoughts on where to go with it, as I'm
> >> assuming it will hang again later.
> >>
> >> Thanks, Ian
> >>
> >> Feb 18 05:04:36 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
> >> Feb 18 05:04:36 kernel: Hardware name: X9SCL/X9SCM
> >> Feb 18 05:04:36 kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
> >> Feb 18 05:04:36 kernel: Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext4 jbd2 e1000e serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support shpchp ext3 jbd mbcache raid1 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> >> Feb 18 05:04:36 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.4.2.el6.x86_64 #1
> >> Feb 18 05:04:36 kernel: Call Trace:
> >> Feb 18 05:04:36 kernel: <IRQ>  [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0
> >> Feb 18 05:04:36 kernel: [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50
> >> Feb 18 05:04:36 kernel: [<ffffffff8144a4fd>] ? dev_watchdog+0x26d/0x280
> >> Feb 18 05:04:36 kernel: [<ffffffff8108b3fd>] ? insert_work+0x6d/0xb0
> >> Feb 18 05:04:36 kernel: [<ffffffff8144a290>] ? dev_watchdog+0x0/0x280
> >> Feb 18 05:04:36 kernel: [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340
> >> Feb 18 05:04:36 kernel: [<ffffffff810a0a10>] ? tick_sched_timer+0x0/0xc0
> >> Feb 18 05:04:36 kernel: [<ffffffff8102ad6d>] ? lapic_next_event+0x1d/0x30
> >> Feb 18 05:04:36 kernel: [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
> >> Feb 18 05:04:36 kernel: [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250
> >> Feb 18 05:04:36 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
> >> Feb 18 05:04:36 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
> >> Feb 18 05:04:36 kernel: [<ffffffff81071de5>] ? irq_exit+0x85/0x90
> >> Feb 18 05:04:36 kernel: [<ffffffff814f4d70>] ? smp_apic_timer_interrupt+0x70/0x9b
> >> Feb 18 05:04:36 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
> >> Feb 18 05:04:36 kernel: <EOI>  [<ffffffff812c49de>] ? intel_idle+0xde/0x170
> >> Feb 18 05:04:36 kernel: [<ffffffff812c49c1>] ? intel_idle+0xc1/0x170
> >> Feb 18 05:04:36 kernel: [<ffffffff813f9ef7>] ? cpuidle_idle_call+0xa7/0x140
> >> Feb 18 05:04:36 kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
> >> Feb 18 05:04:36 kernel: [<ffffffff814d40ca>] ? rest_init+0x7a/0x80
> >> Feb 18 05:04:36 kernel: [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430
> >> Feb 18 05:04:36 kernel: [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
> >> Feb 18 05:04:36 kernel: [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
> >> Feb 18 05:04:36 kernel: ---[ end trace 21915186e9d87b29 ]---
> >>
> >> modinfo e1000e | grep version
> >> version:        3.2.5-k
> >> srcversion:     8CCA78B3C15DE6229299348
> >> vermagic:       2.6.32-573.18.1.el6.x86_64 SMP mod_unload modversions
> >>
> >>
> >> 00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor
> >> Family DRAM Controller (rev 09)
> >> 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series
> >> Chipset Family USB Enhanced Host Controller #2 (rev 05)
> >> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series
> >> Chipset Family PCI Express Root Port 1 (rev b5)
> >> 00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series
> >> Chipset Family PCI Express Root Port 5 (rev b5)
> >> 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series
> >> Chipset Family USB Enhanced Host Controller #1 (rev 05)
> >> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
> >> 00:1f.0 ISA bridge: Intel Corporation C202 Chipset Family LPC
> >> Controller (rev 05)
> >> 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series
> >> Chipset Family SATA AHCI Controller (rev 05)
> >> 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset
> >> Family SMBus Controller (rev 05)
> >> 02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit
> >> Network Connection
> >> 03:03.0 VGA compatible controller: Matrox Electronics Systems
> >> Ltd. MGA G200eW WPCM450 (rev 0a)
> >>
>
> > Just noticed that the trace shows an old kernel, so I don't think
> > grub was automatically selecting the latest kernel. Just wondering
> > what process updates the default to be the latest kernel, and
> > whether the problem could be that grub is still selecting an older
> > kernel while the other packages have been updated?
> >
>
> If your machine is "running Centos 6.6(final)" but you've installed
> the new kernel and glibc, that implies that you are selectively
> applying updates. The 6.7 point release came out last fall. In
> addition to the security implications of not fully updating the
> system, you may have missed packages that affect networking.
>
> You may want to do a full update of the system and then see how it
> acts -- it's hard to debug a system that may have mismatched pieces.
>
> To see which kernel your grub is set to load by default, look at the
> grub.conf -- the "default=" line (normally "0") indicates which of
> the listed kernels will be selected.
>
> If the "default" value isn't "0", and/or the newest kernel isn't the
> first entry, then you have something mucking with things. Check your
> /etc/sysconfig/kernel file for starters.