Hi all,
We have a development server on which we have just tried updating the kernel and glibc, following recent recommendations. It's been stable for a few years, with only scheduled reboots.
It's running CentOS 6.6 (Final), kernel 2.6.32-573.18.1.el6.x86_64, GNU libc 2.12.
We upgraded via yum and rebooted; all was fine for several hours, and then the network seemed to hang. Not much is happening on it, as it's a dev server we are testing before moving to production.
Googling, I see there is some history of the e1000e driver having issues, and I'm wondering if it could be related.
Does anyone have any thoughts on what to do with it? I'm assuming it will hang again later.
Thanks, Ian
Feb 18 05:04:36 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Feb 18 05:04:36 kernel: Hardware name: X9SCL/X9SCM
Feb 18 05:04:36 kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Feb 18 05:04:36 kernel: Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext4 jbd2 e1000e serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support shpchp ext3 jbd mbcache raid1 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Feb 18 05:04:36 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.4.2.el6.x86_64 #1
Feb 18 05:04:36 kernel: Call Trace:
Feb 18 05:04:36 kernel: <IRQ> [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0
Feb 18 05:04:36 kernel: [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50
Feb 18 05:04:36 kernel: [<ffffffff8144a4fd>] ? dev_watchdog+0x26d/0x280
Feb 18 05:04:36 kernel: [<ffffffff8108b3fd>] ? insert_work+0x6d/0xb0
Feb 18 05:04:36 kernel: [<ffffffff8144a290>] ? dev_watchdog+0x0/0x280
Feb 18 05:04:36 kernel: [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340
Feb 18 05:04:36 kernel: [<ffffffff810a0a10>] ? tick_sched_timer+0x0/0xc0
Feb 18 05:04:36 kernel: [<ffffffff8102ad6d>] ? lapic_next_event+0x1d/0x30
Feb 18 05:04:36 kernel: [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
Feb 18 05:04:36 kernel: [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250
Feb 18 05:04:36 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Feb 18 05:04:36 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Feb 18 05:04:36 kernel: [<ffffffff81071de5>] ? irq_exit+0x85/0x90
Feb 18 05:04:36 kernel: [<ffffffff814f4d70>] ? smp_apic_timer_interrupt+0x70/0x9b
Feb 18 05:04:36 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Feb 18 05:04:36 kernel: <EOI> [<ffffffff812c49de>] ? intel_idle+0xde/0x170
Feb 18 05:04:36 kernel: [<ffffffff812c49c1>] ? intel_idle+0xc1/0x170
Feb 18 05:04:36 kernel: [<ffffffff813f9ef7>] ? cpuidle_idle_call+0xa7/0x140
Feb 18 05:04:36 kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Feb 18 05:04:36 kernel: [<ffffffff814d40ca>] ? rest_init+0x7a/0x80
Feb 18 05:04:36 kernel: [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430
Feb 18 05:04:36 kernel: [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
Feb 18 05:04:36 kernel: [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
Feb 18 05:04:36 kernel: ---[ end trace 21915186e9d87b29 ]---
modinfo e1000e | grep version
version: 3.2.5-k
srcversion: 8CCA78B3C15DE6229299348
vermagic: 2.6.32-573.18.1.el6.x86_64 SMP mod_unload modversions
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor Family DRAM Controller (rev 09)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C202 Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
03:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
I just noticed that the trace shows an old kernel (2.6.32-220.4.2.el6.x86_64), so I don't think grub was automatically selecting the latest kernel. What process updates the default to the latest kernel? And could the problem be that the packages were updated while grub kept booting an older kernel?
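The mismatch noted here (the trace reports one kernel release while the e1000e module's vermagic reports another) can be checked mechanically. The following is only a rough sketch: `trace_kernel` is a hypothetical helper name, and it assumes the log contains the usual `Pid: ..., comm: ... <release> #N` line that appears in the trace above.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct kernel release named in the
# "Pid: ..., comm: ..." lines of a kernel trace.
# Reads the files given as arguments, or stdin if none are given.
trace_kernel() {
    # The release is the last space-delimited token before " #<build number>".
    sed -n 's/.*comm: .* \([^ ]*\) #[0-9].*/\1/p' "$@" | sort -u
}
```

Running `trace_kernel /var/log/messages` and comparing the result with `uname -r` should show whether the machine was booted into the kernel that was expected at the time of the hang.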
Date: Friday, February 19, 2016 11:08:48 +0000 From: Ian B ibrierley@gmail.com
If your machine is "running Centos 6.6(final)" but you've installed the new kernel and glibc, that implies that you are selectively applying updates. The 6.7 point release came out last fall. In addition to the security implications of not fully updating the system, you may have missed packages that affect networking.
You may want to do a full update of the system and then see how it behaves -- it's hard to debug a system that may have mismatched pieces.
To see which kernel your grub is set to load by default, look at the grub.conf -- the "default=" line (normally "0") indicates which of the listed kernels will be selected.
If the "default" value isn't "0", and/or the newest kernel isn't the first entry, then you have something mucking with things. Check your /etc/sysconfig/kernel file for starters.
Thanks Richard,
We currently apply only security updates at short notice (as opposed to everything), via a script. I've amended the grub config and rebooted to make sure it boots into the correct kernel now, and yes, /etc/sysconfig/kernel was different from the production servers. If it continues to be unstable we may try updating all packages, since it's a dev server for testing.
Thanks again,
Ian
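For comparing /etc/sysconfig/kernel between machines, as mentioned above, a minimal sketch follows. On CentOS 6 that file normally carries `UPDATEDEFAULT` (whether a newly installed kernel becomes the grub default) and `DEFAULTKERNEL` (which kernel package type is eligible); `sysconfig_get` is a hypothetical helper name, and the sketch assumes plain `KEY=value` lines.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a KEY=value setting from a
# sysconfig-style file (the last assignment wins, comments are ignored).
# Usage: sysconfig_get KEY [file]
sysconfig_get() {
    key="$1"
    file="${2:-/etc/sysconfig/kernel}"
    sed -n "s/^${key}=//p" "$file" | tail -n 1
}
```

On a machine where new kernels are expected to become the default automatically, `sysconfig_get UPDATEDEFAULT` would print `yes` and `sysconfig_get DEFAULTKERNEL` would print `kernel`; running both on the dev and production servers shows where they differ.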
On Fri, Feb 19, 2016 at 12:33 PM, Richard < lists-centos@listmail.innovate.net> wrote:
Date: Friday, February 19, 2016 12:47:54 +0000 From: Ian B ibrierley@gmail.com
As Johnny Hughes pointed out last fall:
https://lists.centos.org/pipermail/centos/2015-December/156697.html
selective updating like that is not supported by CentOS or RHEL.
[please don't top post.]
On Friday 19/02/2016, Richard wrote:
[please don't top post.]
Then please, also trim your mails before posting if possible, so we don't have to scroll several pages just to read a one-or-two-lines reply :)
(Sorry for the off-topic and the nit-picking, not directed at you personally. I hate top posting but I also hate scrolling through lines upon lines of signatures and other irrelevant content, especially annoying when reading a complete thread).
Cheers, and feel free to ignore me :)
On Fri, 2016-02-19 at 11:15 -0300, Ricardo J. Barberis wrote:
Then please, also trim your mails before posting if possible, so we don't have to scroll several pages just to read a one-or-two-lines reply :)
I'm with you on this one!
It irritates me so much that I've been searching for mail readers that have a plug-in to fold quoted sections - know of any? Haven't found one yet, but I'm still looking.
ak.
On Friday 19/02/2016, Anthony K wrote:
It so irritates that I've been searching for mail readers that have a plug-in to fold quoted sections - know of any? Haven't found one yet, but I'm still looking.
I know it's not for everybody, but kmail has an option for it:
Settings -> Configure Kmail -> Appearance -> Message Window -> Show expand/collapse quote marks
BTW, I'm not so sure about "Appearance": I have my system configured in Spanish, and when I switch to English some strings are still in Spanish. But you should find it :)
Cheers,
On 19.02.2016 at 13:47, Ian B ibrierley@gmail.com wrote:
We currently do all security updates at short notice (as opposed to everything), via a script. I've amended the grub config and rebooted to make sure it will reboot into the correct kernel now, and yes /etc/sysconfig/kernel was different to production servers. We may try all packages if it continues to be unstable now and maybe whatever as its on a dev server to test.
Why be selective about updates (despite the already mentioned implications, which obviously were not recognized while doing it)? What is your scenario that requires this? I'm just curious ...
-- LF