Greetings all,
I'm running CentOS 6.5 x86_64 with kernel 2.6.32-431.5.1.el6.x86_64 (already booting with the irqpoll option in grub).
Every few months I lose network connectivity and have to restart the server:
in /var/log/messages:
"
Mar 7 18:54:21 backup03 kernel: irq 68: nobody cared (try booting with the "irqpoll" option)
Mar 7 18:54:21 backup03 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-431.1.2.0.1.el6.x86_64 #1
Mar 7 18:54:21 backup03 kernel: Call Trace:
Mar 7 18:54:21 backup03 kernel: <IRQ> [<ffffffff810e8ffb>] ? __report_bad_irq+0x2b/0xa0
Mar 7 18:54:21 backup03 kernel: [<ffffffff810e91fc>] ? note_interrupt+0x18c/0x1d0
Mar 7 18:54:21 backup03 kernel: [<ffffffff810e9845>] ? handle_edge_irq+0xf5/0x180
Mar 7 18:54:21 backup03 kernel: [<ffffffff8100faf9>] ? handle_irq+0x49/0xa0
Mar 7 18:54:21 backup03 kernel: [<ffffffff81530fec>] ? do_IRQ+0x6c/0xf0
Mar 7 18:54:21 backup03 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Mar 7 18:54:21 backup03 kernel: [<ffffffff8107a893>] ? __do_softirq+0x73/0x1e0
Mar 7 18:54:21 backup03 kernel: [<ffffffff810ac9da>] ? tick_program_event+0x2a/0x30
Mar 7 18:54:21 backup03 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Mar 7 18:54:21 backup03 kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
Mar 7 18:54:21 backup03 kernel: [<ffffffff8107a795>] ? irq_exit+0x85/0x90
Mar 7 18:54:21 backup03 kernel: [<ffffffff815310ba>] ? smp_apic_timer_interrupt+0x4a/0x60
Mar 7 18:54:21 backup03 kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Mar 7 18:54:21 backup03 kernel: <EOI> [<ffffffff812e09ce>] ? intel_idle+0xde/0x170
Mar 7 18:54:21 backup03 kernel: [<ffffffff812e09b1>] ? intel_idle+0xc1/0x170
Mar 7 18:54:21 backup03 kernel: [<ffffffff81426707>] ? cpuidle_idle_call+0xa7/0x140
Mar 7 18:54:21 backup03 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Mar 7 18:54:21 backup03 kernel: [<ffffffff81520e2c>] ? start_secondary+0x2ac/0x2ef
Mar 7 18:54:21 backup03 kernel: handlers:
Mar 7 18:54:21 backup03 kernel: [<ffffffffa015f260>] (e1000_msix_other+0x0/0x1f0 [e1000e])
Mar 7 18:54:21 backup03 kernel: Disabling IRQ #68
"
cat /proc/interrupts | grep 68
 68: 2 0 0 0 IR-PCI-MSI-edge eth0
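A side note on the lookup above: a plain `grep 68` matches any row that contains "68" anywhere, including interrupt counts and other IRQ numbers such as 168. A minimal sketch of an exact match on the IRQ column follows. The helper name `irq_row` and its optional second argument (a saved copy of the file) are made up for illustration.

```shell
# Print only the /proc/interrupts row for one IRQ number.
# irq_row is a hypothetical helper name; the optional second argument
# points it at a saved copy of the file instead of the live one.
irq_row() {
    # The first field looks like "68:", so compare against "<irq>:" exactly.
    awk -v irq="$1" '$1 == (irq ":")' "${2:-/proc/interrupts}"
}

# Example:
#   irq_row 68
```

Running it before and after an outage, and comparing the per-CPU counts, may help show whether the vector is still being serviced.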
I'm already booting with irqpoll set in grub:
kernel /vmlinuz-2.6.32-431.5.1.el6.x86_64 ro root=UUID=b535f362-3152-4cc1-a9f2-b86f44331510 nomodeset rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb crashkernel=auto rd_NO_LVM rd_NO_DM irqpoll
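Since the trace above was logged under 2.6.32-431.1.2.0.1.el6.x86_64 while this grub entry is for 2.6.32-431.5.1.el6.x86_64, it may be worth confirming which kernel is actually running and whether irqpoll reached its command line. A small sketch, assuming the usual /proc/cmdline layout of space-separated parameters; `has_kparam` is a hypothetical helper name.

```shell
# has_kparam is a hypothetical helper: it succeeds if the exact parameter
# appears on the kernel command line (default: the running kernel's).
has_kparam() {
    # One parameter per line, then require a whole-line fixed-string match.
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example:
#   uname -r                                   # kernel actually running
#   has_kparam irqpoll && echo "irqpoll is active"
```

The whole-line match means that `has_kparam quiet` will not be fooled by a longer parameter that merely contains the word.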
Anyone have any ideas or suggestions? Not sure what else I can do here.
Thanks in advance!
-PJF
What kind of machine are you running this on? If you have a BIOS, check to see if there is an update available for it. Assuming that you have the latest BIOS on the machine, try booting with MSI-X disabled and see if it becomes more stable.
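One possible way to try the MSI-X suggestion, offered as a sketch rather than a tested recipe: the e1000e driver has an IntMode module parameter, which as far as I know takes 0 for legacy interrupts, 1 for MSI and 2 for MSI-X, with one comma-separated value per port. The helper name `write_e1000e_intmode` and the file name /etc/modprobe.d/e1000e.conf are assumptions for illustration. The kernel parameter pci=nomsi is a blunter alternative that turns MSI off system-wide.

```shell
# write_e1000e_intmode is a hypothetical helper. It writes a modprobe
# options file that sets the e1000e IntMode parameter
# (assumed meaning: 0 = legacy, 1 = MSI, 2 = MSI-X).
# Pass one value per port, comma separated, e.g. "1,1" for two ports.
write_e1000e_intmode() {
    modes="$1"
    conf="${2:-/etc/modprobe.d/e1000e.conf}"
    # Reject anything that is not a comma-separated list of 0, 1 or 2.
    if ! printf '%s\n' "$modes" | grep -Eq '^[0-2](,[0-2])*$'; then
        echo "IntMode values must be 0, 1 or 2" >&2
        return 1
    fi
    printf 'options e1000e IntMode=%s\n' "$modes" > "$conf"
}

# Example (as root, then reload the module or reboot):
#   write_e1000e_intmode 1,1
```

After the change, the interrupt type shown for eth0 in /proc/interrupts should no longer be an MSI-X vector if the option took effect.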
James A. Peltier
Manager, IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone : 778-782-6573 Fax : 778-782-3045
E-Mail : jpeltier@sfu.ca
Website : http://www.sfu.ca/itservices
"Around here, however, we don’t look backwards for very long. We KEEP MOVING FORWARD, opening up new doors and doing things because we’re curious and curiosity keeps leading us down new paths." - Walt Disney
On Sat, 8 Mar 2014, James A. Peltier wrote:
| | Every few months I lose network connectivity and have to restart the
| | server:
| What kind of machine are you running this on? If you have a BIOS, check to see if there is an update available for it. Assuming that you have the latest BIOS on the machine, try booting with MSI-X disabled and see if it becomes more stable.
I'll second the suggestion for disabling MSI-X and add one: add pcie_aspm=off to your boot-time kernel options.
You wouldn't happen to be running a SuperMicro mainboard with onboard Intel 82574L NICs, would you? If so, I'll also suggest installing the kmod-e1000e package from elrepo.org, which includes a workaround for the bad PROM that's involved.
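After installing the elrepo package, it can be worth checking that the interface is really using the replacement driver. A rough sketch, assuming the usual `ethtool -i` output of "key: value" lines; `driver_summary` is a hypothetical helper name, and the install line assumes the ELRepo repository is already configured on the machine.

```shell
# driver_summary is a hypothetical helper: it reads "ethtool -i" style
# output on stdin and prints "<driver> <version>".
driver_summary() {
    awk -F': ' '
        $1 == "driver"  { drv = $2 }
        $1 == "version" { ver = $2 }
        END             { print drv, ver }
    '
}

# Example (assumes the ELRepo repository is already configured):
#   yum install kmod-e1000e
#   ... reload the e1000e module or reboot ...
#   ethtool -i eth0 | driver_summary
```

If the version printed is unchanged from before the install, the stock module is probably still the one being loaded.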
James & Paul,
Thank you for your suggestions, I will give them a try.
Paul, it is indeed a SuperMicro with onboard Intel 82574L's. So I will try the kmod-e1000e package as well.
Very grateful for your help and for this list.
Thanks, PJF
On Sun, Mar 9, 2014 at 8:38 AM, Paul Heinlein <heinlein@madboa.com> wrote:
| I'll second the suggestion for disabling MSI-X and add one: add pcie_aspm=off to your boot-time kernel options.
| You wouldn't happen to be running a SuperMicro mainboard with onboard Intel 82574L NICs, would you? If so, I'll also suggest installing the kmod-e1000e package from elrepo.org, which includes a workaround for the bad PROM that's involved.
| --
| Paul Heinlein
| heinlein@madboa.com
| 45°38' N, 122°6' W
On Sun, 9 Mar 2014, P J wrote:
| James & Paul,
| Thank you for your suggestions, I will give them a try.
| Paul, it is indeed a SuperMicro with onboard Intel 82574L's. So I will try the kmod-e1000e package as well.
Don't neglect the pcie_aspm=off in grub.conf; that and the elrepo package made the problem go away for me.
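For anyone following along, here is one way the grub.conf change could be scripted. It is only a sketch: `add_kparam` is a hypothetical helper name, and it assumes a legacy GRUB config at /boot/grub/grub.conf whose boot entries use lines starting with the word "kernel". It keeps a .bak copy and skips lines that already carry the parameter. CentOS 6 also ships grubby, which can do the same with `grubby --update-kernel=ALL --args="pcie_aspm=off"`.

```shell
# add_kparam is a hypothetical helper: it appends a parameter to every
# "kernel" line of a legacy GRUB config that does not already have it.
add_kparam() {
    param="$1"
    conf="${2:-/boot/grub/grub.conf}"
    # Keep a backup, then rewrite the original in place from the backup
    # so the file's ownership and permissions are left alone.
    cp -p "$conf" "$conf.bak" || return 1
    awk -v p="$param" '
        $1 == "kernel" && index(" " $0 " ", " " p " ") == 0 { $0 = $0 " " p }
        { print }
    ' "$conf.bak" > "$conf"
}

# Example (as root, then reboot):
#   add_kparam pcie_aspm=off
```

After the reboot, `cat /proc/cmdline` should list pcie_aspm=off if the edited entry was the one that booted.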