A previously rock-solid, reliable server of mine crashed last night. The server was still running, but eth0, an Intel 82574L using the e1000e driver, went down. The server has a Supermicro X8DTE-F (dual Xeon X5650, yada yada). It is a DRBD master, so DRBD was the first thing to notice network issues. Just a couple of days ago I ran yum update to the latest; I do this about once a month.
/var/log/messages logged...
(prior to this was nothing but normal smbd complaining about CUPS not configured).
May 9 22:30:21 sg1 kernel: block drbd0: PingAck did not arrive in time.
May 9 22:30:21 sg1 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
May 9 22:30:21 sg1 kernel: block drbd0: asender terminated
May 9 22:30:21 sg1 kernel: block drbd0: Terminating drbd0_asender
May 9 22:30:22 sg1 kernel: block drbd0: new current UUID BC856D7A6F94F041:237F4033E81B62DF:1E248D699B6793A9:1E238D699B6793A9
May 9 22:30:22 sg1 kernel: block drbd0: Connection closed
May 9 22:30:22 sg1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
May 9 22:30:22 sg1 kernel: block drbd0: receiver terminated
May 9 22:30:22 sg1 kernel: block drbd0: Restarting drbd0_receiver
May 9 22:30:22 sg1 kernel: block drbd0: receiver (re)started
May 9 22:30:22 sg1 kernel: block drbd0: conn( Unconnected -> WFConnection )
May 9 22:30:34 sg1 kernel: ------------[ cut here ]------------
May 9 22:30:34 sg1 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
May 9 22:30:34 sg1 kernel: Hardware name: ISS3500
May 9 22:30:34 sg1 kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
May 9 22:30:34 sg1 kernel: Modules linked in: drbd(U) nfsd max6650 coretemp adm1021 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 xfs exportfs microcode iTCO_wdt iTCO_vendor_support joydev serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e(U) ptp pps_core ioatdma dca i7core_edac edac_core ses enclosure sg ext4 jbd2 mbcache sd_mod crc_t10dif ahci megaraid_sas mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
May 9 22:30:34 sg1 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-573.22.1.el6.x86_64 #1
May 9 22:30:34 sg1 kernel: Call Trace:
May 9 22:30:34 sg1 kernel: <IRQ> [<ffffffff81077821>] ? warn_slowpath_common+0x91/0xe0
May 9 22:30:34 sg1 kernel: [<ffffffff81077926>] ? warn_slowpath_fmt+0x46/0x60
May 9 22:30:34 sg1 kernel: [<ffffffff8148d64b>] ? dev_watchdog+0x26b/0x280
May 9 22:30:34 sg1 kernel: [<ffffffff8109aded>] ? insert_work+0x6d/0xb0
May 9 22:30:34 sg1 kernel: [<ffffffff81089bd5>] ? internal_add_timer+0xb5/0x110
May 9 22:30:34 sg1 kernel: [<ffffffff8148d3e0>] ? dev_watchdog+0x0/0x280
May 9 22:30:34 sg1 kernel: [<ffffffff8108a867>] ? run_timer_softirq+0x197/0x340
May 9 22:30:34 sg1 kernel: [<ffffffff8103579d>] ? lapic_next_event+0x1d/0x30
May 9 22:30:34 sg1 kernel: [<ffffffff81080361>] ? __do_softirq+0xc1/0x1e0
May 9 22:30:34 sg1 kernel: [<ffffffff810b322f>] ? tick_program_event+0x2f/0x40
May 9 22:30:34 sg1 kernel: [<ffffffff8100c38c>] ? call_softirq+0x1c/0x30
May 9 22:30:34 sg1 kernel: [<ffffffff8100fc25>] ? do_softirq+0x65/0xa0
May 9 22:30:34 sg1 kernel: [<ffffffff81080215>] ? irq_exit+0x85/0x90
May 9 22:30:34 sg1 kernel: [<ffffffff815435ba>] ? smp_apic_timer_interrupt+0x4a/0x60
May 9 22:30:34 sg1 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
May 9 22:30:34 sg1 kernel: <EOI> [<ffffffff812f1a5e>] ? intel_idle+0xfe/0x1b0
May 9 22:30:34 sg1 kernel: [<ffffffff812f1a41>] ? intel_idle+0xe1/0x1b0
May 9 22:30:34 sg1 kernel: [<ffffffff8143413a>] ? cpuidle_idle_call+0x7a/0xe0
May 9 22:30:34 sg1 kernel: [<ffffffff81009fe6>] ? cpu_idle+0xb6/0x110
May 9 22:30:34 sg1 kernel: [<ffffffff81532912>] ? start_secondary+0x2c0/0x316
May 9 22:30:34 sg1 kernel: ---[ end trace 883800817e091e53 ]---
May 9 22:30:34 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter unexpectedly
May 9 22:30:35 sg1 abrt-dump-oops: Reported 1 kernel oopses to Abrt
May 9 22:30:35 sg1 abrtd: Directory 'oops-2016-05-09-22:30:35-8763-1' creation detected
May 9 22:30:38 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:30:42 sg1 kernel: Bridge firewalling registered
May 9 22:31:27 sg1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
May 9 22:31:32 sg1 abrtd: Can't find a meaningful backtrace for hashing in '.'
May 9 22:31:32 sg1 abrtd: Preserving oops '.' because DropNotReportableOopses is '(not set)'
May 9 22:31:32 sg1 abrtd: Looking for kernel package
May 9 22:31:32 sg1 abrtd: Kernel package kernel-2.6.32-573.22.1.el6.x86_64 found
May 9 22:31:33 sg1 abrtd: New problem directory /var/spool/abrt/oops-2016-05-09-22:30:35-8763-1, processing
May 9 22:31:33 sg1 abrtd: Sending an email...
May 9 22:31:34 sg1 abrtd: Email was sent to: root@localhost
May 9 22:32:25 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:32:30 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:34:55 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter unexpectedly
May 9 22:34:59 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:37:25 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter unexpectedly
May 9 22:37:30 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:39:50 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter unexpectedly
May 9 22:39:55 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:41:30 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter unexpectedly
May 9 22:41:35 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:44:00 sg1 kernel: e1000e 0000:03:00.0: eth0: Reset adapter unexpectedly
May 9 22:44:05 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:46:28 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:46:33 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:50:05 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:50:09 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:52:56 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:53:01 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:55:30 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:55:35 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:59:17 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:59:22 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:01:45 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:01:50 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:05:02 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:05:07 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:07:19 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:07:23 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:09:34 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:09:38 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:11:47 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:11:52 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:14:27 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:14:31 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:16:38 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:16:42 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:19:08 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:19:12 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:22:18 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:22:22 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:26:52 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:26:57 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:31:24 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:31:29 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:33:43 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:33:47 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:36:30 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:36:35 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:39:45 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:39:50 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:41:58 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:42:03 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:45:04 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:45:08 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:47:19 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:47:24 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:52:06 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:52:11 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:55:05 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:55:09 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 23:57:31 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 23:57:36 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
(repeating endlessly til I forced the reboot this morning)
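For anyone wanting to put numbers on the flapping, here is a rough Python sketch that counts the e1000e events in a syslog excerpt like the one above and reports the seconds between successive outages. The function name `summarize`, the `EVENTS` table and the `year` default are my own illustrative choices (syslog stamps carry no year), and it assumes English month abbreviations.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
#   May 9 22:32:25 sg1 kernel: e1000e: eth0 NIC Link is Down
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(.*)$"
)

# Substrings taken from the log above; the keys are illustrative labels.
EVENTS = {
    "reset": "Reset adapter unexpectedly",
    "link_down": "NIC Link is Down",
    "link_up": "NIC Link is Up",
    "tx_timeout": "transmit queue 0 timed out",
}


def summarize(lines, year=2016):
    """Return (counts, gaps) for e1000e kernel messages.

    counts: Counter of event label -> occurrences.
    gaps:   seconds between successive resets / link-down events.
    """
    counts = Counter()
    outages = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a kernel message
        stamp, message = match.groups()
        if "e1000e" not in message:
            continue  # some other driver or subsystem
        for label, needle in EVENTS.items():
            if needle in message:
                counts[label] += 1
                if label in ("reset", "link_down"):
                    # Collapse any padding in the stamp, then add the assumed year.
                    text = "{} {}".format(year, " ".join(stamp.split()))
                    outages.append(datetime.strptime(text, "%Y %b %d %H:%M:%S"))
                break
    gaps = [int((b - a).total_seconds()) for a, b in zip(outages, outages[1:])]
    return counts, gaps
```

Passing an open file works, since it only needs an iterable of lines, e.g. `summarize(open("/var/log/messages"))`. Roughly constant gaps would hint at something periodic rather than a random cable glitch, though that is only a heuristic.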
John R Pierce wrote:
A previously rock-solid, reliable server of mine crashed last night. The server was still running, but eth0, an Intel 82574L using the e1000e driver, went down. The server has a Supermicro X8DTE-F (dual Xeon X5650, yada yada). It is a DRBD master, so DRBD was the first thing to notice network issues. Just a couple of days ago I ran yum update to the latest; I do this about once a month.
/var/log/messages logged...
<snip>
(prior to this was nothing but normal smbd complaining about CUPS not configured).
Duplex, Flow Control: Rx/Tx
May 9 22:52:56 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:53:01 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 9 22:55:30 sg1 kernel: e1000e: eth0 NIC Link is Down
May 9 22:55:35 sg1 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
<snip>
This is going to sound really stupid, but consider replacing the patch cord. If that doesn't work... should I assume that this m/b has at least two embedded NICs? If so, try using the other NIC.
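To see which other NICs the kernel knows about and whether they have link, a minimal Python sketch along these lines should do. It reads the `operstate` file that Linux exposes under /sys/class/net; the function name `list_interfaces` and the `sysfs_root` parameter are just illustrative, not anything from the original post.

```python
import os


def list_interfaces(sysfs_root="/sys/class/net"):
    """Return {interface: operstate} for every interface except loopback."""
    states = {}
    for name in sorted(os.listdir(sysfs_root)):
        iface_dir = os.path.join(sysfs_root, name)
        # Skip loopback and stray non-directory entries (e.g. bonding_masters).
        if name == "lo" or not os.path.isdir(iface_dir):
            continue
        try:
            with open(os.path.join(iface_dir, "operstate")) as fh:
                states[name] = fh.read().strip()
        except OSError:
            states[name] = "unknown"
    return states
```

On a board with two embedded NICs this would typically list something like eth0 and eth1 (names assumed); a second port reporting "down" simply means nothing is plugged into it yet.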
mark "I haven't done both of those in the last few months, no...."