This is a return to an issue I first raised back in June. We had a similar occurrence in September while I was away and so I am revisiting the entire matter.
Steve Clark on 6 Jun 16:02 2014 wrote:
Hi,
We ran into this problem also - the interface would disappear. There is newer e1000e driver that fixes it or you could add pcie_aspm=off to your kernel command line.
HTH, Steve
I have run into other reports of similar occurrences and some of these refer to this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=632650
However, that report is closed as being a duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=562273
That report is not available for viewing by the great unwashed.
Nonetheless, following the discussion thread in the bug report that I can view, it appears that this issue was supposedly resolved sometime in late 2012.
From what I can gather, the fix was to disable ASPM L1 for this model of adaptor in the e1000e driver module:
* Upstream commit d4a4206ebbaf48b55803a7eb34e330530d83a889 - e1000e: Disable ASPM L1 on 82574
However, when I run lspci -vvv on the host that exhibited the problem I see this:
. . .
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Subsystem: Super Micro Computer Inc Device 10d3
    Physical Slot: 0-2
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 17
    Region 0: Memory at feae0000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at ec00 [size=32]
    Region 3: Memory at feadc000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
############
        LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
############
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number 00-25-90-ff-ff-61-74-c1
    Kernel driver in use: e1000e
    Kernel modules: e1000e
. . .
lsmod
. . .
e1000e 267701 0
. . .
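For anyone who wants to check the same thing on their own host without reading through the whole dump, the ASPM state can be pulled out of the LnkCtl line. This is only a rough sketch: the function name aspm_state is my own invention, and it assumes the LnkCtl line has the same "ASPM ...; RCB" layout as in the output above.

```shell
# Read lspci -vvv output on stdin and print only the ASPM state
# from the LnkCtl line (the text between "ASPM " and the first ";").
# aspm_state is a hypothetical helper name, not part of pciutils.
aspm_state() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Usage on a live system (run as root so the capabilities are shown):
#   lspci -vvv -s 03:00.0 | aspm_state
```

On the host above this would report "L0s L1 Enabled", which is what prompts my question.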
The host is running CentOS-6.5 with all updates applied to date. My question is: has this issue been addressed in the official e1000e module or not? If not, does the recommendation to "add pcie_aspm=off to your kernel command line" still hold?
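In case it helps anyone comparing notes, a quick way to see whether a host was booted with that workaround is to look for the option in the kernel command line. A minimal sketch follows; has_aspm_off is a made-up helper name, and it only checks for the exact string pcie_aspm=off as a separate word.

```shell
# Print "yes" if the kernel command line passed as $1 contains
# pcie_aspm=off as a separate word, otherwise print "no".
# has_aspm_off is a hypothetical helper name.
has_aspm_off() {
    case " $1 " in
        *" pcie_aspm=off "*) echo yes ;;
        *) echo no ;;
    esac
}

# Usage on a live system:
#   has_aspm_off "$(cat /proc/cmdline)"
```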