This is a return to an issue I first raised back in June. We had a similar occurrence in September while I was away and so I am revisiting the entire matter.
Steve Clark on 6 Jun 16:02 2014 wrote:
Hi,
We ran into this problem also - the interface would disappear. There is a newer e1000e driver that fixes it, or you could add pcie_aspm=off to your kernel command line.
HTH, Steve
I have run into other reports of similar occurrences and some of these refer to this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=632650
However, that report is closed as being a duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=562273
That report is not available for viewing by the great unwashed.
Nonetheless, following the discussion thread in the bug report that I can view, it appears that this issue was supposedly resolved sometime in late 2012.
From what I can gather, the fix was to disable ASPM L1 for this model of adaptor in the e1000e driver module.
* Upstream commit d4a4206ebbaf48b55803a7eb34e330530d83a889 - e1000e: Disable ASPM L1 on 82574
However, when I run lspci -vvv on the host that exhibited the problem I see this:
. . .
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Super Micro Computer Inc Device 10d3
        Physical Slot: 0-2
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at feae0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at ec00 [size=32]
        Region 3: Memory at feadc000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
############
                LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
############
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+
                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 00-25-90-ff-ff-61-74-c1
        Kernel driver in use: e1000e
        Kernel modules: e1000e
. . .

lsmod
. . .
e1000e 267701 0
. . .
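For anyone wanting to check several hosts for the same condition, the LnkCtl line can be pulled out of saved lspci -vvv output mechanically. The following is only a minimal sketch: the function name aspm_link_state and the shortened SAMPLE text are made up for illustration, and it assumes the output is in its normal one-field-per-line layout.

```python
import re

# Shortened, illustrative excerpt in the layout that lspci -vvv prints.
SAMPLE = """\
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
\tCapabilities: [e0] Express (v1) Endpoint, MSI 00
\t\tLnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
\t\tLnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
\tKernel driver in use: e1000e
"""

# A device header starts in column 0 with [domain:]bus:slot.function.
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# The ASPM state is the text between "LnkCtl: ASPM" and the first semicolon.
LNKCTL_RE = re.compile(r"LnkCtl:\s+ASPM\s+([^;]+);")


def aspm_link_state(lspci_text):
    """Return {pci_address: ASPM state string} for devices with a LnkCtl line."""
    states = {}
    device = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            device = header.group(1)
            continue
        link = LNKCTL_RE.search(line)
        if link and device is not None:
            states[device] = link.group(1).strip()
    return states


if __name__ == "__main__":
    for address, state in aspm_link_state(SAMPLE).items():
        print(address, state)
```

Any state other than "Disabled" on the 82574L entry would suggest that ASPM is still active on that link, as it is in the output above.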
The host is running CentOS-6.5 with all updates applied to date. My question is: has this issue been addressed in the official e1000e module or not? If not, does the recommendation to "add pcie_aspm=off to your kernel command line" still hold?
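If the pcie_aspm=off workaround is applied, it may be worth confirming after a reboot that the running kernel actually received the option. A minimal sketch, assuming the usual /proc/cmdline interface on Linux; the helper names and the example command lines in the comments are hypothetical:

```python
def has_kernel_param(cmdline, param):
    """True if param appears as a whole, space-separated token in cmdline."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # Hypothetical example lines: "ro quiet" is False, "ro quiet pcie_aspm=off" is True.
    print(has_kernel_param(read_cmdline(), "pcie_aspm=off"))
```

Matching on whole tokens avoids a false positive from a longer option that merely contains the same text.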
On Wed, Oct 15, 2014 at 8:41 AM, James B. Byrne byrnejb@harte-lyne.ca wrote:
I have run into other reports of similar occurrences and some of these refer to this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=632650
I'm the one who did the submission. Some of my comments (which I thought were helpful) have been hidden by Red Hat.
However, that report is closed as being a duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=562273
Which is not available to viewing by the great unwashed.
I don't have access, either.
The host is running CentOS-6.5 with all updates applied to date. My question is: Has this issue been addressed in the official e1000e module or not? if not then does the recommendation to "add pcie_aspm=off to your kernel command line" hold?
My suggestion for you is to give ELRepo's kmod-e1000e a try. It has the latest version from Intel (3.1.0.2) as opposed to the version in the EL kernels (2.3.2-k). There are known cases in which a later version resolved issues.
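To see which of the two versions a host is actually running, the version strings can be compared numerically. This is a rough sketch only: the helper names are invented, and reading /sys/module/e1000e/version assumes the loaded module exports a version file there.

```python
import os
import re


def version_tuple(version):
    """Convert '2.3.2-k' or '3.1.0.2' to a tuple of ints, dropping suffixes such as '-k'."""
    numeric = re.match(r"\d+(?:\.\d+)*", version.strip())
    if numeric is None:
        raise ValueError("no leading numeric version in %r" % version)
    return tuple(int(part) for part in numeric.group(0).split("."))


def is_older(installed, candidate):
    """True if the installed version sorts before the candidate version."""
    return version_tuple(installed) < version_tuple(candidate)


def loaded_version(module="e1000e", sysfs_root="/sys/module"):
    """Return the version string of a loaded module, or None if it is not exposed."""
    path = os.path.join(sysfs_root, module, "version")
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    current = loaded_version()
    if current is not None:
        print(current, "older than 3.1.0.2:", is_older(current, "3.1.0.2"))
```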
Akemi
On 16-10-2014 13:47, Akemi Yagi wrote:
My suggestion for you is to give ELRepo's kmod-e1000e a try. It has the latest version from Intel (3.1.0.2) as opposed to the version in the EL kernels (2.3.2-k). There are known cases in which a later version resolved issues.
On Tue, Oct 21, 2014 at 11:02 AM, Marcelo Ricardo Leitner marcelo.leitner@gmail.com wrote:
Both BZs above are RHEL 5 specific, with 562273 being a "driver update" one. Did you report this against RHEL 6 too?
Marcelo
The e1000e bug report against EL6 is in the CentOS bug tracker, where you can find all the details:
http://bugs.centos.org/view.php?id=6810
RH bugzilla is here but it is private:
https://bugzilla.redhat.com/show_bug.cgi?id=1038754
Here again, I recommend use of ELRepo's kmod-e1000e package. It is possible that the driver in the upcoming CentOS 6.6 fixes the problem.
Akemi