Hello Mailing List
I got a severe network error message on an HP DL360 server. The kernel log says:
----------------------------------- /var/log/messages -----------------------------------------------------------------
Mar 19 15:45:06 server kernel: do_IRQ: 2.168 No irq handler for vector (irq -1)
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: intr_sem[0] PCI_CMD[00100446]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: PCI_PM[19002108] PCI_MISC_CFG[92000088]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000006]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: HC_STATS_INTERRUPT_STATUS[017f0080]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: PBA[00000000]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: <--- start MCP states dump --->
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: MCP mode[0000b880] state[80008000] evt_mask[00000500]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: pc[0800adec] pc[0800aeb0] instr[8fb10014]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: shmem states:
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: drv_mb[0103000f] fw_mb[0000000f] link_status[0000006f] drv_pulse_mb[0000432b]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: dev_info_signature[44564903] reset_type[01005254] condition[0003610e]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: 000003cc: 44444444 44444444 44444444 00000a3c
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: 000003ec: 00000000 00000000 00000000 00000002
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: DEBUG: 0x3fc[0000ffff]
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: <--- end MCP states dump --->
Mar 19 15:45:17 server kernel: bnx2 0000:02:00.1: eth1: NIC Copper Link is Down
Mar 19 15:45:20 server kernel: bnx2 0000:02:00.1: eth1: NIC Copper Link is Up, 1000 Mbps full duplex
-----------------------------------------------------------------------------------------------------------------------------------
Does anyone know this problem?
The system is CentOS 6.3. Kernel (uname -a):
Linux server 2.6.32-279.5.2.el6.centos.plus.x86_64 #1 SMP Fri Aug 24 00:25:34 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Thanks Hartmut
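To judge how often the link has dropped like this, the "NIC Copper Link is Down" lines can be counted per interface. The following is only a sketch: the function name count_link_down is made up for illustration, and it assumes the messages have the same wording as in the log above and live in /var/log/messages.

```shell
# count_link_down IFACE [LOGFILE]
# Count the bnx2 "NIC Copper Link is Down" messages for one interface.
# LOGFILE defaults to /var/log/messages; pass a rotated or saved copy
# to look further back.
count_link_down() {
    iface=$1
    log=${2:-/var/log/messages}
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c -- "bnx2 .*${iface}: NIC Copper Link is Down" "$log"
}

# Example (reading /var/log/messages usually needs root):
#   count_link_down eth1
```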
On Mar 19, 2013, at 9:32 AM, Woehrle Hartmut SBB CFF FFS (Extern) hartmut.woehrle@sbb.ch wrote:
Hello Mailing List
I got a severe network error message at a HP DL360 Server. The kernel log says:
If that's a DL360 G7 server, make sure you've applied all of the latest firmware patches from HP on it. The G7 version has been almost notorious for firmware issues with drive controllers, ethernet interfaces, etc.
Nate
Hello Nate
It is a G6 Server and the firmware is more or less the latest version:
# bash CP017428.scexe -c
MAC           PCI-ID               NIC
18A90576C820  14E4-1639-103C-7055  HP NC382i DP Multifunction Gigabit Server Adapter

            (Installed)        (Available)
Interface   Image   Version    Image   Version
eth0
----------------------------------------------------------------------
            BC      5.2.3      BC      5.2.3
            iSCSI   4.2.10     iSCSI   7.4.2     <<<<<<<<<<<<< I don't use iSCSI on this machine
            NCSI    2.0.6      NCSI    2.0.12
Hartmut
What IRQ number do you find for the device? You may need the driver development guide to work out what the debug messages mean.
The first line alone says that no IRQ handler was found for the vector. You can check the device in /proc/interrupts, then find the matching entry in /proc/irq/
------------ Banyan He Blog: http://www.rootong.com Email: banyan@rootong.com
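The /proc/interrupts lookup suggested above could be scripted roughly as follows. This is a sketch, not a tested procedure for this server: the function name irqs_for_iface is invented here, and it assumes the interrupt rows are labelled with the interface name (eth1, or eth1-0 style per-queue names), which is how bnx2 usually appears but is not confirmed by the thread.

```shell
# irqs_for_iface IFACE [FILE]
# Print the IRQ numbers whose row in /proc/interrupts names IFACE.
# FILE defaults to /proc/interrupts; the second argument only exists
# so the function can be tried against a saved copy.
irqs_for_iface() {
    iface=$1
    file=${2:-/proc/interrupts}
    awk -v dev="$iface" '
        # Data rows start with "NN:"; the header row (CPU0 CPU1 ...) does not.
        $1 ~ /^[0-9]+:$/ {
            for (i = 2; i <= NF; i++) {
                f = $i
                sub(/,$/, "", f)      # shared IRQs are listed as "a, b"
                # Accept "eth1" and per-queue names such as "eth1-0".
                if (f == dev || index(f, dev "-") == 1) {
                    sub(":", "", $1)
                    print $1
                    break
                }
            }
        }
    ' "$file"
}

# Example: show the CPU affinity mask of every IRQ that belongs to eth1.
#   for irq in $(irqs_for_iface eth1); do
#       echo "IRQ $irq: $(cat /proc/irq/$irq/smp_affinity)"
#   done
```

If the function prints nothing for eth1, that would match the "No irq handler" message, but it can also just mean the rows are labelled differently on that kernel.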
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos .
How often are you getting these crashes?
I had a similar problem on my HP DL380 G7 server.
I disabled Active State Power Management (ASPM) on PCI Express.
Try it.
Add pcie_aspm=off as a kernel boot option.
Best regards,
Svavar O Reykjavik - Iceland
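Whether the running kernel was actually booted with pcie_aspm=off can be checked from /proc/cmdline. A minimal sketch, with has_boot_option as a made-up helper name; the grubby line in the comment assumes the stock CentOS 6 grubby tool is installed and is only one possible way to make the option persistent.

```shell
# has_boot_option OPTION [FILE]
# Succeed (exit 0) if OPTION appears as a whole word on the kernel
# command line. FILE defaults to /proc/cmdline.
has_boot_option() {
    opt=$1
    file=${2:-/proc/cmdline}
    # One token per line, then match the option exactly and literally.
    tr ' ' '\n' < "$file" | grep -qxF -- "$opt"
}

# Example:
#   if has_boot_option pcie_aspm=off; then
#       echo "kernel was booted with pcie_aspm=off"
#   else
#       echo "pcie_aspm=off is not set"
#   fi
#
# Adding the option to all installed kernels (as root, assuming grubby):
#   grubby --update-kernel=ALL --args="pcie_aspm=off"
```

The option only takes effect after the next reboot.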
Hello Svavar
This was the first time this problem occurred, across 60 servers and about half a year of CentOS 6 (CentOS 5 before that). But because the interfaces are under permanent load, really 24x7, problems with power management would be a disaster. I will try switching it off.
Thanks Hartmut
After you have tried the pcie_aspm boot option, also try:

    echo performance > /sys/module/pcie_aspm/parameters/policy

This disables ASPM on PCIe so the links operate at maximum performance.
This is what I use today on the DL380 G7.
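To confirm that the write took effect, the policy file can be read back. A small sketch, assuming the usual sysfs format in which the active policy is the bracketed word (for example "default [performance] powersave"); active_aspm_policy is a hypothetical helper name.

```shell
# active_aspm_policy [FILE]
# Print the currently active ASPM policy, i.e. the bracketed word.
# FILE defaults to the sysfs policy file used above.
active_aspm_policy() {
    file=${1:-/sys/module/pcie_aspm/parameters/policy}
    # Keep only the text between "[" and "]".
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}

# Example:
#   active_aspm_policy
```

A value written through sysfs does not survive a reboot, so the echo has to be repeated at boot time if it turns out to help.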