I added the "irqpoll" option to the kernel line. After the system booted I logged into X, and on tty1 I also logged in as root and ran "tail -f /var/log/messages" for monitoring. After 10 to 20 minutes I got the same message again, but this time the system froze immediately!
I have tried it 5 times now and the result is the same each time. I also noticed that the system was very slow until the message came up. For a folder copy operation, for example, I usually get 70-90MB/s according to "vmstat 1", but with irqpoll the maximum I saw was about 10MB/s.
Am I doing something wrong?
On Fri, Sep 19, 2008 at 06:58:21PM +0530, partha chowdhury wrote:
> i add the "irqpoll" option to kernel line. after the system booted i log into X with tty1 also login as root with the command " tail -f /var/log/messages" for monitoring . After 10 to 20 minutes i got the same message again but this time immediately the system froze up !
> i tried it 5 times now and each time is the same result.i also noticed that the system was behaving very slowly till the message came up like usually folder copy operation - i usually get 70-90MB/s according to vmstat 1 comand, but with irqpoll the maximum i noticed was like 10MB/s.
> Am i doing something wrong ?
I'm seeing this same problem on IBM HS21-8853 blades.
It _seems_ like a bug in IRQ routing in BIOS/motherboard.. ACPI issue?
With the IBM blades I'm able to fix the problem by generating the initrd image with the "--without-usb" switch. This delays the USB module initialization/loading, which somehow fixes the problem, giving a different IRQ to the USB controller.
-- Pasi
On Mon, 2008-09-22 at 11:43 +0300, Pasi Kärkkäinen wrote:
> I'm seeing this same problem on IBM HS21-8853 blades.
> It _seems_ like a bug in IRQ routing in BIOS/motherboard.. ACPI issue?
> With IBM blades I'm able to fix the problem by generating the initrd image with "--without-usb" switch.. this delays the USB module initialization/loading, somehow fixing the problem.. giving different IRQ to USB controller.
> -- Pasi
I just tried your fix and it solved the problem. Now the USB drive does not disappear any more. Thank you!
On Tue, 2008-09-23 at 00:27 +0530, partha chowdhury wrote:
> On Mon, 2008-09-22 at 11:43 +0300, Pasi Kärkkäinen wrote:
>> I'm seeing this same problem on IBM HS21-8853 blades.
>> It _seems_ like a bug in IRQ routing in BIOS/motherboard.. ACPI issue?
>> With IBM blades I'm able to fix the problem by generating the initrd image with "--without-usb" switch.. this delays the USB module initialization/loading, somehow fixing the problem.. giving different IRQ to USB controller.
>> -- Pasi
> i just tried your fix and it solved the problem. now the usb drive does not disappear any more. Thank you
Well, I spoke too soon. Now the error message I am getting is:
Sep 23 00:37:46 station2 kernel: [<c044e6fa>] __report_bad_irq+0x2b/0x69
Sep 23 00:37:46 station2 kernel: [<c044e8e7>] note_interrupt+0x1af/0x1e8
Sep 23 00:37:46 station2 kernel: [<c057afb4>] usb_hcd_irq+0x23/0x50
Sep 23 00:37:46 station2 kernel: [<c044df27>] handle_IRQ_event+0x23/0x49
Sep 23 00:37:46 station2 kernel: [<c044dfe8>] __do_IRQ+0x9b/0xd6
Sep 23 00:37:46 station2 kernel: [<c04073f4>] do_IRQ+0x93/0xae
Sep 23 00:37:46 station2 kernel: [<c040592e>] common_interrupt+0x1a/0x20
Sep 23 00:37:46 station2 kernel: [<c0403b98>] default_idle+0x0/0x59
Sep 23 00:37:46 station2 kernel: [<c0403bc9>] default_idle+0x31/0x59
Sep 23 00:37:46 station2 kernel: [<c0403c90>] cpu_idle+0x9f/0xb9
Sep 23 00:37:46 station2 kernel: [<c06ed9ee>] start_kernel+0x379/0x380
Sep 23 00:37:46 station2 kernel: =======================
Sep 23 00:37:46 station2 kernel: handlers:
Sep 23 00:37:46 station2 kernel: [<c057af91>] (usb_hcd_irq+0x0/0x50)
Sep 23 00:37:46 station2 kernel: Disabling IRQ #58
And the same thing happens, i.e. I am unable to remount the USB drive.
Can someone tell me if this is a CentOS/RHEL bug and, if it is, how and where I should file a bug report?
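For anyone who wants to pull the key facts out of a trace like the one above, here is a minimal sketch in Python. It assumes syslog-style lines with a "kernel:" prefix as in the excerpt; the function name `disabled_irqs` is made up for illustration, and the sample lines are copied from the log above.

```python
import re

# Sample lines copied from the trace above (syslog prefix included).
LOG = """\
Sep 23 00:37:46 station2 kernel: handlers:
Sep 23 00:37:46 station2 kernel: [<c057af91>] (usb_hcd_irq+0x0/0x50)
Sep 23 00:37:46 station2 kernel: Disabling IRQ #58
"""

def disabled_irqs(log_text):
    """Return (irq_number, [handler names]) for each 'Disabling IRQ' event."""
    events = []
    handlers = []
    in_handlers = False
    for line in log_text.splitlines():
        # Drop the "date host kernel:" prefix if it is present.
        msg = line.split("kernel:", 1)[-1].strip()
        if msg == "handlers:":
            handlers, in_handlers = [], True
            continue
        m = re.match(r"Disabling IRQ #(\d+)", msg)
        if m:
            events.append((int(m.group(1)), handlers))
            handlers, in_handlers = [], False
            continue
        if in_handlers:
            # Handler lines look like "[<address>] (function+offset/size)".
            h = re.match(r"\[<[0-9a-f]+>\] \((\w+)\+", msg)
            if h:
                handlers.append(h.group(1))
    return events

print(disabled_irqs(LOG))
```

The IRQ number it reports can then be compared against /proc/interrupts or the lspci -v listing to see which devices sit on that line.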
On Mon, 2008-09-22 at 22:31 -0700, nate wrote:
> partha chowdhury wrote:
>> can someone tell me if its a centos/rhel bug and if it is how and where i should file a bug report ?
> I'd say it's a hardware problem rather than a software bug at this point.
> nate
> CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Well, I managed to fix the problem after an intensive search through the forum, by adding the noirqdebug option to the kernel line.
On Tue, Sep 23, 2008 at 8:02 PM, partha chowdhury kira.laucas@gmail.com wrote:
> well i managed to fix the problem after an intensive search through the forum and adding the noirqdebug option to the kernel line.
Are you /sure/ this fixes the problem? Your last fix didn't work out so well, so I'm just curious, not criticizing....
mhr
MHR wrote:
> On Tue, Sep 23, 2008 at 8:02 PM, partha chowdhury kira.laucas@gmail.com wrote:
>> well i managed to fix the problem after an intensive search through the forum and adding the noirqdebug option to the kernel line.
> Are you /sure/ this fixes the problem? Your last fix didn't work out so well, so I'm just curious, not criticizing....
From what I've read, I'm pretty confident it won't fix the problem; it only masks it.
http://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re1...
By default, the kernel attempts to detect and disable unhandled interrupt sources because they can cause problems with the responsiveness of the rest of the kernel if left unchecked. This option will disable this logic.
--
So it sounds like Linux is saying the hardware is faulty and is disabling it pro-actively before bad things can happen. Disabling the code that detects bad hardware and recovers from it is just asking for trouble, IMO.
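To make the quoted description concrete, here is a toy model of that detection logic in Python. It is only a sketch: as far as I recall from kernel/irq/spurious.c in 2.6-era kernels, the check looks at each IRQ line after 100,000 interrupts and disables the line if more than 99,900 of them went unhandled, but treat both constants and every name below as assumptions rather than the kernel's actual code.

```python
class IrqStats:
    """Toy per-IRQ bookkeeping; class and attribute names are hypothetical."""
    WINDOW = 100000          # assumed: interrupts per evaluation window
    UNHANDLED_LIMIT = 99900  # assumed: unhandled count that trips the disable

    def __init__(self, noirqdebug=False):
        self.noirqdebug = noirqdebug
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        # With noirqdebug the bookkeeping is skipped entirely, so a stuck
        # interrupt line is never detected or disabled.
        if self.noirqdebug or self.disabled:
            return
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < self.WINDOW:
            return
        # End of window: disable the line if almost nothing was handled.
        if self.unhandled > self.UNHANDLED_LIMIT:
            self.disabled = True
        self.count = 0
        self.unhandled = 0


stuck = IrqStats()
masked = IrqStats(noirqdebug=True)
for _ in range(100000):
    stuck.note_interrupt(handled=False)
    masked.note_interrupt(handled=False)
print(stuck.disabled, masked.disabled)
```

In this model the unhandled interrupts keep arriving in both cases; the option only changes whether anything reacts to them.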
Replace the hardware, get better quality stuff. Since this is USB, get a PCI USB expansion board and see if that helps. About a year ago I bought a USB 2.0 PCI card for one of my older systems; it was about $20, I think.
nate
On Wed, Sep 24, 2008 at 5:39 PM, nate centos@linuxpowered.net wrote:
> http://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re1...
> By default, the kernel attempts to detect and disable unhandled interrupt sources because they can cause problems with the responsiveness of the rest of the kernel if left unchecked. This option will disable this logic.
Just out of curiosity, is this option removed in the latest 2.6.26.5 kernel? I ask because I experimented with compiling a custom kernel and never received the message. Anyway, I am running CentOS without any problem now and I am glad about it.
> Replace the hardware, get better quality stuff. Since this is USB, get a PCI USB expansion board see if that helps. About a year ago I bought a USB 2.0 PCI card for one of my older systems, was about $20 I think.
Now that you have mentioned it, I have noticed recently that my desktop motherboard's USB port has become slower. Previously I used to get 28-30 MB/s transfer speed with my external USB drive, but now the maximum I get is 10 MB/s. I tested the external drive on my friend's laptop and, to my surprise, it transferred at 25 MB/s! Is this an indication of a potentially disastrous hardware failure?
For information, my hardware is:

00:00.0 RAM memory: nVidia Corporation MCP67 Memory Controller (rev a2)
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: bus master, 66MHz, fast devsel, latency 0
    Capabilities: [44] HyperTransport: Slave or Primary Interface
    Capabilities: [dc] HyperTransport: MSI Mapping

00:01.0 ISA bridge: nVidia Corporation MCP67 ISA Bridge (rev a2)
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: bus master, 66MHz, fast devsel, latency 0
    I/O ports at 0900 [size=256]

00:01.1 SMBus: nVidia Corporation MCP67 SMBus (rev a2)
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: 66MHz, fast devsel, IRQ 10
    I/O ports at dc00 [size=64]
    I/O ports at 0600 [size=64]
    I/O ports at 0700 [size=64]
    Capabilities: [44] Power Management version 2

00:02.0 USB Controller: nVidia Corporation MCP67 OHCI USB 1.1 Controller (rev a2) (prog-if 10 [OHCI])
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 58
    Memory at feaff000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: [44] Power Management version 2

00:02.1 USB Controller: nVidia Corporation MCP67 EHCI USB 2.0 Controller (rev a2) (prog-if 20 [EHCI])
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 233
    Memory at feafec00 (32-bit, non-prefetchable) [size=256]
    Capabilities: [44] Debug port
    Capabilities: [80] Power Management version 2

00:04.0 USB Controller: nVidia Corporation MCP67 OHCI USB 1.1 Controller (rev a2) (prog-if 10 [OHCI])
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 66
    Memory at feafd000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: [44] Power Management version 2

00:04.1 USB Controller: nVidia Corporation MCP67 EHCI USB 2.0 Controller (rev a2) (prog-if 20 [EHCI])
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 50
    Memory at feafe800 (32-bit, non-prefetchable) [size=256]
    Capabilities: [44] Debug port
    Capabilities: [80] Power Management version 2

00:08.0 PCI bridge: nVidia Corporation MCP67 PCI Bridge (rev a2) (prog-if 01 [Subtractive decode])
    Flags: bus master, 66MHz, fast devsel, latency 0
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: feb00000-febfffff
    Capabilities: [b8] #0d [0000]
    Capabilities: [8c] HyperTransport: MSI Mapping

00:09.0 IDE interface: nVidia Corporation MCP67 AHCI Controller (rev a2) (prog-if 85 [Master SecO PriO])
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 233
    I/O ports at d480 [size=8]
    I/O ports at d400 [size=4]
    I/O ports at d080 [size=8]
    I/O ports at d000 [size=4]
    I/O ports at cc00 [size=16]
    Memory at feafa000 (32-bit, non-prefetchable) [size=8K]
    Capabilities: [44] Power Management version 2
    Capabilities: [8c] #12 [0010]

00:0a.0 Ethernet controller: nVidia Corporation MCP67 Ethernet (rev a2)
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 90
    Memory at feafc000 (32-bit, non-prefetchable) [size=4K]
    I/O ports at c880 [size=8]
    Memory at feafe400 (32-bit, non-prefetchable) [size=256]
    Memory at feafe000 (32-bit, non-prefetchable) [size=16]
    Capabilities: [44] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable+
    Capabilities: [6c] HyperTransport: MSI Mapping

00:0b.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
    Capabilities: [40] #0d [0000]
    Capabilities: [48] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
    Capabilities: [60] HyperTransport: MSI Mapping
    Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:0c.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
    Capabilities: [40] #0d [0000]
    Capabilities: [48] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
    Capabilities: [60] HyperTransport: MSI Mapping
    Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:0d.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
    Capabilities: [40] #0d [0000]
    Capabilities: [48] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
    Capabilities: [60] HyperTransport: MSI Mapping
    Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:0e.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
    Capabilities: [40] #0d [0000]
    Capabilities: [48] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
    Capabilities: [60] HyperTransport: MSI Mapping
    Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:0f.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
    Capabilities: [40] #0d [0000]
    Capabilities: [48] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
    Capabilities: [60] HyperTransport: MSI Mapping
    Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:10.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=07, subordinate=07, sec-latency=0
    Capabilities: [40] #0d [0000]
    Capabilities: [48] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
    Capabilities: [60] HyperTransport: MSI Mapping
    Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:11.0 PCI bridge: nVidia Corporation MCP67 PCI Express Bridge (rev a2) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=08, subordinate=08, sec-latency=0
    Capabilities: [40] #0d [0000]
    Capabilities: [48] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
    Capabilities: [60] HyperTransport: MSI Mapping
    Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:12.0 VGA compatible controller: nVidia Corporation GeForce 7050 PV / nForce 630a (rev a2) (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Unknown device 82b3
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 58
    Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at fc000000 (64-bit, non-prefetchable) [size=16M]
    [virtual] Expansion ROM at feac0000 [disabled] [size=128K]
    Capabilities: [48] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
    Flags: fast devsel
    Capabilities: [80] HyperTransport: Host or Secondary Interface

00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
    Flags: fast devsel

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
    Flags: fast devsel

00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
    Flags: fast devsel
    Capabilities: [f0] #0f [0010]

01:06.0 Multimedia audio controller: Creative Labs [SB Live! Value] EMU10k1X
    Subsystem: Creative Labs Unknown device 1001
    Flags: bus master, medium devsel, latency 64, IRQ 82
    I/O ports at ec00 [size=32]
    Capabilities: [dc] Power Management version 2

01:07.0 Ethernet controller: VIA Technologies, Inc. VT6105 [Rhine-III] (rev 8b)
    Subsystem: D-Link System Inc Unknown device 1405
    Flags: bus master, medium devsel, latency 64, IRQ 74
    I/O ports at e800 [size=256]
    Memory at febffc00 (32-bit, non-prefetchable) [size=256]
    Capabilities: [44] Power Management version 2
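Since the kernel message earlier in the thread names IRQ 58, it may help to see which devices in the listing above sit on the same interrupt line. The following is a minimal Python sketch that groups devices by the IRQ shown on their "Flags:" line. The function name `devices_by_irq` is hypothetical, and the sample is an abbreviated copy of four entries from the listing, laid out the way lspci -v normally prints them.

```python
import re
from collections import defaultdict

# Abbreviated copy of four entries from the listing above.
SAMPLE = """\
00:02.0 USB Controller: nVidia Corporation MCP67 OHCI USB 1.1 Controller (rev a2) (prog-if 10 [OHCI])
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 58
00:02.1 USB Controller: nVidia Corporation MCP67 EHCI USB 2.0 Controller (rev a2) (prog-if 20 [EHCI])
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 233
00:09.0 IDE interface: nVidia Corporation MCP67 AHCI Controller (rev a2) (prog-if 85 [Master SecO PriO])
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 233
00:12.0 VGA compatible controller: nVidia Corporation GeForce 7050 PV / nForce 630a (rev a2) (prog-if 00 [VGA controller])
    Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 58
"""

# A device header starts at column 0 with bus:device.function and a class name.
HEADER = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) ([^:]+):")
# Only the indented Flags: line is used; "IRQ 0" inside Capabilities is ignored.
FLAGS_IRQ = re.compile(r"^\s+Flags:.*\bIRQ (\d+)")

def devices_by_irq(lspci_text):
    """Map each IRQ number to the list of 'address class' strings using it."""
    by_irq = defaultdict(list)
    current = None
    for line in lspci_text.splitlines():
        h = HEADER.match(line)
        if h:
            current = "%s %s" % (h.group(1), h.group(2))
            continue
        f = FLAGS_IRQ.match(line)
        if f and current:
            by_irq[int(f.group(1))].append(current)
    return dict(by_irq)

shared = {irq: devs for irq, devs in devices_by_irq(SAMPLE).items() if len(devs) > 1}
for irq, devs in sorted(shared.items()):
    print(irq, devs)
```

On this sample, IRQ 58 turns out to be shared by the OHCI controller at 00:02.0 and the VGA controller at 00:12.0, and IRQ 233 by the EHCI controller at 00:02.1 and the AHCI controller at 00:09.0. Whether that sharing has anything to do with the problem is not something the listing alone can tell.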
kira laucas wrote:
> now that you have mentioned it, i have noticed recently that my desktop motherboard usb port has gone slower. i mean previously i used to get 28-30 MB/s transfer speed with my external usb drive. but now the max i get is 10MB/s . i have tested the external drive on my friend's laptop and to my surprise it transferred with 25MB/s ! is it any indication of any potentially disastrous hardware failure issue ?
I wouldn't think the system is about to fail if it's just going slower. If there are specific error messages that point to its failing, then maybe. The errors quoted earlier just seem like bad hardware (perhaps poorly designed or built, rather than hardware that is physically failing).
The only thing I can suggest is to verify that the drive is detected as USB 2.0 via lsusb -v.
e.g.

Bus 004 Device 020: ID 1058:0702 Western Digital Technologies, Inc.
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  [..]
  iManufacturer           1 Western Digital
  iProduct                2 External HDD
I believe the 2.00 indicates USB 2.0; I see several other devices on my USB that are marked as 1.x.
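If there are many devices to check, the bcdUSB field can be picked out of the lsusb -v output with a few lines of Python. This is only a sketch: the function name `usb_versions` is made up, and the sample is the excerpt from above.

```python
import re

# Excerpt of "lsusb -v" output, as shown above.
SAMPLE = """\
Bus 004 Device 020: ID 1058:0702 Western Digital Technologies, Inc.
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
"""

def usb_versions(lsusb_text):
    """Map 'Bus BBB Device DDD' to the bcdUSB value reported for that device."""
    versions = {}
    current = None
    for line in lsusb_text.splitlines():
        m = re.match(r"Bus (\d+) Device (\d+):", line)
        if m:
            current = "Bus %s Device %s" % m.groups()
            continue
        v = re.match(r"\s*bcdUSB\s+(\d+\.\d+)", line)
        if v and current:
            versions[current] = v.group(1)
    return versions

print(usb_versions(SAMPLE))
```

One caveat: bcdUSB is the USB specification release the device declares in its descriptor, so 2.00 shows that the device is a USB 2.0 device, not necessarily that the link is actually running at high speed.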
If the device is bus powered, make sure it is getting enough power. For some of my bus-powered disks I have to use a USB Y cable to plug the drive into two ports simultaneously (one for power+data, the other for power only).
If you configured your system's kernel to ignore the IRQ errors as the other poster did (I think you're a different poster, I didn't check), you really should remove that option to enable the checking again, then try a PCI USB expansion card instead and see if that helps.
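To confirm which IRQ-related options the running kernel was actually booted with, the kernel command line can be read from /proc/cmdline. Here is a minimal Python sketch; the function names are hypothetical, and the sample command line is invented for illustration rather than taken from the system discussed in this thread.

```python
def irq_options(cmdline):
    """Return the IRQ-related boot options present in a kernel command line."""
    wanted = ("noirqdebug", "irqpoll", "irqfixup")
    return [tok for tok in cmdline.split() if tok in wanted]

def running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()

# Invented example; on a live system use irq_options(running_cmdline()).
example = "ro root=/dev/VolGroup00/LogVol00 rhgb quiet noirqdebug"
print(irq_options(example))
```

An empty list means none of the three options is in effect, so the kernel's default handling of unhandled interrupts applies.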
nate
On Wed, 2008-09-24 at 07:09 -0700, nate wrote:
> I wouldn't think the system is about to fail if it's just going slower. If there are specific error messages that point to it's failing then maybe. Errors quoted earlier just seem like bad hardware(perhaps poorly designed or built, rather than hardware that is physically failing).
Well, I assembled this box myself. Did I do something wrong? It has been running fine for the last 10 months without any hiccups, and other than this minor issue there is no problem at all.
> Only thing I can suggest is to just verify that the drive is detected as USB 2.0 via lsusb -v
>
> e.g.
> Bus 004 Device 020: ID 1058:0702 Western Digital Technologies, Inc.
> Device Descriptor:
>   bLength                18
>   bDescriptorType         1
>   bcdUSB               2.00
>   [..]
>   iManufacturer           1 Western Digital
>   iProduct                2 External HDD
>
> I believe the 2.00 indicates USB 2.0, I see several other devices on my USB that are marked as 1.x
here is my output
Bus 002 Device 002: ID 05e3:0702 Genesys Logic, Inc. USB 2.0 IDE Adapter
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x05e3 Genesys Logic, Inc.
  idProduct          0x0702 USB 2.0 IDE Adapter
  bcdDevice            0.33
  iManufacturer           0
  iProduct                1 USB TO IDE
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           32
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xc0
      Self Powered
    MaxPower               96mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk (Zip)
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               1
You can see it is detected as USB 2.0.
> If you configured your system's kernel to ignore the irq errors as the other poster did(I think your a different poster..didn't check), you really should remove that option and enable the checking again, and try a PCI USB expansion card instead and see if that helps.
Actually, it was I who sent it. I sent it through Gmail, so it did not insert my name, just my email ID. Usually I send through Evolution. I did not know about this issue. Apologies for the unintentional mix-up.
I want to know one thing: the hardware-failure detection logic you spoke of earlier, is it disabled or removed in the latest 2.6.26.5 kernel? I ask because I experimented with custom compiling that kernel and did not receive any error message whatsoever.
partha chowdhury wrote:
> i want to know one thing - the hardware failing logic you earlier spoke of - is that disabled or removed in latest 2.6.26.5 kernel because i experimented with custom compiling that kernel and did not receive any error message whatsoever.
Download the kernel source and take a peek. I had never heard of that option until you mentioned it.
I would suggest something like
    grep -nri noirqdebug *
from within the extracted kernel source tree.
If you don't see any matches then it's probably not there.
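For those who prefer a script, here is a rough Python equivalent of that grep: a recursive, case-insensitive search that reports file, line number and matching line. It is a sketch only; the function name `grep_tree` and the example path in the comment are hypothetical.

```python
import os

def grep_tree(root, needle):
    """Recursively search files under root for needle, ignoring case.

    Returns a list of (path, line_number, line_text) tuples, similar in
    spirit to the output of "grep -nri needle root".
    """
    needle = needle.lower()
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps odd encodings from aborting the scan.
                with open(path, errors="replace") as f:
                    for lineno, line in enumerate(f, 1):
                        if needle in line.lower():
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                # Unreadable files (permissions, broken links) are skipped.
                continue
    return hits

# Example call, assuming the source was extracted to this hypothetical path:
#   for path, lineno, text in grep_tree("/usr/src/linux-2.6.26.5", "noirqdebug"):
#       print("%s:%d:%s" % (path, lineno, text))
```

As with the grep, an empty result suggests the option name no longer appears in that tree.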
nate
on 9-24-2008 6:42 AM kira laucas spake the following:
> On Wed, Sep 24, 2008 at 5:39 PM, nate <centos@linuxpowered.net> wrote:
>> http://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re18.html
>> By default, the kernel attempts to detect and disable unhandled interrupt sources because they can cause problems with the responsiveness of the rest of the kernel if left unchecked. This option will disable this logic.
> just for curiosity, is this option removed in the latest 2.6.26.5 kernel ? because i experimented with compiling a custom kernel and did not ever receive the message . anyway i am running centos without any problem now and i am glad about it.
>> Replace the hardware, get better quality stuff. Since this is USB, get a PCI USB expansion board see if that helps. About a year ago I bought a USB 2.0 PCI card for one of my older systems, was about $20 I think.
> now that you have mentioned it, i have noticed recently that my desktop motherboard usb port has gone slower. i mean previously i used to get 28-30 MB/s transfer speed with my external usb drive. but now the max i get is 10MB/s . i have tested the external drive on my friend's laptop and to my surprise it transferred with 25MB/s ! is it any indication of any potentially disastrous hardware failure issue ?
USB is interrupt driven. If the system has trouble responding to the interrupts quickly enough, it will slow down the transfers. Too bad the USB designers didn't make it DMA driven like FireWire. But then it was designed to be cheaper.
MHR wrote:
> On Tue, Sep 23, 2008 at 8:02 PM, partha chowdhury kira.laucas@gmail.com wrote:
>> well i managed to fix the problem after an intensive search through the forum and adding the noirqdebug option to the kernel line.
> Are you /sure/ this fixes the problem? Your last fix didn't work out so well, so I'm just curious, not criticizing....
> mhr
Well, it has been running for close to a day now, the message has not appeared yet, and all the USB drives are working as usual. So far so good, keeping my fingers crossed!