This is a repost of sorts, and for that I am sorry; I do not think my original posting subject was very clear, and I have more data about the problem.
I'm experiencing lots of kernel errors when reading or writing to a disk that is part of an mdadm softraid-5 array. Since originally detecting this problem, I have isolated it to one disk, but I'm not sure what the cause of the error is. I have torn down the array and have been testing the actual device now.
I tested the drive several times using smartctl and the tests came back OK / PASSED each time, and no kernel errors were logged.
I tested the drive with the "badblocks" command, and there were no problems reported, and no kernel errors were logged.
As soon as I try to write any substantial data to the device, kernel errors start pouring in:
# dd if=/dev/zero of=/dev/sdg oflag=sync bs=8M count=100
The dd command does its thing, and eventually finishes without error, but at a very slow speed, most of my drives run this command at 65-75mb/sec, and sdg completes it at 13mb/sec.
Apr 18 01:10:00 xenmaster kernel: BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Apr 18 01:10:00 xenmaster kernel: Apr 18 01:10:00 xenmaster kernel: Call Trace: Apr 18 01:10:00 xenmaster kernel: <IRQ> [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078016>] :scsi_mod:scsi_next_command+0x2d/0x39 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078174>] :scsi_mod:scsi_end_request+0xbf/0xcd Apr 18 01:10:00 xenmaster kernel: [<ffffffff880782d0>] :scsi_mod:scsi_io_completion+0x14e/0x324 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880a57cd>] :sd_mod:sd_rw_intr+0x21d/0x257 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078565>] :scsi_mod:scsi_device_unbusy+0x67/0x81 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802389b8>] blk_done_softirq+0x67/0x75 Apr 18 01:10:00 xenmaster kernel: [<ffffffff80212880>] __do_softirq+0x8d/0x13b Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025fda4>] call_softirq+0x1c/0x278 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026d08e>] do_softirq+0x31/0x98 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026cf09>] do_IRQ+0xec/0xf5 Apr 18 01:10:00 xenmaster kernel: [<ffffffff803a6cca>] evtchn_do_upcall+0x13b/0x1fb Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c Apr 18 01:10:00 xenmaster kernel: <EOI> [<ffffffff8026df02>] monotonic_clock+0x35/0x7b Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026ba22>] xen_idle+0x38/0x4a Apr 18 01:10:00 xenmaster kernel: [<ffffffff8024a803>] cpu_idle+0x97/0xba Apr 18 01:10:00 xenmaster kernel: [<ffffffff80634b09>] start_kernel+0x21f/0x224 Apr 18 01:10:00 xenmaster kernel: [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb Apr 18 01:10:00 xenmaster kernel: Apr 18 01:10:00 xenmaster kernel: BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Apr 18 01:10:00 xenmaster kernel: Apr 18 01:10:00 xenmaster kernel: Call Trace: Apr 18 01:10:00 xenmaster kernel: <IRQ> [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078016>] :scsi_mod:scsi_next_command+0x2d/0x39 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078174>] :scsi_mod:scsi_end_request+0xbf/0xcd Apr 18 01:10:00 xenmaster kernel: [<ffffffff880782d0>] :scsi_mod:scsi_io_completion+0x14e/0x324 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880a57cd>] :sd_mod:sd_rw_intr+0x21d/0x257 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078565>] :scsi_mod:scsi_device_unbusy+0x67/0x81 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802389b8>] blk_done_softirq+0x67/0x75 Apr 18 01:10:00 xenmaster kernel: [<ffffffff80212880>] __do_softirq+0x8d/0x13b Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025fda4>] call_softirq+0x1c/0x278 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026d08e>] do_softirq+0x31/0x98 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026cf09>] do_IRQ+0xec/0xf5 Apr 18 01:10:00 xenmaster kernel: [<ffffffff803a6cca>] evtchn_do_upcall+0x13b/0x1fb Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c Apr 18 01:10:00 xenmaster kernel: <EOI> [<ffffffff8026df02>] monotonic_clock+0x35/0x7b Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026ba22>] xen_idle+0x38/0x4a Apr 18 01:10:00 xenmaster kernel: [<ffffffff8024a803>] cpu_idle+0x97/0xba Apr 18 01:10:00 xenmaster kernel: [<ffffffff80634b09>] start_kernel+0x21f/0x224 Apr 18 01:10:00 xenmaster kernel: [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb Apr 18 01:10:00 xenmaster kernel: Apr 18 01:10:00 xenmaster kernel: BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Apr 18 01:10:00 xenmaster kernel: Apr 18 01:10:00 xenmaster kernel: Call Trace: Apr 18 01:10:00 xenmaster kernel: <IRQ> [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078016>] :scsi_mod:scsi_next_command+0x2d/0x39 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078174>] :scsi_mod:scsi_end_request+0xbf/0xcd Apr 18 01:10:00 xenmaster kernel: [<ffffffff880782d0>] :scsi_mod:scsi_io_completion+0x14e/0x324 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880a57cd>] :sd_mod:sd_rw_intr+0x21d/0x257 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078565>] :scsi_mod:scsi_device_unbusy+0x67/0x81 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802389b8>] blk_done_softirq+0x67/0x75 Apr 18 01:10:00 xenmaster kernel: [<ffffffff80212880>] __do_softirq+0x8d/0x13b Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025fda4>] call_softirq+0x1c/0x278 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026d08e>] do_softirq+0x31/0x98 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026cf09>] do_IRQ+0xec/0xf5 Apr 18 01:10:00 xenmaster kernel: [<ffffffff803a6cca>] evtchn_do_upcall+0x13b/0x1fb Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c Apr 18 01:10:00 xenmaster kernel: <EOI> [<ffffffff8026df02>] monotonic_clock+0x35/0x7b Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026ba22>] xen_idle+0x38/0x4a Apr 18 01:10:00 xenmaster kernel: [<ffffffff8024a803>] cpu_idle+0x97/0xba Apr 18 01:10:00 xenmaster kernel: [<ffffffff80634b09>] start_kernel+0x21f/0x224 Apr 18 01:10:00 xenmaster kernel: [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb Apr 18 01:10:00 xenmaster kernel: Apr 18 01:10:00 xenmaster kernel: BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Apr 18 01:10:00 xenmaster kernel: Apr 18 01:10:00 xenmaster kernel: Call Trace: Apr 18 01:10:00 xenmaster kernel: <IRQ> [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078016>] :scsi_mod:scsi_next_command+0x2d/0x39 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078174>] :scsi_mod:scsi_end_request+0xbf/0xcd Apr 18 01:10:00 xenmaster kernel: [<ffffffff880782d0>] :scsi_mod:scsi_io_completion+0x14e/0x324 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880a57cd>] :sd_mod:sd_rw_intr+0x21d/0x257 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078565>] :scsi_mod:scsi_device_unbusy+0x67/0x81 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802389b8>] blk_done_softirq+0x67/0x75 Apr 18 01:10:00 xenmaster kernel: [<ffffffff80212880>] __do_softirq+0x8d/0x13b Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025fda4>] call_softirq+0x1c/0x278 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026d08e>] do_softirq+0x31/0x98 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026cf09>] do_IRQ+0xec/0xf5 Apr 18 01:10:00 xenmaster kernel: [<ffffffff803a6cca>] evtchn_do_upcall+0x13b/0x1fb Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c Apr 18 01:10:00 xenmaster kernel: <EOI> [<ffffffff8026df02>] monotonic_clock+0x35/0x7b Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026ba22>] xen_idle+0x38/0x4a Apr 18 01:10:00 xenmaster kernel: [<ffffffff8024a803>] cpu_idle+0x97/0xba Apr 18 01:10:00 xenmaster kernel: [<ffffffff80634b09>] start_kernel+0x21f/0x224 Apr 18 01:10:00 xenmaster kernel: [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb Apr 18 01:10:00 xenmaster kernel: Apr 18 01:10:00 xenmaster kernel: BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Apr 18 01:10:00 xenmaster kernel: Apr 18 01:10:00 xenmaster kernel: Call Trace: Apr 18 01:10:00 xenmaster kernel: <IRQ> [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078016>] :scsi_mod:scsi_next_command+0x2d/0x39 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078174>] :scsi_mod:scsi_end_request+0xbf/0xcd Apr 18 01:10:00 xenmaster kernel: [<ffffffff880782d0>] :scsi_mod:scsi_io_completion+0x14e/0x324 Apr 18 01:10:00 xenmaster kernel: [<ffffffff880a57cd>] :sd_mod:sd_rw_intr+0x21d/0x257 Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078565>] :scsi_mod:scsi_device_unbusy+0x67/0x81 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802389b8>] blk_done_softirq+0x67/0x75 Apr 18 01:10:00 xenmaster kernel: [<ffffffff80212880>] __do_softirq+0x8d/0x13b Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025fda4>] call_softirq+0x1c/0x278 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026d08e>] do_softirq+0x31/0x98 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026cf09>] do_IRQ+0xec/0xf5 Apr 18 01:10:00 xenmaster kernel: [<ffffffff803a6cca>] evtchn_do_upcall+0x13b/0x1fb Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c Apr 18 01:10:00 xenmaster kernel: <EOI> [<ffffffff8026df02>] monotonic_clock+0x35/0x7b Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8 Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026ba22>] xen_idle+0x38/0x4a Apr 18 01:10:00 xenmaster kernel: [<ffffffff8024a803>] cpu_idle+0x97/0xba Apr 18 01:10:00 xenmaster kernel: [<ffffffff80634b09>] start_kernel+0x21f/0x224 Apr 18 01:10:00 xenmaster kernel: [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb
The machine in question is running 64bit Centos 5.3 Xen kernel and is Dom0:
# uname -a Linux xenmaster.dimension-x.local 2.6.18-128.1.6.el5xen #1 SMP Wed Apr 1 09:53:14 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
This problem started about five days ago, when I upgraded from 5.2 to 5.3. I'm not sure if that was the cause or just a coincidence.
Any suggestions?
-Gordon