[CentOS] [centos] libata-core kernel errors

Sat Apr 18 11:19:09 UTC 2009
Gordon McLellan <gordonthree at gmail.com>

This is a repost of sorts, and for that I am sorry; I do not think my
original posting subject was very clear, and I have more data about
the problem.

I'm experiencing lots of kernel errors when reading or writing to a
disk that is part of an mdadm softraid-5 array. Since originally
detecting this problem, I have isolated it to one disk, but I'm not
sure what the cause of the error is.  I have torn down the array and
have been testing the actual device now.

I tested the drive several times using smartctl and the tests came
back OK / PASSED each time, and no kernel errors were logged.

I tested the drive with the "badblocks" command, and there were no
problems reported, and no kernel errors were logged.

As soon as I try to write any substantial data to the device, kernel
errors start pouring in:

# dd if=/dev/zero of=/dev/sdg oflag=sync bs=8M count=100

The dd command does its thing, and eventually finishes without error,
but at a very slow speed, most of my drives run this command at
65-75mb/sec, and sdg completes it at 13mb/sec.

Apr 18 01:10:00 xenmaster kernel: BUG: warning at
drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: Call Trace:
Apr 18 01:10:00 xenmaster kernel:  <IRQ>  [<ffffffff880b6625>]
:libata:ata_qc_issue+0x61/0x4a9
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bacf3>]
:libata:ata_scsi_rw_xlat+0x119/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880babda>]
:libata:ata_scsi_rw_xlat+0x0/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880baea2>]
:libata:ata_scsi_translate+0x140/0x16d
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bda72>]
:libata:ata_scsi_queuecmd+0x1b4/0x1d4
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88073c83>]
:scsi_mod:scsi_dispatch_cmd+0x290/0x322
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880790e2>]
:scsi_mod:scsi_request_fn+0x2c5/0x39c
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078016>]
:scsi_mod:scsi_next_command+0x2d/0x39
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078174>]
:scsi_mod:scsi_end_request+0xbf/0xcd
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880782d0>]
:scsi_mod:scsi_io_completion+0x14e/0x324
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880a57cd>]
:sd_mod:sd_rw_intr+0x21d/0x257
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078565>]
:scsi_mod:scsi_device_unbusy+0x67/0x81
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802389b8>]
blk_done_softirq+0x67/0x75
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80212880>] __do_softirq+0x8d/0x13b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025fda4>] call_softirq+0x1c/0x278
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026d08e>] do_softirq+0x31/0x98
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026cf09>] do_IRQ+0xec/0xf5
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff803a6cca>]
evtchn_do_upcall+0x13b/0x1fb
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025f8d6>]
do_hypervisor_callback+0x1e/0x2c
Apr 18 01:10:00 xenmaster kernel:  <EOI>  [<ffffffff8026df02>]
monotonic_clock+0x35/0x7b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026ba22>] xen_idle+0x38/0x4a
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8024a803>] cpu_idle+0x97/0xba
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80634b09>] start_kernel+0x21f/0x224
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: BUG: warning at
drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: Call Trace:
Apr 18 01:10:00 xenmaster kernel:  <IRQ>  [<ffffffff880b6625>]
:libata:ata_qc_issue+0x61/0x4a9
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bacf3>]
:libata:ata_scsi_rw_xlat+0x119/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880babda>]
:libata:ata_scsi_rw_xlat+0x0/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880baea2>]
:libata:ata_scsi_translate+0x140/0x16d
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bda72>]
:libata:ata_scsi_queuecmd+0x1b4/0x1d4
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88073c83>]
:scsi_mod:scsi_dispatch_cmd+0x290/0x322
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880790e2>]
:scsi_mod:scsi_request_fn+0x2c5/0x39c
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078016>]
:scsi_mod:scsi_next_command+0x2d/0x39
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078174>]
:scsi_mod:scsi_end_request+0xbf/0xcd
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880782d0>]
:scsi_mod:scsi_io_completion+0x14e/0x324
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880a57cd>]
:sd_mod:sd_rw_intr+0x21d/0x257
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078565>]
:scsi_mod:scsi_device_unbusy+0x67/0x81
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802389b8>]
blk_done_softirq+0x67/0x75
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80212880>] __do_softirq+0x8d/0x13b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025fda4>] call_softirq+0x1c/0x278
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026d08e>] do_softirq+0x31/0x98
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026cf09>] do_IRQ+0xec/0xf5
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff803a6cca>]
evtchn_do_upcall+0x13b/0x1fb
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025f8d6>]
do_hypervisor_callback+0x1e/0x2c
Apr 18 01:10:00 xenmaster kernel:  <EOI>  [<ffffffff8026df02>]
monotonic_clock+0x35/0x7b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026ba22>] xen_idle+0x38/0x4a
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8024a803>] cpu_idle+0x97/0xba
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80634b09>] start_kernel+0x21f/0x224
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: BUG: warning at
drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: Call Trace:
Apr 18 01:10:00 xenmaster kernel:  <IRQ>  [<ffffffff880b6625>]
:libata:ata_qc_issue+0x61/0x4a9
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bacf3>]
:libata:ata_scsi_rw_xlat+0x119/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880babda>]
:libata:ata_scsi_rw_xlat+0x0/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880baea2>]
:libata:ata_scsi_translate+0x140/0x16d
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bda72>]
:libata:ata_scsi_queuecmd+0x1b4/0x1d4
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88073c83>]
:scsi_mod:scsi_dispatch_cmd+0x290/0x322
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880790e2>]
:scsi_mod:scsi_request_fn+0x2c5/0x39c
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078016>]
:scsi_mod:scsi_next_command+0x2d/0x39
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078174>]
:scsi_mod:scsi_end_request+0xbf/0xcd
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880782d0>]
:scsi_mod:scsi_io_completion+0x14e/0x324
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880a57cd>]
:sd_mod:sd_rw_intr+0x21d/0x257
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078565>]
:scsi_mod:scsi_device_unbusy+0x67/0x81
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802389b8>]
blk_done_softirq+0x67/0x75
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80212880>] __do_softirq+0x8d/0x13b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025fda4>] call_softirq+0x1c/0x278
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026d08e>] do_softirq+0x31/0x98
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026cf09>] do_IRQ+0xec/0xf5
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff803a6cca>]
evtchn_do_upcall+0x13b/0x1fb
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025f8d6>]
do_hypervisor_callback+0x1e/0x2c
Apr 18 01:10:00 xenmaster kernel:  <EOI>  [<ffffffff8026df02>]
monotonic_clock+0x35/0x7b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026ba22>] xen_idle+0x38/0x4a
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8024a803>] cpu_idle+0x97/0xba
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80634b09>] start_kernel+0x21f/0x224
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: BUG: warning at
drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: Call Trace:
Apr 18 01:10:00 xenmaster kernel:  <IRQ>  [<ffffffff880b6625>]
:libata:ata_qc_issue+0x61/0x4a9
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bacf3>]
:libata:ata_scsi_rw_xlat+0x119/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880babda>]
:libata:ata_scsi_rw_xlat+0x0/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880baea2>]
:libata:ata_scsi_translate+0x140/0x16d
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bda72>]
:libata:ata_scsi_queuecmd+0x1b4/0x1d4
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88073c83>]
:scsi_mod:scsi_dispatch_cmd+0x290/0x322
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880790e2>]
:scsi_mod:scsi_request_fn+0x2c5/0x39c
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078016>]
:scsi_mod:scsi_next_command+0x2d/0x39
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078174>]
:scsi_mod:scsi_end_request+0xbf/0xcd
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880782d0>]
:scsi_mod:scsi_io_completion+0x14e/0x324
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880a57cd>]
:sd_mod:sd_rw_intr+0x21d/0x257
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078565>]
:scsi_mod:scsi_device_unbusy+0x67/0x81
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802389b8>]
blk_done_softirq+0x67/0x75
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80212880>] __do_softirq+0x8d/0x13b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025fda4>] call_softirq+0x1c/0x278
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026d08e>] do_softirq+0x31/0x98
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026cf09>] do_IRQ+0xec/0xf5
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff803a6cca>]
evtchn_do_upcall+0x13b/0x1fb
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025f8d6>]
do_hypervisor_callback+0x1e/0x2c
Apr 18 01:10:00 xenmaster kernel:  <EOI>  [<ffffffff8026df02>]
monotonic_clock+0x35/0x7b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026ba22>] xen_idle+0x38/0x4a
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8024a803>] cpu_idle+0x97/0xba
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80634b09>] start_kernel+0x21f/0x224
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: BUG: warning at
drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: Call Trace:
Apr 18 01:10:00 xenmaster kernel:  <IRQ>  [<ffffffff880b6625>]
:libata:ata_qc_issue+0x61/0x4a9
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bacf3>]
:libata:ata_scsi_rw_xlat+0x119/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880babda>]
:libata:ata_scsi_rw_xlat+0x0/0x188
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880baea2>]
:libata:ata_scsi_translate+0x140/0x16d
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880735a6>]
:scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880bda72>]
:libata:ata_scsi_queuecmd+0x1b4/0x1d4
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88073c83>]
:scsi_mod:scsi_dispatch_cmd+0x290/0x322
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880790e2>]
:scsi_mod:scsi_request_fn+0x2c5/0x39c
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078016>]
:scsi_mod:scsi_next_command+0x2d/0x39
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078174>]
:scsi_mod:scsi_end_request+0xbf/0xcd
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880782d0>]
:scsi_mod:scsi_io_completion+0x14e/0x324
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff880a57cd>]
:sd_mod:sd_rw_intr+0x21d/0x257
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff88078565>]
:scsi_mod:scsi_device_unbusy+0x67/0x81
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802389b8>]
blk_done_softirq+0x67/0x75
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80212880>] __do_softirq+0x8d/0x13b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025fda4>] call_softirq+0x1c/0x278
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026d08e>] do_softirq+0x31/0x98
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026cf09>] do_IRQ+0xec/0xf5
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff803a6cca>]
evtchn_do_upcall+0x13b/0x1fb
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8025f8d6>]
do_hypervisor_callback+0x1e/0x2c
Apr 18 01:10:00 xenmaster kernel:  <EOI>  [<ffffffff8026df02>]
monotonic_clock+0x35/0x7b
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff802063aa>]
hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8026ba22>] xen_idle+0x38/0x4a
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff8024a803>] cpu_idle+0x97/0xba
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff80634b09>] start_kernel+0x21f/0x224
Apr 18 01:10:00 xenmaster kernel:  [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb


The machine in question is running 64bit Centos 5.3 Xen kernel and is Dom0:

# uname -a
Linux xenmaster.dimension-x.local 2.6.18-128.1.6.el5xen #1 SMP Wed Apr
1 09:53:14 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

This problem started about five days ago, when I upgraded from 5.2 to
5.3.  I'm not sure if that was the cause or just a coincidence.

Any suggestions?

-Gordon