[CentOS] problem with 5.3 upgrade or just bad timing?

Fri Apr 17 18:23:52 UTC 2009
Gordon McLellan <gordonthree at gmail.com>

I've been experiencing delays access data off my file server since I
upgraded to 5.3... either I hosed something, have bad hardware or very
unlikely, found a bug.

When reading or writing data, the stream to the hdd's stops every 5-10
min and %iowait goes through the roof.  I checked the logs and they
are filled with this diagnostic data that I can't readily decipher.

my setup involves two md raid5 devices, 9+1 sata drives total, five of
them are on the internal intel sata (i5000) controller, five of them
on an external super micro (marvell) controller.  I need to do some
more testing to see which of (or is it both?) of the md devices is
generating these errors.

system is running centos 5.3 64bit:
# uname -a
Linux xenmaster.dimension-x.local 2.6.18-128.1.6.el5xen #1 SMP Wed Apr
1 09:53:14 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdk1[2] sdj1[4] sdi1[3] sdh1[0] sdg1[1]
      976783616 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md0 : active raid5 sdf1[3] sde1[1] sdd1[4](S) sdc1[0] sdb1[2]
      2197715712 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

here is what dmesg has to say: (some of it looks like repeats but some
of it is very slightly different too)

 BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )

Call Trace:
 [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9
 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188
 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4
 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322
 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c
 [<ffffffff8025c62a>] generic_unplug_device+0x22/0x37
 [<ffffffff8835c060>] :raid456:unplug_slaves+0x60/0x9b
 [<ffffffff8836122f>] :raid456:raid5d+0x17c/0x187
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff803fed01>] md_thread+0xf8/0x10e
 [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
 [<ffffffff803fec09>] md_thread+0x0/0x10e
 [<ffffffff80233483>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233385>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12

BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )

Call Trace:
 [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9
 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188
 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4
 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322
 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c
 [<ffffffff8025c62a>] generic_unplug_device+0x22/0x37
 [<ffffffff8835c060>] :raid456:unplug_slaves+0x60/0x9b
 [<ffffffff8836122f>] :raid456:raid5d+0x17c/0x187
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff803fed01>] md_thread+0xf8/0x10e
 [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
 [<ffffffff803fec09>] md_thread+0x0/0x10e
 [<ffffffff80233483>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233385>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12

BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )

Call Trace:
 [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9
 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188
 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff8033949e>] cfq_kick_queue+0x0/0x89
 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4
 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322
 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c
 [<ffffffff80339514>] cfq_kick_queue+0x76/0x89
 [<ffffffff8024ee1e>] run_workqueue+0x94/0xe4
 [<ffffffff8024b727>] worker_thread+0x0/0x122
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8024b817>] worker_thread+0xf0/0x122
 [<ffffffff80286daf>] default_wake_function+0x0/0xe
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233483>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8026df02>] monotonic_clock+0x35/0x7b
 [<ffffffff80233385>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12

BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )

Call Trace:
 [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9
 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188
 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4
 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322
 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c
 [<ffffffff8025c62a>] generic_unplug_device+0x22/0x37
 [<ffffffff8835c060>] :raid456:unplug_slaves+0x60/0x9b
 [<ffffffff8836122f>] :raid456:raid5d+0x17c/0x187
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff803fed01>] md_thread+0xf8/0x10e
 [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
 [<ffffffff803fec09>] md_thread+0x0/0x10e
 [<ffffffff80233483>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233385>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12

BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G     )

Call Trace:
 <IRQ>  [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9
 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188
 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d
 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4
 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322
 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c
 [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72
 [<ffffffff88078016>] :scsi_mod:scsi_next_command+0x2d/0x39
 [<ffffffff88078174>] :scsi_mod:scsi_end_request+0xbf/0xcd
 [<ffffffff880782d0>] :scsi_mod:scsi_io_completion+0x14e/0x324
 [<ffffffff880a57cd>] :sd_mod:sd_rw_intr+0x21d/0x257
 [<ffffffff88078565>] :scsi_mod:scsi_device_unbusy+0x67/0x81
 [<ffffffff802389b8>] blk_done_softirq+0x67/0x75
 [<ffffffff80212880>] __do_softirq+0x8d/0x13b
 [<ffffffff8025fda4>] call_softirq+0x1c/0x278
 [<ffffffff8026d08e>] do_softirq+0x31/0x98
 [<ffffffff8026cf09>] do_IRQ+0xec/0xf5
 [<ffffffff803a6cca>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff8026df02>] monotonic_clock+0x35/0x7b
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ba22>] xen_idle+0x38/0x4a
 [<ffffffff8024a803>] cpu_idle+0x97/0xba

var/log/messages has the same info, with timestamps - this error
repeats itself several times a second for several minutes at a time

all suggestions are appreciated!
-Gordon