[CentOS] Ext3 and drbd read-only remount problem.

Rafał Radecki

radecki.rafal at gmail.com
Sun May 6 08:01:25 UTC 2012


Hi all.

I have two hosts with drbd:
kmod-drbd83-8.3.8-1.el5.centos
drbd83-8.3.8-1.el5.centos
and kernel (CentOS 5.7):
2.6.18-308.4.1.el5

After a recent kernel upgrade I have twice had my ext3 filesystem on
/dev/drbd0 become read-only. I've checked the disks with smartctl -t long
and they are OK. There are no messages about disk problems in
/var/log/messages or dmesg. I ran fsck overnight, but 3 hours after it
finished the problem occurred again (under heavy load).

/var/log/messages:

May  6 06:22:27 srv1a kernel: EXT3-fs error (device drbd0): htree_dirblock_to_tree: bad entry in directory #43024813: rec_len % 4 != 0 - offset=73728, inode=1701012818, rec_len=30313, name_len=101
May  6 06:22:27 srv1a kernel: Aborting journal on device drbd0.
May  6 06:22:28 srv1a kernel: journal commit I/O error
May  6 06:22:28 srv1a kernel: ext3_abort called.
May  6 06:22:28 srv1a kernel: journal commit I/O error
May  6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0): ext3_journal_start_sb: Detected aborted journal
May  6 06:22:28 srv1a kernel: ext3_abort called.
May  6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0): ext3_journal_start_sb: Detected aborted journal
May  6 06:22:28 srv1a kernel: Remounting filesystem read-only
May  6 06:22:28 srv1a kernel: __journal_remove_journal_head: freeing b_committed_data
May  6 06:22:28 srv1a kernel: __journal_remove_journal_head: freeing b_committed_data
May  6 06:22:28 srv1a kernel: __journal_remove_journal_head: freeing b_committed_data
May  6 06:22:28 srv1a kernel: journal commit I/O error
May  6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0): htree_dirblock_to_tree: bad entry in directory #43024813: rec_len % 4 != 0 - offset=106496, inode=1701012818, rec_len=30313, name_len=101
May  6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0): htree_dirblock_to_tree: bad entry in directory #43024813: rec_len % 4 != 0 - offset=204800, inode=1869116005, rec_len=29811, name_len=46
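
The "rec_len % 4 != 0" check in these messages can be reproduced by hand. The following is only a minimal sketch, assuming the usual ext3 on-disk directory entry header (32-bit little-endian inode, 16-bit little-endian rec_len, 8-bit name_len). The helper names entry_bytes and rec_len_ok are made up for illustration and are not part of any tool. It tests the two rec_len values from the log and packs the reported fields back into the raw bytes they would have occupied on disk.

```python
import struct

def entry_bytes(inode, rec_len, name_len):
    # Assumed ext3 directory entry header layout:
    # inode (le32), rec_len (le16), name_len (u8); "<" means no padding.
    return struct.pack("<IHB", inode, rec_len, name_len)

def rec_len_ok(rec_len):
    # The kernel rejects entries whose record length is not a multiple of 4.
    return rec_len % 4 == 0

# (inode, rec_len, name_len) exactly as reported in /var/log/messages above.
entries = [
    (1701012818, 30313, 101),
    (1869116005, 29811, 46),
]

for inode, rec_len, name_len in entries:
    raw = entry_bytes(inode, rec_len, name_len)
    print("rec_len=%d aligned=%s raw=%r" % (rec_len, rec_len_ok(rec_len), raw))
```

Both rec_len values fail the alignment check, and the packed bytes come out as printable ASCII (b'Receive' and b'erhost.'). That may hint that ordinary file data ended up in a directory block, but this is only an interpretation of the numbers, not something the log itself states.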

I've found:

https://bugzilla.redhat.com/show_bug.cgi?id=494927

There are some clues that it may be a kernel problem, so I went back to:
2.6.18-274.7.1.el5

At the moment the situation is OK, but I've read that the problem occurs
under random circumstances.

Any clues what to do?

Best regards,
Rafal.


