Hi all.
I have two hosts running DRBD (kmod-drbd83-8.3.8-1.el5.centos, drbd83-8.3.8-1.el5.centos) on CentOS 5.7 with kernel 2.6.18-308.4.1.el5.
Since a recent kernel upgrade, my ext3 filesystem on /dev/drbd0 has gone read-only twice. I've checked the disks with smartctl -t long and they are OK. There are no messages about disk problems in /var/log/messages or in dmesg. I ran fsck tonight, but 3 hours after it finished the problem occurred again (under heavy load).
/var/log/messages:
May 6 06:22:27 srv1a kernel: EXT3-fs error (device drbd0): htree_dirblock_to_tree: bad entry in directory #43024813: rec_len % 4 != 0 - offset=73728, inode=1701012818, rec_len=30313, name_len=101
May 6 06:22:27 srv1a kernel: Aborting journal on device drbd0.
May 6 06:22:28 srv1a kernel: journal commit I/O error
May 6 06:22:28 srv1a kernel: ext3_abort called.
May 6 06:22:28 srv1a kernel: journal commit I/O error
May 6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0): ext3_journal_start_sb: Detected aborted journal
May 6 06:22:28 srv1a kernel: ext3_abort called.
May 6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0): ext3_journal_start_sb: Detected aborted journal
May 6 06:22:28 srv1a kernel: Remounting filesystem read-only
May 6 06:22:28 srv1a kernel: __journal_remove_journal_head: freeing b_committed_data
May 6 06:22:28 srv1a kernel: __journal_remove_journal_head: freeing b_committed_data
May 6 06:22:28 srv1a kernel: __journal_remove_journal_head: freeing b_committed_data
May 6 06:22:28 srv1a kernel: journal commit I/O error
May 6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0): htree_dirblock_to_tree: bad entry in directory #43024813: rec_len % 4 != 0 - offset=106496, inode=1701012818, rec_len=30313, name_len=101
May 6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0): htree_dirblock_to_tree: bad entry in directory #43024813: rec_len % 4 != 0 - offset=204800, inode=1869116005, rec_len=29811, name_len=46
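A side observation on the numbers in these messages, offered as a sketch rather than a diagnosis: the inode, rec_len and name_len values can be packed back into the bytes they would occupy on disk. The layout used here (32-bit inode, 16-bit rec_len, 8-bit name_len, little-endian) is my assumption about the start of an ext3 directory entry, and the helper name is made up for illustration.

```python
import struct


def dirent_header_bytes(inode, rec_len, name_len):
    """Pack the three reported fields as they would sit on disk (little-endian)."""
    # Assumed layout of the start of an ext3 directory entry:
    # 32-bit inode, 16-bit rec_len, 8-bit name_len.
    return struct.pack("<IHB", inode, rec_len, name_len)


# Values copied from the EXT3-fs error lines in the log.
reported = [
    (1701012818, 30313, 101),
    (1869116005, 29811, 46),
]

for inode, rec_len, name_len in reported:
    raw = dirent_header_bytes(inode, rec_len, name_len)
    # Show the bytes as text; anything non-ASCII would appear as a placeholder.
    print(raw.decode("ascii", errors="replace"))
```

Both values come out as printable ASCII ("Receive" and "erhost."), which suggests that the directory block holds ordinary file data instead of directory entries. That is only a hint; it does not show where the corruption comes from.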
I've found:
https://bugzilla.redhat.com/show_bug.cgi?id=494927
There are some hints there that it may be a kernel problem, so I went back to 2.6.18-274.7.1.el5.
Things are fine at the moment, but I've read that the problem occurs at random.
Any clues what to do?
Best regards, Rafal.
I have one more question regarding the kernel update to 2.6.18-308.4.1.el5 mentioned above. The extras repo offers this package:
kmod-drbd83 8.3.12: This package provides the drbd83 kernel modules built for the Linux kernel 2.6.18-274.17.1.el5 for the i686 family of processors.
We currently have this kmod-drbd83 installed:
kmod-drbd83 8.3.8: This package provides the drbd83 kernel modules built for the Linux kernel 2.6.18-194.el5 for the i686 family of processors.
Should the kernel named in the kmod-drbd83 package description match the running kernel, or should we stay with kmod-drbd83 8.3.8 because we are using drbd83-8.3.8-1.el5.centos?
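To make the mismatch concrete, here is a minimal sketch that pulls the "built for" kernel release out of each package description and compares it with the kernel we upgraded to. The function name and the regular expression are my own, and the description strings are written without the line-wrap colon. It only reports the difference; whether such a difference actually matters is the open question.

```python
import re


def built_for_kernel(description):
    """Return the kernel release named in a kmod package description, or None."""
    match = re.search(r"Linux\s+kernel\s+(\S+)", description)
    return match.group(1) if match else None


# The kernel we upgraded to, as given earlier in this thread.
running = "2.6.18-308.4.1.el5"

# Package descriptions quoted in this message, keyed by kmod-drbd83 version.
descriptions = {
    "8.3.12": "This package provides the drbd83 kernel modules built for "
              "the Linux kernel 2.6.18-274.17.1.el5 for the i686 family of processors.",
    "8.3.8": "This package provides the drbd83 kernel modules built for "
             "the Linux kernel 2.6.18-194.el5 for the i686 family of processors.",
}

for version, text in descriptions.items():
    built_for = built_for_kernel(text)
    status = "same as running kernel" if built_for == running else "differs from running kernel"
    print(f"kmod-drbd83 {version}: built for {built_for} ({status})")
```

Neither package names 2.6.18-308.4.1.el5, so both are reported as differing from the running kernel.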
Best regards, Rafal.
2012/5/6 Rafał Radecki radecki.rafal@gmail.com