I recently set up a new system to run BackupPC on CentOS 5, with the
archive stored on a RAID1 of 750 GB SATA drives. The array was created
with 3 members, one of them specified as "missing". Once a week I add
the third partition, let it sync, then remove it. I've had a similar
system working for a long time using a FireWire drive as the third
member, so I don't think the RAID setup is the cause of the problem. I
may have had problems with the drive power connectors initially, but I
think that is fixed now and I can't see any hardware errors being logged
(the system and log files are on different drives).
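
For anyone wanting to script the "let it sync" step, here is a rough
sketch of how one might check the array state before pulling the third
member. It assumes the usual /proc/mdstat layout (an "[n/m]" member
count on the line after the array name, and a "recovery =" or
"resync =" progress line while syncing); the function name md_status is
made up for illustration, and md3 is simply the array named in the log
further down.

```python
import re


def md_status(mdstat_text, array):
    """Return (configured, active, syncing) for one md array, or None.

    Assumes the common /proc/mdstat layout: a stanza starting with
    "<array> :" followed by indented lines, one of which carries an
    "[n/m]" member count, with stanzas separated by blank lines.
    """
    lines = mdstat_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(array + " :"):
            continue
        configured = active = None
        syncing = False
        for follow in lines[i + 1:]:
            if not follow.strip():
                break  # a blank line ends this array's stanza
            count = re.search(r"\[(\d+)/(\d+)\]", follow)
            if count and configured is None:
                configured = int(count.group(1))
                active = int(count.group(2))
            # Progress line shown while a rebuild or resync is running.
            if re.search(r"\b(recovery|resync)\s*=", follow):
                syncing = True
        return configured, active, syncing
    return None
```

Something like md_status(open("/proc/mdstat").read(), "md3") could then
be polled until it reports all members active and no sync in progress.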
About once a week, I get an error like this, and the partition switches
to read-only.
---
Feb 24 04:48:20 linbackup1 kernel: EXT3-fs error (device md3): htree_dirblock_to_tree: bad entry in directory #869973: directory entry across blocks - offset=0, inode=3915132787, rec_len=42464, name_len=11
Feb 24 04:48:20 linbackup1 kernel: Aborting journal on device md3.
Feb 24 04:48:20 linbackup1 kernel: ext3_abort called.
Feb 24 04:48:20 linbackup1 kernel: EXT3-fs error (device md3): ext3_journal_start_sb: Detected aborted journal
Feb 24 04:48:20 linbackup1 kernel: Remounting filesystem read-only
Feb 24 04:48:33 linbackup1 kernel: EXT3-fs error (device md3): htree_dirblock_to_tree: bad entry in directory #4212181: rec_len % 4 != 0 - offset=0, inode=4054525677, rec_len=1183, name_len=121
----
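
To see whether the same directories keep turning up from week to week,
the htree_dirblock_to_tree messages can be pulled out of the log with a
few lines of Python. This is only a sketch: the regular expression is
written against the two messages quoted above, the function name
bad_dir_entries is made up, and it assumes any line wrapping in the log
falls on whitespace rather than in the middle of a word.

```python
import re

# Pattern modeled on the two "bad entry in directory" messages above.
ENTRY = re.compile(
    r"EXT3-fs error \(device (?P<dev>\w+)\):\s*"
    r"htree_dirblock_to_tree: bad entry in directory #(?P<dir>\d+): "
    r"(?P<reason>.*?) - offset=(?P<offset>\d+), inode=(?P<inode>\d+), "
    r"rec_len=(?P<rec_len>\d+), name_len=(?P<name_len>\d+)"
)


def bad_dir_entries(log_text):
    """Return one dict per 'bad entry in directory' message in log_text."""
    # Collapse all whitespace so messages wrapped at a space are rejoined.
    flat = " ".join(log_text.split())
    entries = []
    for m in ENTRY.finditer(flat):
        rec_len = int(m.group("rec_len"))
        entries.append({
            "device": m.group("dev"),
            "directory": int(m.group("dir")),
            "reason": m.group("reason"),
            "inode": int(m.group("inode")),
            "rec_len": rec_len,
            "name_len": int(m.group("name_len")),
            # The kernel's second complaint above: rec_len not a multiple of 4.
            "misaligned": rec_len % 4 != 0,
        })
    return entries
```

Feeding it each week's messages and comparing the "directory" values
would show whether the errors cluster in the same places or move around.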
'fsck -y' seems to fix it up, but it keeps happening. Is this likely to
be leftover cruft from the hardware issues, or are there known problems
in the ext3, RAID1, or SATA drivers? Because BackupPC stores data with
millions of hardlinks in the archive, it isn't really practical to copy
it off, reformat, and start over.
--
Les Mikesell
lesmikesell(a)gmail.com