I recently set up a new system to run BackupPC on CentOS 5, with the archive stored on a RAID1 of 750 GB SATA drives. The array was created with 3 members, one of them specified as "missing". Once a week I add the 3rd partition, let it sync, then remove it. I've had a similar system working for a long time using a FireWire drive as the 3rd member, so I don't think the RAID setup is the cause of the problem. I may have had problems with the drive power connectors initially, but I think that is fixed now, and I can't see any hardware errors being logged (the system/log files are on different drives).
About once a week, I get an error like this, and the partition switches to read-only.
---
Feb 24 04:48:20 linbackup1 kernel: EXT3-fs error (device md3): htree_dirblock_to_tree: bad entry in directory #869973: directory entry across blocks - offset=0, inode=3915132787, rec_len=42464, name_len=11
Feb 24 04:48:20 linbackup1 kernel: Aborting journal on device md3.
Feb 24 04:48:20 linbackup1 kernel: ext3_abort called.
Feb 24 04:48:20 linbackup1 kernel: EXT3-fs error (device md3): ext3_journal_start_sb: Detected aborted journal
Feb 24 04:48:20 linbackup1 kernel: Remounting filesystem read-only
Feb 24 04:48:33 linbackup1 kernel: EXT3-fs error (device md3): htree_dirblock_to_tree: bad entry in directory #4212181: rec_len % 4 != 0 - offset=0, inode=4054525677, rec_len=1183, name_len=121
----
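In case it helps anyone compare these messages across several weeks of logs, here is a rough Python sketch that pulls the fields out of the "bad entry in directory" lines and repeats the two checks the kernel is complaining about. The function name `parse_dir_errors` and the 4096-byte block size are my own assumptions for illustration (I have not confirmed the block size of this filesystem), and the regular expression only covers the message format quoted above.

```python
import re

# Matches only the htree_dirblock_to_tree messages in the format quoted above.
EXT3_DIR_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<device>\w+)\): "
    r"htree_dirblock_to_tree: bad entry in directory #(?P<directory>\d+): "
    r"(?P<reason>.+?) - offset=(?P<offset>\d+), inode=(?P<inode>\d+), "
    r"rec_len=(?P<rec_len>\d+), name_len=(?P<name_len>\d+)"
)

# Assumption: 4096-byte filesystem blocks. Adjust if the filesystem differs.
BLOCK_SIZE = 4096


def parse_dir_errors(text):
    """Return one dict per 'bad entry in directory' message found in text."""
    errors = []
    for match in EXT3_DIR_ERROR.finditer(text):
        entry = {
            "device": match.group("device"),
            "reason": match.group("reason"),
        }
        for key in ("directory", "offset", "inode", "rec_len", "name_len"):
            entry[key] = int(match.group(key))
        # The kernel message itself names this check: rec_len % 4 != 0.
        entry["misaligned"] = entry["rec_len"] % 4 != 0
        # An entry starting at offset must end inside the same block.
        entry["crosses_block"] = entry["offset"] + entry["rec_len"] > BLOCK_SIZE
        errors.append(entry)
    return errors


# The two directory errors from the log above, one per line.
SAMPLE = (
    "Feb 24 04:48:20 linbackup1 kernel: EXT3-fs error (device md3): "
    "htree_dirblock_to_tree: bad entry in directory #869973: "
    "directory entry across blocks - offset=0, inode=3915132787, "
    "rec_len=42464, name_len=11\n"
    "Feb 24 04:48:33 linbackup1 kernel: EXT3-fs error (device md3): "
    "htree_dirblock_to_tree: bad entry in directory #4212181: "
    "rec_len % 4 != 0 - offset=0, inode=4054525677, "
    "rec_len=1183, name_len=121\n"
)

if __name__ == "__main__":
    for err in parse_dir_errors(SAMPLE):
        print(err["device"], err["directory"], err["reason"],
              "misaligned" if err["misaligned"] else "",
              "crosses_block" if err["crosses_block"] else "")
```

Running the same function over the full log file would show whether the same directory numbers keep coming back after each fsck or whether different ones turn up each time.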
'fsck -y' seems to fix it up, but it keeps happening. Is this likely to be leftover cruft from the hardware issues, or are there problems in the ext3/raid1/SATA drivers? Because of the way BackupPC stores data, with millions of hardlinks in the archive, it isn't really practical to copy it off, reformat, and start over.