[CentOS] ext3 errors

Mon Feb 25 20:04:03 UTC 2008
Les Mikesell <lesmikesell at gmail.com>

I recently set up a new system to run backuppc on CentOS 5, with the
archive stored on a raid1 of 750 GB SATA drives.  The array was created
with 3 members, one of them specified as "missing".  Once a week I add
the 3rd partition, let it sync, then remove it.  I've had a similar
system working for a long time using a firewire drive as the 3rd member,
so I don't think the raid setup is the cause of the problem.  I may have
had problems with the drive power connectors initially, but I think that
is fixed now, and I can't see any hardware errors being logged (the
system/log files are on different drives).
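
For reference, the weekly rotation described above can be sketched as a
set of mdadm command lines.  This is only an illustration, not the exact
commands used on this system: /dev/md3 is taken from the log below, while
the member partitions (/dev/sda1, /dev/sdb1, /dev/sdc1) and the helper
names are hypothetical.  The functions only build argument lists; nothing
is executed.

```python
# Sketch of the 3-member raid1 rotation. Device names are hypothetical
# placeholders; only /dev/md3 appears in the original post.

ARRAY = "/dev/md3"
PERMANENT = ["/dev/sda1", "/dev/sdb1"]   # assumed always-present members
ROTATING = "/dev/sdc1"                   # assumed weekly 3rd member


def create_cmd(array=ARRAY, members=PERMANENT):
    """One-time creation: 3 raid1 slots, the last left as 'missing'."""
    return (["mdadm", "--create", array, "--level=1", "--raid-devices=3"]
            + list(members) + ["missing"])


def add_cmd(array=ARRAY, part=ROTATING):
    """Weekly: add the 3rd partition so the array starts syncing to it."""
    return ["mdadm", array, "--add", part]


def remove_cmd(array=ARRAY, part=ROTATING):
    """After the sync finishes: mark the 3rd member failed, then remove it."""
    return ["mdadm", array, "--fail", part, "--remove", part]


if __name__ == "__main__":
    # Print the commands instead of running them. Sync progress between
    # the add and the remove can be watched in /proc/mdstat.
    for cmd in (create_cmd(), add_cmd(), remove_cmd()):
        print(" ".join(cmd))
```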

About once a week, I get an error like this, and the partition switches 
to read-only.

---
Feb 24 04:48:20 linbackup1 kernel: EXT3-fs error (device md3): htree_dirblock_to_tree: bad entry in directory #869973: directory entry across blocks - offset=0, inode=3915132787, rec_len=42464, name_len=11
Feb 24 04:48:20 linbackup1 kernel: Aborting journal on device md3.
Feb 24 04:48:20 linbackup1 kernel: ext3_abort called.
Feb 24 04:48:20 linbackup1 kernel: EXT3-fs error (device md3): ext3_journal_start_sb: Detected aborted journal
Feb 24 04:48:20 linbackup1 kernel: Remounting filesystem read-only
Feb 24 04:48:33 linbackup1 kernel: EXT3-fs error (device md3): htree_dirblock_to_tree: bad entry in directory #4212181: rec_len % 4 != 0 - offset=0, inode=4054525677, rec_len=1183, name_len=121
---
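
To make the two "bad entry" messages easier to read, here is a rough
Python sketch of the sanity checks that, as far as I understand them,
ext3 applies to each directory entry.  The 4096-byte block size is an
assumption (the post does not state it), the function names are made up
for illustration, and the "inode out of bounds" check is left out because
the filesystem's inode count is not known here.

```python
import re

# Assumption: the filesystem uses 4096-byte blocks (not stated in the post).
BLOCK_SIZE = 4096

# Numeric fields as they appear at the end of each "bad entry" message.
FIELDS = re.compile(
    r"offset=(\d+), inode=(\d+), rec_len=(\d+), name_len=(\d+)")


def dir_rec_len(name_len):
    """Minimum entry size: 8-byte header plus the name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def classify(offset, rec_len, name_len, block_size=BLOCK_SIZE):
    """Return the first check that fails, or None if the entry looks sane."""
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    return None


def parse(line):
    """Pull offset, inode, rec_len and name_len out of one log line."""
    m = FIELDS.search(line)
    if m is None:
        return None
    offset, inode, rec_len, name_len = (int(g) for g in m.groups())
    return {"offset": offset, "inode": inode,
            "rec_len": rec_len, "name_len": name_len}


if __name__ == "__main__":
    for tail in ("offset=0, inode=3915132787, rec_len=42464, name_len=11",
                 "offset=0, inode=4054525677, rec_len=1183, name_len=121"):
        e = parse(tail)
        print(classify(e["offset"], e["rec_len"], e["name_len"]))
```

Under that block-size assumption, the first entry fails because a
42464-byte record starting at offset 0 cannot fit in one block, and the
second fails because 1183 is not a multiple of 4, which matches the two
messages logged.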

'fsck -y' seems to fix it up, but it keeps happening.  Is this likely to
be leftover cruft from the hardware issues, or are there problems in the
ext3/raid1/sata drivers?  Because of the way backuppc stores data, with
millions of hardlinks in the archive, it isn't really practical to copy
it off, reformat, and start over.

-- 
   Les Mikesell
     lesmikesell at gmail.com