[CentOS] EXT3 fs error on RAID1 device

On 3/29/07, Centos-admin <redhat at mckerrs.net> wrote:
> Rasmus Back wrote:
> > On 3/29/07, Alfred von Campe <alfred at 110.net> wrote:
> >> On Mar 29, 2007, at 6:36, Rasmus Back wrote:
> >>
> >> > I have a Dell SC440 running CentOS 4.4. It has two 500GB disks in a
> >> > RAID1 array using Linux software RAID (md1 is / and md0 is /boot).
> >> > Recently the root file system was remounted read-only for some reason.
> >> > The logs don't show anything unusual, presumably the file system was
> >> > read-only before anything was logged. Running dmesg showed this error
> >> > repeated many times:
> >> >
> >> > EXT3-fs error (device md1) in start_transaction: Journal has aborted
> >>
> >> I had the exact error 9 months or so ago (look for a similarly titled
> >> thread in the archives).  It was a disk going bad.  Get all the data
> >> off you need now and replace the disk ASAP.  It may run for a few
> >> days/weeks before it gets mounted again read only, but eventually you
> >> will lose some data.
> >
> > Hi Alfred.
> >
> > Thanks for the pointer! The SMART logs for my drives don't show any
> > errors but I'll start a long selftest just to be sure. Although if it
> > is a failing hard drive then the raid driver should kick it out of the
> > array. Your system was a laptop with just one drive, right?
> >
> There is a known bug in the mpt SCSI driver which causes exactly that
> behaviour.  We got bitten by it running VMware ESX virtual machines with
> CentOS 4.4 and RHEL 4.4 in them. ESX uses the mpt driver by default.
> Even if your box does not use the RAID, as far as I understand it you
> could still get the error. It is explained in the links below.
>
> Here are some useful links:
>
> http://www.tuxyturvy.com/blog/index.php?/archives/31-VMware-ESX-and-ext3-journal-aborts.html
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=197158
>
> http://www.vmware.com/community/thread.jspa?threadID=58121
>
> I have 10 or so real heavy use RHEL 4.4 boxes and at least one box would
> do this at least once a week. I applied the patch and have not seen the
> problem again.
>

Hi Brian,

Thanks a million for the links. My system does use the mpt driver (at
least according to lspci and lsmod), so this would at least explain
the failure. Do you know if the problem is fixed in RHEL 5? The Red Hat
Bugzilla entry said that something was changed in the mpt driver in
2.6.14, but wasn't clear on whether those changes solved the problem.
I might upgrade to CentOS 5 when it's available anyway.

Rasmus
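
P.S. For anyone who wants to run the same two checks on their own box
(is an mpt module loaded, and how often does dmesg show the journal
abort), here is a rough sketch in Python. The function names are my
own, and matching on the "mpt" name prefix is an assumption on my part,
not something taken from the bug report. It reads text in the
/proc/modules format (first field is the module name) and saved dmesg
output.

```python
import os
import re


def loaded_mpt_modules(proc_modules_text):
    """Return names of loaded modules whose name starts with 'mpt'.

    Assumption: every module of the mpt driver family uses the 'mpt'
    name prefix (e.g. mptbase, mptscsih).
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("mpt"):
            names.append(fields[0])
    return names


def count_journal_aborts(dmesg_text, device="md1"):
    """Count dmesg lines reporting an aborted ext3 journal on `device`."""
    pattern = re.compile(
        r"EXT3-fs error \(device %s\).*Journal has aborted"
        % re.escape(device)
    )
    return sum(1 for line in dmesg_text.splitlines() if pattern.search(line))


if __name__ == "__main__":
    # /proc/modules only exists on Linux, so skip quietly elsewhere.
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as f:
            print("mpt modules loaded:", loaded_mpt_modules(f.read()))
```

To count the aborts, save the output of dmesg to a file and pass its
contents to count_journal_aborts(), changing the device argument if
your root file system is not on md1.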