[CentOS] EXT3 fs error on RAID1 device

Thu Mar 29 12:03:37 UTC 2007
Centos-admin <redhat at mckerrs.net>

Rasmus Back wrote:
> On 3/29/07, Alfred von Campe <alfred at 110.net> wrote:
>> On Mar 29, 2007, at 6:36, Rasmus Back wrote:
>>
>> > I have a Dell SC440 running Centos 4.4. It has two 500GB disks in a
>> > RAID1 array using linux software raid (md1 is / and md0 is /boot).
>> > Recently the root file system was remounted read-only for some reason.
>> > The logs don't show anything unusual, presumably the file system was
>> > read-only before anything was logged. Running dmesg showed this error
>> > repeated many times:
>> >
>> > EXT3-fs error (device md1) in start_transaction: Journal has aborted
>>
>> I had the exact error 9 months or so ago (look for a similarly titled
>> thread in the archives).  It was a disk going bad.  Get all the data
>> off you need now and replace the disk ASAP.  It may run for a few
>> days/weeks before it gets mounted again read only, but eventually you
>> will lose some data.
>
> Hi Alfred.
>
> Thanks for the pointer! The smart logs for my drives don't show any
> errors but I'll start a long selftest just to be sure. Although if it
> is a failing hard drive then the raid driver should kick it out of the
> array. Your system was a laptop with just one drive, right?
>
There is a known bug in the mpt SCSI driver which causes exactly that
behaviour. We got bitten by it running VMware ESX virtual machines with
CentOS 4.4 and RHEL 4.4 in them. ESX uses the mpt driver by default. As
far as I understand it, you could still get the error even if your box
does not use the raid. It is explained in the links below.
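
A quick way to see whether a box is using the driver at all is to look
for it in /proc/modules. The following is only a rough sketch, not
something from the bug reports: the function name is made up, and it
assumes the Fusion MPT modules all have names starting with "mpt"
(mptbase, mptscsih and so on), which you should confirm on your own
kernel.

```python
def loaded_mpt_modules(proc_modules_text):
    """Return the names of loaded modules whose name starts with 'mpt'.

    proc_modules_text is the content of /proc/modules, where each line
    begins with the module name followed by whitespace-separated fields.
    The 'mpt' prefix is an assumption about how the driver modules are named.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("mpt"):
            names.append(fields[0])
    return names
```

On a live system you would call it as
loaded_mpt_modules(open("/proc/modules").read()). An empty list only
means no such module is loaded; it does not rule out a driver built
into the kernel.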

Here are some useful links:

http://www.tuxyturvy.com/blog/index.php?/archives/31-VMware-ESX-and-ext3-journal-aborts.html

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=197158

http://www.vmware.com/community/thread.jspa?threadID=58121



I have 10 or so heavily used RHEL 4.4 boxes, and at least one of them
would do this at least once a week. I applied the patch and have not seen
the problem again.
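
If you want to keep an eye on whether the problem comes back, something
like the sketch below can count the journal-abort messages per device in
saved dmesg or /var/log/messages output. It is a minimal example, not a
tested monitoring tool: the function name is hypothetical, and the
pattern is based only on the single error line quoted earlier in this
thread, so other kernels may word the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the message quoted in this thread:
#   EXT3-fs error (device md1) in start_transaction: Journal has aborted
ABORT_RE = re.compile(r"EXT3-fs error \(device ([^)]+)\).*Journal has aborted")


def count_journal_aborts(log_text):
    """Count 'Journal has aborted' messages per device in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feed it the text of the log, for example the saved output of dmesg, and
it returns a Counter keyed by device name (md1 in the original report).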



Hope this helps.

Brian.

