[CentOS] EXT3-fs error (device dm-0) in start_transaction: Journal has aborted

Wed Aug 2 13:48:26 UTC 2006
Kay Diederichs <kay.diederichs at uni-konstanz.de>

Alfred von Campe wrote:
> It's time to resurrect this thread from way back in June. The problem
> in the subject line has reared its ugly head again, but this time with
> a twist that makes it much worse. A little refresher on what was
> happening back then: every so often the root file system would be
> remounted read-only, with the error in the subject line appearing over
> and over again on the console.
> 
> Lately, this has been happening every 10-14 days, and I would have to
> reboot my system. Since the root file system was not writable, no
> error messages were logged in /var/log/messages. So I configured
> syslog to write messages to another system as well, and this time I
> have captured some errors (see below). BTW, this is a SATA drive.
> 
> What makes it much worse this time is that the system won't boot!
> When I try to boot now I get the following error over and over again:
> 
>   ata1: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04
> 
> HELP!  Is there anything I can do to recover this system?
> 
> Alfred
> 
> 
> Here are the first 50 lines from /var/log/messages (including the first
> occurrence of the error in the subject line):
> 
> Aug  1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
> Aug  1 18:57:04 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> Aug  1 18:57:04 balboa01 kernel: ata1: status=0xb7 { Busy }
> Aug  1 18:57:04 balboa01 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
> Aug  1 18:57:04 balboa01 kernel: Current sda: sense key Aborted Command
> Aug  1 18:57:04 balboa01 kernel: Additional sense: Scsi parity error
> Aug  1 18:57:04 balboa01 kernel: end_request: I/O error, dev sda, sector 224365
> Aug  1 18:57:04 balboa01 kernel: ATA: abnormal status 0xB7 on port 0x1F7
> Aug  1 18:57:04 balboa01 last message repeated 2 times
> Aug  1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
> Aug  1 18:57:04 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> Aug  1 18:57:04 balboa01 kernel: ata1: status=0xb7 { Busy }
> Aug  1 18:57:04 balboa01 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
> Aug  1 18:57:04 balboa01 kernel: Current sda: sense key Aborted Command
> Aug  1 18:57:04 balboa01 kernel: Additional sense: Scsi parity error
> Aug  1 18:57:04 balboa01 kernel: end_request: I/O error, dev sda, sector 233795925
> Aug  1 18:57:04 balboa01 kernel: Buffer I/O error on device dm-0, logical block 29198337
> Aug  1 18:57:04 balboa01 kernel: lost page write due to I/O error on dm-0
> Aug  1 18:57:04 balboa01 kernel: ATA: abnormal status 0xB7 on port 0x1F7
> Aug  1 18:57:04 balboa01 last message repeated 2 times
> Aug  1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
> Aug  1 18:57:04 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> Aug  1 18:57:04 balboa01 kernel: ata1: status=0xb7 { Busy }
> Aug  1 18:57:04 balboa01 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
> Aug  1 18:57:04 balboa01 kernel: Current sda: sense key Aborted Command
> Aug  1 18:57:04 balboa01 kernel: Additional sense: Scsi parity error
> Aug  1 18:57:04 balboa01 kernel: end_request: I/O error, dev sda, sector 224373
> Aug  1 18:57:04 balboa01 kernel: Buffer I/O error on device dm-0, logical block 1893
> Aug  1 18:57:04 balboa01 kernel: lost page write due to I/O error on dm-0
> Aug  1 18:57:04 balboa01 kernel: ATA: abnormal status 0xB7 on port 0x1F7
> Aug  1 18:57:04 balboa01 last message repeated 2 times
> Aug  1 18:57:04 balboa01 kernel: Aborting journal on device dm-0.
> Aug  1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
> Aug  1 18:57:04 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> Aug  1 18:57:04 balboa01 kernel: ata1: status=0xb7 { Busy }
> Aug  1 18:57:04 balboa01 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
> Aug  1 18:57:04 balboa01 kernel: Current sda: sense key Aborted Command
> Aug  1 18:57:04 balboa01 kernel: Additional sense: Scsi parity error
> Aug  1 18:57:04 balboa01 kernel: end_request: I/O error, dev sda, sector 172585309
> Aug  1 18:57:04 balboa01 kernel: Buffer I/O error on device dm-0, logical block 21547010
> Aug  1 18:57:04 balboa01 kernel: lost page write due to I/O error on dm-0
> Aug  1 18:57:04 balboa01 kernel: ATA: abnormal status 0xB7 on port 0x1F7
> Aug  1 18:57:04 balboa01 last message repeated 2 times
> Aug  1 18:57:04 balboa01 kernel: ext3_abort called.
> Aug  1 18:57:04 balboa01 kernel: EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
> Aug  1 18:57:04 balboa01 kernel: Remounting filesystem read-only
> Aug  1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
> Aug  1 18:57:04 balboa01 kernel: EXT3-fs error (device dm-0) in start_transaction: Journal has aborted
> Aug  1 18:57:34 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
> Aug  1 18:57:34 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00


Maybe the disk is dying? Are you running smartd? For SATA disks it needs
the -d ata option, which has to be put in smartd.conf.

The error messages could also indicate bad cables.
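
To help tell the two cases apart, the stat/err pairs in the quoted log can be decoded bit by bit. The following is only a minimal sketch: the bit names are taken from the standard ATA status and error register layout, the function names are made up for illustration, and the result is a hint rather than a diagnosis.

```python
# Bit names from the ATA status and error registers.
STATUS_BITS = {
    0x80: "BSY",   # device busy; the other bits are not meaningful while set
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data (obsolete)
    0x02: "IDX",   # index (obsolete)
    0x01: "ERR",   # error; details are in the error register
}
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error (often a cabling problem)
    0x40: "UNC",   # uncorrectable data error (media)
    0x20: "MC",    # media changed
    0x10: "IDNF",  # sector ID not found
    0x08: "MCR",   # media change request
    0x04: "ABRT",  # command aborted
    0x02: "NM",    # no media
    0x01: "AMNF",  # address mark not found (obsolete)
}


def decode(value, table):
    """Return the names of all bits in `table` that are set in `value`."""
    return [name for mask, name in table.items() if value & mask]


def decode_stat_err(text):
    """Decode a 'stat/err' pair as printed in the log, e.g. '0x51/40'."""
    stat_s, err_s = text.split("/")
    stat, err = int(stat_s, 16), int(err_s, 16)
    if stat & 0x80:
        # While BSY is set the remaining register contents are not valid.
        return ["BSY"], []
    errors = decode(err, ERROR_BITS) if stat & 0x01 else []
    return decode(stat, STATUS_BITS), errors


if __name__ == "__main__":
    for pair in ("0x51/40", "0xb7/00"):
        print(pair, decode_stat_err(pair))
```

For the boot-time message, 0x51/40 decodes to ERR with UNC set, which points more toward the media than toward the cable. The 0xb7 entries only say the drive was busy when the command timed out, so they do not settle the question either way.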

I would boot from the CentOS 4.3 Live-CD and take a look at the disk
with smartctl. If the disk is indeed dying, I'd try to save its contents
to a fresh disk, using ddrescue. Unfortunately there are two programs with
this name (http://www.garloff.de/kurt/linux/ddrescue/ and
http://www.gnu.org/software/ddrescue/ddrescue.html); I have had very good
results with the latter. I don't know if it's on the LiveCD (if not, it
should be!).

If the disk shows no SMART errors, you could run e2fsck on the file system.
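
The steps above could be sketched roughly as follows. This is a dry run that only prints the command lines, not a tested procedure. /dev/sda comes from the quoted log; the file system device name and the /mnt/rescue directory (a fresh disk mounted from the Live-CD) are assumptions for illustration, as are the function names. The ddrescue line uses the GNU ddrescue argument order (infile, outfile, logfile), and e2fsck is given -n so that the first pass is read-only.

```python
def smart_check_cmd(disk="/dev/sda"):
    """smartctl call: -d ata selects the ATA device type, -a prints all SMART info."""
    return ["smartctl", "-d", "ata", "-a", disk]


def next_step_cmd(smart_ok,
                  disk="/dev/sda",
                  fs_dev="/dev/mapper/VolGroup00-LogVol00",  # hypothetical name for dm-0
                  rescue_dir="/mnt/rescue"):                 # hypothetical mount point
    """Pick the follow-up command depending on what smartctl reported."""
    if smart_ok:
        # -f: check even if the file system is marked clean
        # -n: open read-only and answer "no" to every question
        return ["e2fsck", "-f", "-n", fs_dev]
    # GNU ddrescue: infile, outfile, logfile. The log file lets an
    # interrupted copy resume where it stopped.
    return ["ddrescue", disk, rescue_dir + "/sda.img", rescue_dir + "/sda.log"]


if __name__ == "__main__":
    print(" ".join(smart_check_cmd()))
    print("if SMART looks clean: ", " ".join(next_step_cmd(True)))
    print("if the disk is dying: ", " ".join(next_step_cmd(False)))
```

Copying to an image file on the fresh disk rather than writing straight to the raw device is a choice made for this sketch. The file system being checked should not be mounted while e2fsck runs.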

HTH,

Kay