For some time I have been seeing disk errors in the syslog every seven days. Until today it always happens Sunday morning at 8:13 AM, plus or minus a minute or two. Yesterday it happened at 1:13 AM. Here are the pertinent log entries for the latest occurrence:
Jan 6 01:12:29 g2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 Jan 6 01:12:29 g2 kernel: ata9.00: BMDMA stat 0x64 Jan 6 01:12:29 g2 kernel: ata9.00: failed command: READ DMA EXT Jan 6 01:12:29 g2 kernel: ata9.00: cmd 25/00:00:00:fe:d5/00:02:04:00:00/e0 tag 0 dma 262144 in Jan 6 01:12:29 g2 kernel: res 51/40:00:19:ff:d5/40:00:04:00:00/00 Emask 0x9 (media error) Jan 6 01:12:29 g2 kernel: ata9.00: status: { DRDY ERR } Jan 6 01:12:29 g2 kernel: ata9.00: error: { UNC } Jan 6 01:12:29 g2 kernel: ata9.00: configured for UDMA/33 Jan 6 01:12:29 g2 kernel: ata9.01: configured for UDMA/33 Jan 6 01:12:29 g2 kernel: ata9: EH complete Jan 6 01:12:33 g2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 Jan 6 01:12:33 g2 kernel: ata9.00: BMDMA stat 0x64 Jan 6 01:12:33 g2 kernel: ata9.00: failed command: READ DMA EXT Jan 6 01:12:33 g2 kernel: ata9.00: cmd 25/00:00:00:fe:d5/00:02:04:00:00/e0 tag 0 dma 262144 in Jan 6 01:12:33 g2 kernel: res 51/40:00:7c:ff:d5/40:00:04:00:00/00 Emask 0x9 (media error) Jan 6 01:12:33 g2 kernel: ata9.00: status: { DRDY ERR } Jan 6 01:12:33 g2 kernel: ata9.00: error: { UNC } Jan 6 01:12:33 g2 kernel: ata9.00: configured for UDMA/33 Jan 6 01:12:33 g2 kernel: ata9.01: configured for UDMA/33 Jan 6 01:12:33 g2 kernel: ata9: EH complete Jan 6 01:13:06 g2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 Jan 6 01:13:06 g2 kernel: ata9.00: BMDMA stat 0x64 Jan 6 01:13:06 g2 kernel: ata9.00: failed command: READ DMA EXT Jan 6 01:13:06 g2 kernel: ata9.00: cmd 25/00:00:00:bf:d6/00:02:04:00:00/e0 tag 0 dma 262144 in Jan 6 01:13:06 g2 kernel: res 51/40:00:db:bf:d6/40:00:04:00:00/00 Emask 0x9 (media error) Jan 6 01:13:06 g2 kernel: ata9.00: status: { DRDY ERR } Jan 6 01:13:06 g2 kernel: ata9.00: error: { UNC } Jan 6 01:13:07 g2 kernel: ata9.00: configured for UDMA/33 Jan 6 01:13:07 g2 kernel: ata9.01: configured for UDMA/33 Jan 6 01:13:07 g2 kernel: ata9: EH complete Jan 6 01:13:10 g2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 Jan 6 01:13:10 g2 kernel: ata9.00: BMDMA stat 0x64 Jan 6 01:13:10 g2 kernel: ata9.00: failed command: READ DMA EXT Jan 6 01:13:10 g2 kernel: ata9.00: cmd 25/00:00:00:bf:d6/00:02:04:00:00/e0 tag 0 dma 262144 in Jan 6 01:13:10 g2 kernel: res 51/40:00:db:bf:d6/40:00:04:00:00/00 Emask 0x9 (media error) Jan 6 01:13:10 g2 kernel: ata9.00: status: { DRDY ERR } Jan 6 01:13:10 g2 kernel: ata9.00: error: { UNC } Jan 6 01:13:10 g2 kernel: ata9.00: configured for UDMA/33 Jan 6 01:13:10 g2 kernel: ata9.01: configured for UDMA/33 Jan 6 01:13:10 g2 kernel: ata9: EH complete Jan 6 01:13:14 g2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 Jan 6 01:13:14 g2 kernel: ata9.00: BMDMA stat 0x64 Jan 6 01:13:14 g2 kernel: ata9.00: failed command: READ DMA EXT Jan 6 01:13:14 g2 kernel: ata9.00: cmd 25/00:00:00:bf:d6/00:02:04:00:00/e0 tag 0 dma 262144 in Jan 6 01:13:14 g2 kernel: res 51/40:00:db:bf:d6/40:00:04:00:00/00 Emask 0x9 (media error) Jan 6 01:13:14 g2 kernel: ata9.00: status: { DRDY ERR } Jan 6 01:13:14 g2 kernel: ata9.00: error: { UNC } Jan 6 01:13:14 g2 kernel: ata9.00: configured for UDMA/33 Jan 6 01:13:14 g2 kernel: ata9.01: configured for UDMA/33 Jan 6 01:13:14 g2 kernel: ata9: EH complete Jan 6 01:13:17 g2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 Jan 6 01:13:17 g2 kernel: ata9.00: BMDMA stat 0x64 Jan 6 01:13:17 g2 kernel: ata9.00: failed command: READ DMA EXT Jan 6 01:13:17 g2 kernel: ata9.00: cmd 25/00:00:00:bf:d6/00:02:04:00:00/e0 tag 0 dma 262144 in Jan 6 01:13:17 g2 kernel: res 51/40:00:db:bf:d6/40:00:04:00:00/00 Emask 0x9 (media error) Jan 6 01:13:17 g2 kernel: ata9.00: status: { DRDY ERR } Jan 6 01:13:17 g2 kernel: ata9.00: error: { UNC } Jan 6 01:13:18 g2 kernel: ata9.00: configured for UDMA/33 Jan 6 01:13:18 g2 kernel: ata9.01: configured for UDMA/33 Jan 6 01:13:18 g2 kernel: ata9: EH complete Jan 6 01:13:21 g2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 Jan 6 01:13:21 g2 kernel: ata9.00: BMDMA stat 0x64 Jan 6 01:13:21 g2 kernel: ata9.00: failed command: READ DMA EXT Jan 6 01:13:21 g2 kernel: ata9.00: cmd 25/00:00:00:bf:d6/00:02:04:00:00/e0 tag 0 dma 262144 in Jan 6 01:13:21 g2 kernel: res 51/40:00:db:bf:d6/40:00:04:00:00/00 Emask 0x9 (media error) Jan 6 01:13:21 g2 kernel: ata9.00: status: { DRDY ERR } Jan 6 01:13:21 g2 kernel: ata9.00: error: { UNC } Jan 6 01:13:21 g2 kernel: ata9.00: configured for UDMA/33 Jan 6 01:13:21 g2 kernel: ata9.01: configured for UDMA/33 Jan 6 01:13:21 g2 kernel: ata9: EH complete Jan 6 01:13:25 g2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 Jan 6 01:13:25 g2 kernel: ata9.00: BMDMA stat 0x64 Jan 6 01:13:25 g2 kernel: ata9.00: failed command: READ DMA EXT Jan 6 01:13:25 g2 kernel: ata9.00: cmd 25/00:00:00:bf:d6/00:02:04:00:00/e0 tag 0 dma 262144 in Jan 6 01:13:25 g2 kernel: res 51/40:00:db:bf:d6/40:00:04:00:00/00 Emask 0x9 (media error) Jan 6 01:13:25 g2 kernel: ata9.00: status: { DRDY ERR } Jan 6 01:13:25 g2 kernel: ata9.00: error: { UNC } Jan 6 01:13:25 g2 kernel: ata9.00: configured for UDMA/33 Jan 6 01:13:25 g2 kernel: ata9.01: configured for UDMA/33 Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Unhandled sense code Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Sense Key : Medium Error [current] [descriptor] Jan 6 01:13:25 g2 kernel: Descriptor sense data with sense descriptors (in hex): Jan 6 01:13:25 g2 kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 Jan 6 01:13:25 g2 kernel: 04 d6 bf db Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Add. Sense: Unrecovered read error - auto reallocate failed Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] CDB: Read(10): 28 00 04 d6 bf 00 00 02 00 00 Jan 6 01:13:25 g2 kernel: ata9: EH complete
There is nothing in /etc/cron.weekly, nor are there any root crontab entries. Any suggestions for investigating this issue would be much appreciated.
Emmett
On Mon, Jan 7, 2013 at 5:58 PM, Emmett Culley emmett@webengineer.com wrote:
For some time I have been seeing disk errors in the syslog every seven days. Until today it always happens Sunday morning at 8:13 AM, plus or minus a minute or two. Yesterday it happened at 1:13 AM. Here are the pertinent log entries for the latest occurrence:
[...]
Jan 6 01:13:25 g2 kernel: res 51/40:00:db:bf:d6/40:00:04:00:00/00 Emask 0x9 (media error)
[...]
Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Add. Sense: Unrecovered read error - auto reallocate failed
[...]
There is nothing in /etc/cron.weekly, nor are there any root crontab entries. Any suggestions for investigating this issue would be much appreciated.
Emmett
Based on this I'd say your disk is going bad, and has run out of spare sectors: Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Add. Sense: Unrecovered read error - auto reallocate failed
You can use smartctl to get some information from the SMART tables, but I've never been able to get a conclusive test out of the testing options. It would be a good idea to run 'badblocks' against the drive as well, as it will definitely tell you if there are bad sectors.
Disks are so cheap it's usually not worth too much effort or delay once you've found out that it's bad.
❧ Brian Mathis
On 01/07/2013 06:24 PM, Brian Mathis wrote:
On Mon, Jan 7, 2013 at 5:58 PM, Emmett Culleyemmett@webengineer.com wrote:
For some time I have been seeing disk errors in the syslog every seven days. Until today it always happens Sunday morning at 8:13 AM, plus or minus a minute or two. Yesterday it happened at 1:13 AM. Here are the pertinent log entries for the latest occurrence:
[...]
Jan 6 01:13:25 g2 kernel: res 51/40:00:db:bf:d6/40:00:04:00:00/00 Emask 0x9 (media error)
[...]
Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Add. Sense: Unrecovered read error - auto reallocate failed
[...]
There is nothing in /etc/cron.weekly, nor are there any root crontab entries. Any suggestions for investigating this issue would be much appreciated.
Emmett
Based on this I'd say your disk is going bad, and has run out of spare sectors: Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Add. Sense: Unrecovered read error - auto reallocate failed
You can use smartctl to get some information from the SMART tables, but I've never been able to get a conclusive test out of the testing options. It would be a good idea to run 'badblocks' against the drive as well, as it will definitely tell you if there are bad sectors.
Disks are so cheap it's usually not worth too much effort or delay once you've found out that it's bad.
❧ Brian Mathis
How do you explain the regular timing of the errors? Is there a process, maybe a backup or something, that runs at this time every Sunday morning Mr. Mathis?
On 1/7/2013 3:43 PM, Mark LaPierre wrote:
How do you explain the regular timing of the errors? Is there a process, maybe a backup or something, that runs at this time every Sunday morning Mr. Mathis?
is this disk part of an mdraid mirror by any chance? /etc/cron.weekly/99raid-check does a synccheck of each md metadevice.
On 01/07/2013 04:19 PM, John R Pierce wrote:
On 1/7/2013 3:43 PM, Mark LaPierre wrote:
How do you explain the regular timing of the errors? Is there a process, maybe a backup or something, that runs at this time every Sunday morning Mr. Mathis?
is this disk part of an mdraid mirror by any chance? /etc/cron.weekly/99raid-check does a synccheck of each md metadevice.
It is a raid drive and there a "read-check" file in /etc/cron.d. It is set to run raid-check at 1:00 AM on Sunday. So I ran raid-check on the commend line, and sure enough the disk errors show up in syslog.
I'll replace the drive, then run some long tests on it out of the system.
Thanks for the suggestions!
Emmett
On 01/07/2013 03:43 PM, Mark LaPierre wrote:
On 01/07/2013 06:24 PM, Brian Mathis wrote:
On Mon, Jan 7, 2013 at 5:58 PM, Emmett Culleyemmett@webengineer.com wrote:
For some time I have been seeing disk errors in the syslog every seven days. Until today it always happens Sunday morning at 8:13 AM, plus or minus a minute or two. Yesterday it happened at 1:13 AM. Here are the pertinent log entries for the latest occurrence:
[...]
Jan 6 01:13:25 g2 kernel: res 51/40:00:db:bf:d6/40:00:04:00:00/00 Emask 0x9 (media error)
[...]
Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Add. Sense: Unrecovered read error - auto reallocate failed
[...]
There is nothing in /etc/cron.weekly, nor are there any root crontab entries. Any suggestions for investigating this issue would be much appreciated.
Emmett
Based on this I'd say your disk is going bad, and has run out of spare sectors: Jan 6 01:13:25 g2 kernel: sd 8:0:0:0: [sdg] Add. Sense: Unrecovered read error - auto reallocate failed
You can use smartctl to get some information from the SMART tables, but I've never been able to get a conclusive test out of the testing options. It would be a good idea to run 'badblocks' against the drive as well, as it will definitely tell you if there are bad sectors.
Disks are so cheap it's usually not worth too much effort or delay once you've found out that it's bad.
❧ Brian Mathis
How do you explain the regular timing of the errors? Is there a process, maybe a backup or something, that runs at this time every Sunday morning Mr. Mathis?
I Just looked a the backup process and noticed that an incremental backup started at 1:00 AM. However none of the other backups listed for this machine correlate in any way to the times that the disk errors re reported.
As this is a host for multiple VMs it might be a good idea to look on each VM for cron jobs running at the time of the disk errors. I'll look there next.
The drive the error reports concern is mounted via mdadm as /boot, so I was able to unmount it, stop the raid and run bad blocks via e2fsck. That reports:
Checking for bad blocks (read-only test): done /dev/sdg1: Updating bad block inode. Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information
/dev/sdg1: ***** FILE SYSTEM WAS MODIFIED ***** /dev/sdg1: 67/128016 files (7.5% non-contiguous), 165468/511988 blocks
So I"ll wait until to see it it happens next Sunday.
Emmett
Emmett