I'm getting this message in my logwatch email notification:
--------------------- Kernel Begin ------------------------
 WARNING:  Kernel Errors Present
    sdb:<3>Buffer I/O error on device sdb, l ...:  2 Time(s)
    Buffer I/O error on device sdb, l ...:  12 Time(s)
    hde: dma_intr: error=0x84 { DriveStat ...:  12 Time(s)
    hde: dma_intr: status=0x51 { DriveReady SeekComplete Error } ...:  12 Time(s)
---------------------- Kernel End -------------------------
It's an old drive I'm using for swap space, /var, and /tmp. (It's on a PCI IDE controller; that's why it comes up as hde.)
If I test it for bad sectors using Vivard, there are no bad sectors found or remapped.
I'm just trying to move a lot of routine disk I/O off my main drive, which holds the root installation, onto a replaceable spare.
I can't find which log file these messages are going to. There's nothing in /var/log/dmesg or /var/log/messages.
Where does logwatch get these messages from?
Kind Regards,
Keith Roberts
----------------------------------------------------------------- Websites: http://www.karsites.net http://www.php-debuggers.net http://www.raised-from-the-dead.org.uk
All email addresses are challenge-response protected with TMDA [http://tmda.net] -----------------------------------------------------------------
On Wed, Jan 12, 2011 at 4:45 AM, Keith Roberts keith@karsites.net wrote:
Where does logwatch get these messages from?
Take a look at your /etc/syslog.conf and /etc/sysconfig/syslog and see where the kernel messages are being logged.
There's a klogd daemon that picks up these kernel messages and passes them to syslogd; it's started from the same init script as syslog (/etc/rc.d/init.d/syslog).
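On a CentOS 5 system of this vintage, klogd hands kernel messages to syslogd under the `kern` facility, and /etc/syslog.conf decides which file they end up in; logwatch reads those log files rather than the kernel itself. The sketch below is a simplified helper (the function name is hypothetical) that lists the destinations of rules able to match `kern` messages, so you know which file to grep. It ignores priority levels and does not strip the `-` no-sync prefix.

```sh
#!/bin/sh
# kern_log_targets: read syslog.conf-style rules on stdin and print the
# destination of each active rule whose selector can match kernel messages
# (a "kern" or "*" facility not set to ".none").
# Simplified: ignores priority levels and the "-" no-sync prefix.
kern_log_targets() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }             # skip comments and blank lines
        {
            n = split($1, sel, ";")                # selectors are ";"-separated
            hit = 0
            for (i = 1; i <= n; i++) {
                split(sel[i], fp, ".")             # fp[1] = facilities, fp[2] = priority
                if (fp[1] !~ /(^|,)(kern|\*)(,|$)/)
                    continue                       # selector does not cover kern
                hit = (fp[2] != "none")            # "kern.none" or "*.none" excludes
            }
            if (hit) print $NF                     # last field is the destination
        }'
}
```

Usage: `kern_log_targets < /etc/syslog.conf`.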
Bad sectors get reallocated automatically, so you might not find any with testing. You need to see how many have been reallocated.
SMART should already be enabled, so maximize your term window and type:
smartctl -a /dev/sdb
That will show the reallocated sector count, as well as power on hours, and temps, etc. Do that for each drive.
If it's attached to a RAID controller, you'll have to take additional steps, which you can find on Google.
If there are any reallocated sectors, you might want to think about replacing it. I have a customer with a failing drive in a server that causes it to freeze from time to time as it develops new bad sectors. I'm replacing it this weekend...
On Wed, 12 Jan 2011, compdoc wrote:
To: 'CentOS mailing list' centos@centos.org
From: compdoc compdoc@hotrodpc.com
Subject: Re: [CentOS] Kernel Errors Present
Bad sectors get reallocated automatically, so you might not find any with testing. You need to see how many have been reallocated.
The Vivard disk diagnostic tool lists any sector read errors, and a count of remapped sectors if there are any.
SMART should already be enabled, so maximize your term window and type:
smartctl -a /dev/sdb
That will show the reallocated sector count, as well as power on hours, and temps, etc. Do that for each drive.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   093   090   021    Pre-fail  Always       -       2741
  4 Start_Stop_Count        0x0032   099   099   040    Old_age   Always       -       1611
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
No re-allocated sectors found.
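To pull a single attribute out of `smartctl -A` (or `-a`) output without eyeballing the table, a small awk filter is enough. A minimal sketch (hypothetical helper name), checked against rows from the table above. On most drives attribute 199, `UDMA_CRC_Error_Count`, counts interface CRC errors of the kind discussed later in this thread, so it is worth checking alongside `Reallocated_Sector_Ct`.

```sh
#!/bin/sh
# smart_raw NAME: read "smartctl -A" (or -a) output on stdin and print the
# RAW_VALUE of the attribute whose ATTRIBUTE_NAME is NAME. Exits non-zero
# if the attribute is not present.
# Caveat: RAW_VALUE is taken as the last field, which is wrong for
# attributes that append extra text such as "(Min/Max 20/45)".
smart_raw() {
    awk -v name="$1" '$2 == name { print $NF; found = 1; exit }
                      END { exit !found }'
}
```

Usage: `smartctl -A /dev/hde | smart_raw UDMA_CRC_Error_Count`.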
If it's attached to a RAID controller, you'll have to take additional steps, which you can find on Google.
No, it's a standard IDE controller.
Looking in /proc/ide/ide2/hde/settings I find this:
name                    value           min             max             mode
----                    -----           ---             ---             ----
pio_mode                write-only      0               255             w
using_dma               1               0               1               rw
wcache                  1               0               1               rw
I have tried to turn DMA off for this drive, using the libata.dma=0 kernel boot parameter.
But it's still coming up as using_dma 1.
If I can turn DMA off for this drive, that might get rid of the DMA error messages.
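A likely reason `libata.dma=0` had no effect: `hdX` device names come from the legacy IDE driver, not libata, so libata parameters would not be expected to touch this drive. The legacy driver exposes its state under /proc/ide, and a sketch like the following (hypothetical helper name) reads one value from the settings table shown above, which is handy for confirming that a change actually took.

```sh
#!/bin/sh
# proc_ide_setting NAME: read a /proc/ide/<bus>/<drive>/settings table on
# stdin and print the "value" column for NAME. Exits non-zero if absent.
proc_ide_setting() {
    awk -v name="$1" '$1 == name { print $2; found = 1; exit }
                      END { exit !found }'
}
```

Usage: `proc_ide_setting using_dma < /proc/ide/ide2/hde/settings`.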
hde: dma_intr: error=0x84 { DriveStat ...: 12 Time(s)
hde: dma_intr: status=0x51 { DriveReady SeekComplete
Kind Regards,
Keith Roberts
Hi Keith
(2011/01/13 6:39), Keith Roberts wrote:
hde: dma_intr: error=0x84 { DriveStat ...: 12 Time(s)
hde: dma_intr: status=0x51 { DriveReady SeekComplete
The first error is a data transmission error. Either your hard drive has a data transmission error, or there is a malfunction on the transmission path outside the disk itself (the trouble is in the memory, chipset, IDE cable, or the drive's DMA circuitry; the disk platters are OK). DMA I/O was designed as two separate units, a control unit and a data unit, and the trouble is in the control-unit part.
Vivard/smartctl only show that your data unit is OK.
dma_intr: error=0x84 { DriveStatusError BadCRC } http://www.mail-archive.com/debian-user@lists.debian.org/msg128610.html
-Tsuyoshi.
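The hex bytes in these messages are the ATA status and error registers. Status 0x51 sets DRDY, DSC and ERR (drive ready, but reporting an error); error 0x84 sets ABRT and ICRC, and ICRC means the drive saw an interface CRC error during an Ultra DMA transfer, i.e. data was corrupted between controller and drive. That fits the reading above that the platters are fine and the fault is on the transfer path. A sketch of a decoder (hypothetical function names) that reproduces the flag names seen in this thread; the label for ICRC without ABRT is an assumption.

```sh
#!/bin/sh
# Decode status=0x.. / error=0x.. bytes from old Linux IDE driver messages
# into flag names modelled on that driver's output.

# ATA status register bits: BSY 0x80, DRDY 0x40, DF 0x20, DSC 0x10,
# DRQ 0x08, CORR 0x04, IDX 0x02, ERR 0x01.
decode_ata_status() {
    s=$(( $1 ))                          # accepts hex such as 0x51
    if [ $(( s & 0x80 )) -ne 0 ]; then
        echo "Busy"                      # other bits are not valid while busy
        return 0
    fi
    out=""
    [ $(( s & 0x40 )) -ne 0 ] && out="$out DriveReady"
    [ $(( s & 0x20 )) -ne 0 ] && out="$out DeviceFault"
    [ $(( s & 0x10 )) -ne 0 ] && out="$out SeekComplete"
    [ $(( s & 0x08 )) -ne 0 ] && out="$out DataRequest"
    [ $(( s & 0x04 )) -ne 0 ] && out="$out CorrectedError"
    [ $(( s & 0x02 )) -ne 0 ] && out="$out Index"
    [ $(( s & 0x01 )) -ne 0 ] && out="$out Error"
    echo "${out# }"                      # drop the leading space
}

# ATA error register bits: ICRC 0x80, UNC 0x40, IDNF 0x10, ABRT 0x04,
# TK0NF 0x02, AMNF 0x01 (MC 0x20 and MCR 0x08 are not decoded here).
decode_ata_error() {
    e=$(( $1 ))
    out=""
    [ $(( e & 0x04 )) -ne 0 ] && out="$out DriveStatusError"   # ABRT
    if [ $(( e & 0x80 )) -ne 0 ]; then                          # ICRC
        if [ $(( e & 0x04 )) -ne 0 ]; then
            out="$out BadCRC"
        else
            out="$out BadSector"         # assumed label for ICRC without ABRT
        fi
    fi
    [ $(( e & 0x40 )) -ne 0 ] && out="$out UncorrectableError"
    [ $(( e & 0x10 )) -ne 0 ] && out="$out SectorIdNotFound"
    [ $(( e & 0x02 )) -ne 0 ] && out="$out TrackZeroNotFound"
    [ $(( e & 0x01 )) -ne 0 ] && out="$out AddrMarkNotFound"
    echo "${out# }"
}
```

Usage: `decode_ata_error 0x84` prints `DriveStatusError BadCRC`.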
On Thu, 13 Jan 2011, Tsuyoshi Nagata wrote:
To: CentOS mailing list centos@centos.org
From: Tsuyoshi Nagata nagata3333333@jp.fujitsu.com
Subject: Re: [CentOS] Kernel Errors Present
dma_intr: error=0x84 { DriveStatusError BadCRC } http://www.mail-archive.com/debian-user@lists.debian.org/msg128610.html
Thanks for all that information Tsuyoshi.
I have turned off DMA for this drive with:
[root@karsites hde]# hdparm -d0 /dev/hde

/dev/hde:
 setting using_dma to 0 (off)
 using_dma     =  0 (off)
[root@karsites hde]# hdparm -d /dev/hde

/dev/hde:
 using_dma     =  0 (off)
I'll watch and see how things go now.
Kind Regards,
Keith
On Thu, 13 Jan 2011, Tsuyoshi Nagata wrote:
Vivard/smartctl only explains your data-unit is OK.
Well it seems likely it's because the drive is on a 40-wire cable. But the kernel wants to do UDMA at 100 MB/s.
The IT8212 PCI controller card detects the 40-wire cable and drops the transfer mode from UDMA5 (100 MB/s) to UDMA2 (33 MB/s).
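For reference, the nominal Ultra DMA burst rates, and the rule that modes above UDMA2 need an 80-conductor cable, can be written as a lookup. This is only a sketch (function names hypothetical); it models the cable rule, not how the IT8212 or the kernel actually choose a mode.

```sh
#!/bin/sh
# udma_mbps N: nominal burst rate in MB/s for Ultra DMA mode N (0-6).
udma_mbps() {
    case "$1" in
        0) echo 16.7 ;;
        1) echo 25.0 ;;
        2) echo 33.3 ;;
        3) echo 44.4 ;;
        4) echo 66.7 ;;
        5) echo 100 ;;
        6) echo 133 ;;
        *) echo "unknown UDMA mode: $1" >&2; return 1 ;;
    esac
}

# effective_udma DRIVE_MAX CONDUCTORS: highest UDMA mode usable with the
# given cable. UDMA3 and above require an 80-conductor cable.
effective_udma() {
    if [ "$2" -lt 80 ] && [ "$1" -gt 2 ]; then
        echo 2
    else
        echo "$1"
    fi
}
```

Usage: `udma_mbps "$(effective_udma 5 40)"` prints `33.3`.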
I have found this in /var/log/messages:
*snip*
Jan 13 18:35:16 karsites kernel: hde: 78165360 sectors (40020 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
Jan 13 18:35:16 karsites kernel: hde: cache flushes not supported
Jan 13 18:35:16 karsites kernel: hde: hde1 hde2 < hde5 hde6 hde7 >
*snip*
Jan 13 18:35:16 karsites kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 13 18:35:16 karsites kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 13 18:35:16 karsites kernel: ide: failed opcode was: unknown
Jan 13 18:35:16 karsites kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 13 18:35:16 karsites kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 13 18:35:16 karsites kernel: ide: failed opcode was: unknown
Jan 13 18:35:16 karsites kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 13 18:35:16 karsites kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 13 18:35:16 karsites kernel: ide: failed opcode was: unknown
Jan 13 18:35:16 karsites kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 13 18:35:16 karsites kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 13 18:35:16 karsites kernel: ide: failed opcode was: unknown
Jan 13 18:35:16 karsites kernel: ide2: reset: success
*snip*
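logwatch's "12 Time(s)" figures are counts of identical messages. A rough sketch of the same idea (not logwatch's own code; the helper name is hypothetical) strips the syslog prefix from `kernel:` lines and counts duplicates, most frequent first.

```sh
#!/bin/sh
# count_kernel_msgs: read syslog lines on stdin, keep only "kernel:" entries,
# drop the "date host kernel: " prefix, then print each distinct message
# with its count, most frequent first.
count_kernel_msgs() {
    sed -n 's/^.* kernel: //p' | sort | uniq -c | sort -rn
}
```

Usage: `count_kernel_msgs < /var/log/messages | head`.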
Is the kernel probing the drive directly and using the drive's maximum UDMA rate, instead of getting this from the IT8212 PCI card?
I have been reading up about hdparm, and set the drive's UDMA mode in /etc/rc.d/rc.local with:
[root@karsites ~]# hdparm -d1 -Xudma2 /dev/hde

/dev/hde:
 setting using_dma to 1 (on)
 setting xfermode to 66 (UltraDMA mode2)
 using_dma     =  1 (on)
Is that why the ide2 reset was successful?
I shall monitor this and see if I get those errors again.
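The `66` in the hdparm output above is hdparm's numeric transfer-mode code. Per the hdparm man page, PIO mode x is 8+x, multiword DMA mode x is 32+x and UltraDMA mode x is 64+x, so `udma2` is 64 + 2 = 66. A small sketch of that mapping (hypothetical helper name):

```sh
#!/bin/sh
# xfermode_value MODE: numeric value hdparm uses for "-X MODE"
# (PIO x = 8+x, multiword DMA x = 32+x, UltraDMA x = 64+x).
xfermode_value() {
    case "$1" in
        pio[0-9])  echo $(( 8 + ${1#pio} )) ;;
        mdma[0-9]) echo $(( 32 + ${1#mdma} )) ;;
        udma[0-9]) echo $(( 64 + ${1#udma} )) ;;
        *) echo "unsupported mode: $1" >&2; return 1 ;;
    esac
}
```

Usage: `xfermode_value udma2` prints `66`, matching the "setting xfermode to 66 (UltraDMA mode2)" line.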
Also, running hdparm from the command line lets me test the data transfer rates with or without DMA enabled.
Looks good, and I guess I will find some 80-conductor IDE cables for all my IDE drives, and enable UDMA to get the maximum transfer rate.
40-wire IDE cables aren't worth the hassle any more, now that UDMA is so stable.
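hdparm's `-t` (buffered disk reads) and `-T` (cached reads) options print a `Timing ...` line ending in a MB/sec figure. A minimal sketch of a filter that extracts that number, so DMA-on and DMA-off runs can be compared side by side; the helper name is hypothetical.

```sh
#!/bin/sh
# read_rate: read "hdparm -t" or "hdparm -T" output on stdin and print the
# MB/sec figure from each "Timing ..." line.
read_rate() {
    sed -n 's|^ *Timing .*= *\([0-9.][0-9.]*\) MB/sec.*$|\1|p'
}
```

Usage: `hdparm -t /dev/hde | read_rate`. The hdparm man page recommends repeating the timing a few times on an otherwise idle system.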
Kind Regards,
Keith Roberts
Hi Keith,
On Thu, 2011-01-13 at 19:03 +0000, Keith Roberts wrote:
Well it seems likely it's because the drive is on a 40-wire cable. But the kernel wants to do UDMA at 100 MB/s.
See hdparm's -X switch to override the (U)DMA mode used for the drive.
Regards, Leonard.
On Sat, 15 Jan 2011, Leonard den Ottolander wrote:
To: CentOS mailing list centos@centos.org
From: Leonard den Ottolander leonard@den.ottolander.nl
Subject: Re: [CentOS] Kernel Errors Present
See hdparm's -X switch to override the (U)DMA mode used for the drive.
Hi Leonard.
Yes, I've added the following into /etc/rc.d/rc.local:
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local

# turn off DMA for hde WD drive
# -d0 = off
# -d1 = on
# hdparm -d0 /dev/hde

# hdparm -d1 -Xudma2 /dev/hde
#
# /dev/hde:
#  setting using_dma to 1 (on)
#  setting xfermode to 66 (UltraDMA mode2)
#  using_dma    =  1 (on)
#
# set WD drive to use UDMA2 - 33 MB/s
hdparm -d1 -Xudma2 /dev/hde

# set sector count for multiple sector I/O
# WD drives like a low setting
# to prevent I/O data errors.
hdparm -m2 /dev/hde

# enable 32-bit data transfers with a special sync sequence
# required by many chipsets
# /dev/hde:
#  setting 32-bit IO_support flag to 3
#  IO_support   =  3 (32-bit w/sync)
hdparm -c3 /dev/hde

sleep 10

# end of rc.local
At reset/power-on, the IT8212 controller detected that the WD drive was on a 40-wire cable and set the UDMA transfer rate to UDMA2 (33 MB/s).
However, for some reason the kernel decided to set the transfer rate for the drive to UDMA5 (100 MB/s).
There were thousands of CRC errors in the SMART data for this drive, which would indicate crosstalk problems on the 40-wire cable being run at too high a speed.
Now that I'm using hdparm to reset the drive to UDMA2 (33 MB/s), there are no more dma_intr errors occurring or being reported by logwatch.
Thanks for all the feedback on this.
The drive is now working as desired.
I hope to get some custom-made 80-wire UDMA IDE cables sorted ASAP. That should squeeze extra speed from all the drives on the machine.
The 40 wire IDE cables will be packed away safely!
Kind Regards,
Keith Roberts
On Sat, Jan 15, 2011 at 7:57 AM, Keith Roberts keith@karsites.net wrote:
I hope to get some custom-made 80-wire UDMA IDE cables sorted ASAP. That should squeeze extra speed from all the drives on the machine.
You shouldn't need custom cables. 80-conductor IDE cables can be sourced all over the Internet for around $5 a cable. Prices have gone up since they are no longer common. You might even post a wanted ad on Craigslist and see if you can get a handful for a few bucks.
Ryan
Same problem here: the hard drive (2.5" Samsung HM121HC) running kernel 2.6.18-194.32.1.el5 (x86_64) produces errors under high load. With the previous kernel the errors are gone. I've already replaced the hard drive with a new one and get the same errors on the newest kernel.
dmesg output:
hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdc: drive not ready for command
attempt to access beyond end of device
hdc3: rw=0, want=25863980832, limit=225841770
attempt to access beyond end of device
hdc3: rw=0, want=7830939224, limit=225841770
attempt to access beyond end of device
hdc3: rw=0, want=31645262224, limit=225841770
attempt to access beyond end of device
hdc3: rw=0, want=25863980832, limit=225841770
attempt to access beyond end of device
hdc3: rw=0, want=25863980832, limit=225841770
hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdc: drive not ready for command
There are no errors logged in SMART; I already tried smartctl -t long ... no errors. I also ran a block test on this drive. The next step is to change the cables, but I don't think that will be the solution; I think it's a kernel IDE/DMA problem.
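The `want`/`limit` figures in the "beyond end of device" lines are counts of 512-byte sectors, so they can be converted to sizes to see how far off the requests are. A sketch (hypothetical helper name) that prints each distinct pair in decimal gigabytes:

```sh
#!/bin/sh
# beyond_end_report: read kernel log lines such as
#   hdc3: rw=0, want=25863980832, limit=225841770
# on stdin and print each distinct (want, limit) pair converted to decimal
# gigabytes (integer division). Both numbers count 512-byte sectors.
beyond_end_report() {
    sed -n 's/.*want=\([0-9]*\), limit=\([0-9]*\).*/\1 \2/p' |
        sort -u |
        while read -r want limit; do
            echo "want=$(( want * 512 / 1000000000 ))GB limit=$(( limit * 512 / 1000000000 ))GB"
        done
}
```

For the dmesg output above, the limit works out to about 115 GB (the size of hdc3), while the requests are for roughly 4 TB to 16 TB. Sector numbers that far out look more like corrupted requests than a slightly wrong partition table, which is at least consistent with a transfer or driver problem; it does not pinpoint the cause.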
Wolfgang
On 15.01.11 14:50, Ryan Wagoner wrote:
On Mon, 17 Jan 2011, Shade.GE wrote:
To: CentOS mailing list centos@centos.org
From: Shade.GE shade.ge@gmail.com
Subject: Re: [CentOS] Kernel Errors Present
There are no errors logged in SMART; I already tried smartctl -t long ... no errors. I also ran a block test on this drive. The next step is to change the cables, but I don't think that will be the solution; I think it's a kernel IDE/DMA problem.
Check that your BIOS settings are correct. Have you enabled LBA for this drive?
You might need to enter the C/H/S values by hand, if these are not being detected properly. Is the drive jumpered properly?
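Why LBA matters here, as worked arithmetic: CHS capacity is cylinders × heads × sectors per track, in 512-byte sectors. With the WD400BB figures from the hdparm -I output in this thread (16383/16/63 CHS, 78165360 LBA sectors), CHS reaches 16383 × 16 × 63 = 16,514,064 sectors, about 8,455 MB, versus about 40,020 MB through LBA. A sketch of the calculation (helper names hypothetical):

```sh
#!/bin/sh
# chs_sectors C H S: number of sectors reachable through a CHS geometry.
chs_sectors() {
    echo $(( $1 * $2 * $3 ))
}

# sectors_to_mb N: size of N 512-byte sectors in decimal megabytes.
sectors_to_mb() {
    echo $(( $1 * 512 / 1000000 ))
}
```

Usage: `sectors_to_mb "$(chs_sectors 16383 16 63)"` prints `8455`.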
I have got rid of the errors now from my WD 40GB drive, on the latest 32 bit kernel.
Also look in /var/log/messages to see how the kernel initialises the drive.
As I mentioned in an earlier post, I now use hdparm from the rc.local script to reset my drive to UDMA 2. Please check the posts I made last week regarding this.
Please also read the man page for hdparm. You can use it to get a lot of information about your drive and its current (U)DMA settings.
E.g.:
[root@karsites ~]# hdparm /dev/hde

/dev/hde:
 multcount     =  2 (on)
 IO_support    =  3 (32-bit w/sync)
 unmaskirq     =  1 (on)
 using_dma     =  1 (on)
 keepsettings  =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 65535/16/63, sectors = 78165360, start = 0
[root@karsites ~]#
[root@karsites ~]# hdparm -I /dev/hde

/dev/hde:

ATA device, with non-removable media
        Model Number:       WDC WD400BB-00GFA0
        Serial Number:      WD-WMAKA1241735
        Firmware Revision:  09.01B09
Standards:
        Supported: 5 4 3
        Likely used: 6
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:   78165360
        device size with M = 1024*1024:       38166 MBytes
        device size with M = 1000*1000:       40020 MBytes (40 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        bytes avail on r/w long: 40
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 2
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
           *    Automatic Acoustic Management feature set
           *    Device Configuration Overlay feature set
           *    SMART error logging
           *    SMART self-test
Security:
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
HW reset results:
        CBLID- below Vih
        Device num = 0 determined by the jumper
Checksum: correct
[root@karsites ~]#
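In `hdparm -I` output, the `DMA:` line lists the modes the drive supports and marks the currently selected one with `*`. A sketch (hypothetical helper names) that pulls out the active mode and the fastest listed mode, e.g. to confirm after boot that the rc.local setting stuck:

```sh
#!/bin/sh
# active_dma_mode: read "hdparm -I" output on stdin and print the DMA mode
# marked with "*" (the currently selected one).
active_dma_mode() {
    awk '$1 == "DMA:" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^\*/) { print substr($i, 2); exit }
    }'
}

# fastest_dma_mode: print the last (fastest) mode on the "DMA:" line,
# without any "*" marker.
fastest_dma_mode() {
    awk '$1 == "DMA:" { m = $NF; sub(/^\*/, "", m); print m; exit }'
}
```

Usage: `hdparm -I /dev/hde | active_dma_mode` should print `udma2` for the output above, while `fastest_dma_mode` prints `udma5`.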
HTH
Keith Roberts
On Sat, 15 Jan 2011, Ryan Wagoner wrote:
To: CentOS mailing list centos@centos.org
From: Ryan Wagoner rswagoner@gmail.com
Subject: Re: [CentOS] Kernel Errors Present
You shouldn't need custom cables. 80-conductor IDE cables can be sourced all over the Internet for around $5 a cable. Prices have gone up since they are no longer common. You might even post a wanted ad on Craigslist and see if you can get a handful for a few bucks.
Hi Ryan.
I have a tall tower case with six 5.25" drive bays. Standard off-the-shelf IDE cables tend to have the slave connector about halfway along the cable, so I can only use the master connector on standard cables. If I try to use the slave connector, the cable is too short to reach the motherboard.
These people here:
http://estore.circuitassembly.com/products/Custom-ATA-66-100-133-IDE-Cable-U...
can make custom cables for not much more than an off-the-shelf version. The maximum length is 40", and I can choose where I want the slave connector to be. Anything from 3" to 13" away from the master connector. Now that's what I call service!
Kind Regards,
Keith Roberts