I am getting occasional seek errors on my boot drive. The messages are as follows:
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
Everything seems to be working fine, but these errors do have me concerned.
The last time they occurred I was creating a fairly large cpio backup file in the '/tmp' directory.
The drive has been checked via fsck, and any errors found were reported to be corrected.
Should I take some kind of action on this, or just ignore the errors?
centos-bounces@centos.org <> scribbled on Tuesday, August 02, 2005 9:58 PM:
I am getting occasional seek errors on my boot drive. The messages are as follows:
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
Try running smartctl -a /dev/hdx against the drive.
Mike
On Tue, 2005-02-08 at 19:57 -0700, BRUCE STANLEY wrote:
I am getting occasional seek errors on my boot drive. The messages are as follows:
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
I've recently seen similar errors on an old server here, and all my Googling seems to indicate an early sign of imminent drive failure.
Some people reported resolving the issue by making changes to DMA settings with hdparm, or changing something in the BIOS. However, in most cases, the recommendations were:
1. Backup data, ASAP, and wait for failure before moving to new disk
2. Use fdisk to map out the bad sectors.
3. Get a new HD, ASAP.
I've been doing option 1. I have another server waiting in the wings, so it'll be able to take over when the time comes.
HTH,
Ranbir
On Tue, 2005-08-02 at 23:58 -0400, Kanwar Ranbir Sandhu wrote:
I've recently seen similar errors on an old server here, and all my Googling seems to indicate an early sign of imminent drive failure.
Some people reported resolving the issue by making changes to DMA settings with hdparm, or changing something in the BIOS. However, in most cases, the recommendations were:
- Backup data, ASAP, and wait for failure before moving to new disk
- Use fdisk to map out the bad sectors.
- Get a new HD, ASAP.
I used these errors as a good reason to buy a bigger drive, and the new drive had the same error messages. After testing with Seagate's tool, I can find no problem with either drive. I disabled DMA as a temporary solution, but I think it is not a drive fault in my case. Do back up :)
Rohan Walsh wrote:
I used these errors as a good reason to buy a bigger drive, and the new drive had the same error messages. After testing with Seagate's tool, I can find no problem with either drive. I disabled DMA as a temporary solution, but I think it is not a drive fault in my case. Do back up :)
Looks like the data cable to the drive is either broken or too long. At least if you see CRC errors - these normally happen on the transport medium and not within the drive itself.
Ralph
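For reference, the two hex values in the original messages are bit fields, and the names inside the braces are just the set bits spelled out. Below is a minimal Python sketch of that decoding. The bit-to-name tables are an assumption modelled on the labels the 2.6-era Linux IDE driver prints, and decode_status / decode_error are hypothetical helper names, not part of any tool.

```python
# Illustrative decoder for the status/error bytes seen in messages such as
#   hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
# ASSUMPTION: the tables mirror the labels printed by the old Linux IDE driver.

STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the names of the bits set in an ATA status byte."""
    if value & 0x80:
        # While the busy bit is set, the other bits are not meaningful.
        return ["Busy"]
    return [name for bit, name in STATUS_BITS if value & bit]


def decode_error(value):
    """Return the names of the bits set in an ATA error byte."""
    names = []
    aborted = bool(value & 0x04)
    if aborted:
        names.append("DriveStatusError")
    if value & 0x80:
        # Bit 7 is reported as an interface CRC error when the command was
        # also aborted, and as a bad sector otherwise.
        names.append("BadCRC" if aborted else "BadSector")
    if value & 0x40:
        names.append("UncorrectableError")
    if value & 0x10:
        names.append("SectorIdNotFound")
    if value & 0x02:
        names.append("TrackZeroNotFound")
    if value & 0x01:
        names.append("AddrMarkNotFound")
    return names


print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode_error(0x84))   # ['DriveStatusError', 'BadCRC']
```

Read this way, error=0x84 is an aborted command with an interface CRC error rather than an unreadable sector, which fits the cable explanation.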
On Wed, 2005-08-03 at 13:30 +0200, Ralph Angenendt wrote:
Looks like the data cable to the drive is either broken or too long. At least if you see CRC errors - these normally happen on the transport medium and not within the drive itself.
There have always been ATA bus and drive arbitration issues at various points in the Linux kernel as chipset and drive vendors differed on the implementation of the ATA spec. There was also a time when Hedrick (I believe?) moved his ATA driver work out of kernel 2.4 (around 2.4.18?).
In such cases, these CRC errors and bus resets are very, very typical.
It's one of the main reasons I buy 3Ware Escalade 7006-2 and 8006-2 cards quite liberally -- to avoid having to deal with arbitrary ATA issues due to chipsets and drives fighting and lack of support in Linux's ATA driver to enforce discipline on either.
--- Rohan Walsh rohan_walsh@yahoo.com.au wrote:
I used these errors as a good reason to buy a bigger drive, and the new drive had the same error messages. After testing with Seagate's tool, I can find no problem with either drive. I disabled DMA as a temporary solution, but I think it is not a drive fault in my case. Do back up :)
How did you go about disabling dma?
Change your IDE cable, be happy. (Old IDE cables have fewer wires.)
M
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
BRUCE STANLEY wrote:
How did you go about disabling dma?
Typically it's: # hdparm -d 0 /dev/hdX
Manuel BERTRAND manuel.bertrand@lif.univ-mrs.fr wrote:
Change your IDE cable, be happy. (old IDE have fewer wire)
Well, it all depends.
Normally during POST, if the BIOS detects UltraDMA Mode 3** (44MBps) or higher signaling, and the cable is not augmented with 40 ground wires (the 80-conductor type), it is supposed to reduce the signaling to UltraDMA Mode 2 (33MBps). But the OS and/or drive might raise the signaling to a higher rate once the kernel loads, or during boot-time optimization.
[ **NOTE: UltraDMA mode 3 (44MBps) and 4 (66MBps) are commonly referred to as Ultra66 ]
But it could be a variety of problems:
- Cable used (40-pin v. 40-pin+40-ground aka "Ultra66 cable")
- Multiple ATA devices on a channel conflicting
- Hdparm optimizing the ATA channel and IDE disk incorrectly
- Many, many others
But in every case I had, when the drive _had_ worked prior without issue, it was a kernel upgrade and that ATA driver. As I mentioned before, there have been many documented cases of this, as well as the politics of the ATA code in the kernel (around 2.4.18-19 IIRC).
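The cable rule described above can be written down as a small lookup. This is only a rough Python sketch of the rule, not anything a BIOS or the kernel actually runs. The per-mode figures are the nominal rates from the ATA specifications, and max_safe_udma_mode is a hypothetical name chosen for the example.

```python
# Nominal UltraDMA transfer rates in MB/s, by mode number.
UDMA_MBPS = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.0}

# Modes above 2 need the 80-conductor cable (40 signal + 40 ground wires).
MAX_MODE_ON_40_CONDUCTOR = 2


def max_safe_udma_mode(drive_max_mode, cable_conductors):
    """Highest UDMA mode that should be used for a given drive and cable.

    drive_max_mode   -- highest mode the drive and controller both support
    cable_conductors -- 40 or 80
    """
    if drive_max_mode not in UDMA_MBPS:
        raise ValueError("unknown UDMA mode: %r" % (drive_max_mode,))
    if cable_conductors not in (40, 80):
        raise ValueError("cable must have 40 or 80 conductors")
    if cable_conductors == 40:
        return min(drive_max_mode, MAX_MODE_ON_40_CONDUCTOR)
    return drive_max_mode


for conductors in (40, 80):
    mode = max_safe_udma_mode(5, conductors)
    print("UDMA100 drive on %d-conductor cable: mode %d (%.1f MB/s)"
          % (conductors, mode, UDMA_MBPS[mode]))
```

If something later pushes the drive past the mode the cable allows, CRC errors like the ones in this thread are a plausible symptom.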
Bryan J. Smith wrote:
- Multiple ATA devices on a channel conflicting
Right, I forgot this one. Check whether you have an old CD drive on the same IDE cable.
- Hdparm optimizing the ATA channel and IDE disk incorrectly
Did you tweak this?
--- Manuel BERTRAND manuel.bertrand@lif.univ-mrs.fr wrote:
Bryan J. Smith wrote:
- Multiple ATA devices on a channel conflicting
Right, I forgot this one. Check whether you have an old CD drive on the same IDE cable.
Hi Manuel!
My IDE set up is as follows:
IDE channel 0
--- IBM 40GB UDMA 100 7200 RPM drive, 2MB buffer
--- Seagate 120GB UDMA 100 7000 RPM drive, 8MB buffer
IDE channel 1
--- TDK CD-ROM drive 24X
--- TDK CD-RW drive 16x burn, 48x read
Both controller cables are UDMA 100 compatible.
BRUCE STANLEY bruce.stanley@prodigy.net wrote:
Hi Manuel! My IDE set up is as follows:
IDE channel 0
--- IBM 40GB UDMA 100 7200 RPM drive, 2MB buffer
--- Seagate 120GB UDMA 100 7000 RPM drive, 8MB buffer
IDE channel 1
--- TDK CD-ROM drive 24X
--- TDK CD-RW drive 16x burn, 48x read
Both controller cables are UDMA 100 compatible.
Ew, old-style PIO master/slave used for ATA DMA (not recommended).
Furthermore, you're using your newer devices as slaves, so they are stuck with the ATA support of their masters.
Lastly, you're using different vendors for the disks -- IBM (and WD drives, who OEM'd IBM drives at that time) had numerous issues with the ATA logic in Intel ICH and ViA 82xx southbridges.
--- "Bryan J. Smith" b.j.smith@ieee.org wrote:
Ew, old-style PIO master/slave used for ATA DMA (not recommended).
Furthermore, you're using your newer devices as slaves, so they are stuck with the ATA support of their masters.
Lastly, you're using different vendors for the disks -- IBM (and WD drives, who OEM'd IBM drives at that time) had numerous issues with the ATA logic in Intel ICH and ViA 82xx southbridges.
-- Bryan J. Smith | Sent from Yahoo Mail
Hi Bryan!
I should also mention that the drives are installed in removable drive bays so that I can swap them out with other drives without messing with the cables.
Thus, I can put in different master boot drives for different operating systems.
I have seen very few problems of the nature described with FC2, Mandrake 8.1, or Suse 8.2.
I have not seen any problems when using Windows 2000 pro.
Is this problem occurring more with newer versions of the Red Hat base kernels? It could still be a hardware problem of some sort with the system.
When I get home tonight I'll open up the box and re-seat the cables just to be sure that they are ok.
BRUCE STANLEY bruce.stanley@prodigy.net wrote:
Hi Bryan! I should also mention that the drives are installed in removable drive bays so that I can swap them out with other drives without messing with the cables.
Ewwwww (even more). ATA is nasty for hot-swap. SATA's SCA-like, staggered power/transient is much better.
I have seen very few problems of the nature described with FC2, Mandrake 8.1, or Suse 8.2.
Most of the problems I've seen were early 2.4 and mid 2.4 (circa 2.4.18-20).
Early 2.4 issues were due to Intel ICH and ViA 82xx ATA logic having issues with IBM (which were also Western Digital at the time) drives. Long story short, IBM screwed up on the ATA spec. Intel shared specs and got the problem fixed. ViA hoarded specs and the problem continued.
Mid 2.4, Hedrick ripped out his ATA code because of some vendors continually violating the GPL. I can't remember what the resolution of that was (I think he put it back in within a few revisions).
Bus arbitration between the "dumb" AT Attachment (ATA) channel and the Integrated Drive Electronics (IDE) on the drive is always a PITA for the host/OS. Sometimes some ATA channels just don't have registers/support, or more often, the IDE is not to spec, and the ATA channel doesn't work as designed (and the host/OS is left resetting the channel to try to reconnect the two).
It was enough that I made it a general rule to deploy 3Ware for production systems, even desktops. Why? Because 3Ware is both the host/OS (ASIC/firmware) and the ATA channel, so it can more easily "tame" troublesome IDE. I.e., the 3Ware ASIC/firmware is driving the IDE directly, and works in tandem with the 3Ware ATA channel logic.
I have not seen any problems when using Windows 2000 pro.
Of course, because ATA channel vendors provide their drivers, with all that nice code negotiated under NDA, etc... At the same time, have you ever tried the "included" NT ATA drivers? Not always pretty. ;->
Is this problem occurring more with newer versions of the Red Hat base kernels?
See my comments above on kernel 2.4.
It could still be a hardware problem of some sort with the system.
The ATA spec is _rarely_ followed "to-the-letter."
IBM has been guilty many, many times of this (as well as with buggy BIOS Int13h services too), which affected Western Digital as well until the switch to Hitachi (and IBM's sell-off to the same).
Intel's early ICH ATA for the i8xx chipsets, which replaced the PIIX ATA of the earlier i4xx chipsets, also caused a lot of issues. But Intel worked them out rather quickly.
ViA has always hoarded its specs, and Linux ATA issues plagued many of its early 82xx ATA logic, up until about 8233 or 8235.
AMD has only designed 1 ATA logic, the original one in the 751, and it is still used in the 760 as well as the AMD8111.
SiS used to be the same with its 551x ATA logic, although I have not tried its latest chipsets (since the Socket-462).
When I get home tonight I'll open up the box and re-seat the cables just to be sure that they are ok.
The exact make/model of your southbridge ATA would help -- use "lspci -v". If it is ViA, it should be VT823x. Get the exact model for me and I can help you more.
The exact output of "hdparm -i /dev/hda" and "hdparm -i /dev/hdb" would help as well.
On Wed, Aug 03, 2005 at 01:17:33PM -0700, Bryan J. Smith wrote:
BRUCE STANLEY bruce.stanley@prodigy.net wrote:
Hi Bryan! I should also mention that the drives are installed in removable drive bays so that I can swap them out with other drives without messing with the cables.
Ewwwww (even more). ATA is nasty for hot-swap. SATA's SCA-like, staggered power/transient is much better.
I really don't think he hot-swaps them. Most likely he is turning the computer off, replacing the drive, and turning it back on.
I have seen this done many times, mostly in computer course labs or on test boxes.
But I agree, ATA for hot-swapping is a BAD idea.
[]s
-- Rodrigo Barbosa rodrigob@suespammers.org
"Quid quid Latine dictum sit, altum viditur"
"Be excellent to each other ..." - Bill & Ted (Wyld Stallyns)
--- Rodrigo Barbosa rodrigob@suespammers.org wrote:
I really don't think he hot-swaps them. Most likely he is turning the computer off, replacing the drive, and turning it back on.
I have seen this done many times, mostly in computer course labs or on test boxes.
But I agree, ATA for hot-swapping is a BAD idea.
[]s
Rodrigo Barbosa rodrigob@suespammers.org
Hi Rodrigo! You are correct, I do not hot swap them even though the product claims that you can. I never trusted that with IDE.
Rodrigo Barbosa rodrigob@suespammers.org wrote:
I really don't think he hot-swaps them. Most likely he is turning the computer off, replacing the drive, and turning it back on. I have seen this done many times, mostly in computer course labs or on test boxes. But I agree, ATA for hot-swapping is a BAD idea.
BRUCE STANLEY bruce.stanley@prodigy.net wrote:
Hi Rodrigo! You are correct, I do not hot swap them even though the product claims that you can. I never trusted that with IDE.
The hot-swap/transient handling is just a bonus of the SATA SCA. Although you have to have some intelligence behind SATA SCA, just like any ATA hot-swap (i.e., hard to find outside of 3Ware).
The reality is that I've had issue after issue with many [parallel] ATA bays and connections. Same deal with non-SCA SCSI cans.
I only go for SATA SCA or SCSI SCA, with rare exception. I.e., only for test systems.
--- "Bryan J. Smith" b.j.smith@ieee.org wrote:
The exact make/model of your southbridge ATA would help -- use "lspci -v". If it is ViA, it should be VT823x. Get the exact model for me and I can help you more.
The exact output of "hdparm -i /dev/hda" and "hdparm -i /dev/hdb" would help as well.
-- Bryan J. Smith | Sent from Yahoo Mail
Here are the model numbers of the motherboard and chipsets:
Motherboard - KT7A-RAID using AMD 1.3GHz Athlon Thunderbird
Chipsets - VIA Apollo KT133A with VT8363A and VT82C686A
On Wed, 2005-08-03 at 19:54 -0700, BRUCE STANLEY wrote:
Motherboard - KT7A-RAID using AMD 1.3GHz Athlon Thunderbird
Chipsets - VIA Apollo KT133A with VT8363A and VT82C686A
Yep, I had that _exact_ mainboard. 82c686[A] was the PITA of ATA-5 UltraDMA mode 4 capable designs. Even the early VT823x's were an improvement.
DMA timeouts were regular with different kernels, especially with IBM (also WD at the time) drives.
On Wed, 2005-08-03 at 23:19 -0500, Bryan J. Smith wrote:
Yep, I had that _exact_ mainboard. 82c686[A] was the PITA of ATA-5 UltraDMA mode 4 capable designs. Even the early VT823x's were an improvement.
DMA timeouts were regular with different kernels, especially with IBM (also WD at the time) drives.
Samsung 10GB and 20GB drives have also shown the same symptoms for me on an ABIT KR7A-RAID.
Paul
On Wed, August 3, 2005 8:01 am, BRUCE STANLEY said:
How did you go about disabling dma?
You only want to do that as a last resort, it makes your disk IO very slow.
You would change it using hdparm ... the option would be:
hdparm -d0
You can make it happen every reboot by editing the file:
/etc/sysconfig/harddisks
set:
USE_DMA=0
and remove the # in front of it.
AGAIN ... only if absolutely necessary, because it greatly slows down disk I/O.
Johnny Hughes mailing-lists@hughesjr.com wrote:
You only want to do that as a last resort, it makes your disk IO very slow. You would change it using hdparm ... the option would be: hdparm -d0 You can make it happen every reboot by editing the file: /etc/sysconfig/harddisks set: USE_DMA=0 and remove the # in front of it. AGAIN ... only if absolutely necessary, because it greatly slows down disk I/O.
Yep. Programmed I/O (PIO) means you now involve your CPU and system interconnect, instead of Direct Memory Access (DMA) which transfers directly from drive to memory.
With DMA, the limit is your PCI (peripheral) interconnect, typically 75-100MBps realistically for a legacy 32-bit @ 33MHz PCI with no contention (much higher for PCI64/66, PCI-X or PCIe).
With PIO, you'll quickly find that not only will your CPU usage spike as the CPU gets bogged down with just controlling the I/O channel, but pushing more than 15MBps over a legacy 32-bit @ 33MHz PCI bus is virtually impossible.
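The PCI figures above follow from simple arithmetic on bus width and clock. A quick Python sketch, for illustration only: pci_peak_mbps is a hypothetical helper, the 33.33 and 66.66 MHz clocks are the usual nominal values, and the 75-100MBps and 15MBps numbers are the realistic estimates quoted above, not measurements.

```python
def pci_peak_mbps(bus_width_bits, clock_mhz):
    """Theoretical peak rate in MB/s: bytes per clock times million clocks per second."""
    return bus_width_bits / 8 * clock_mhz


legacy_pci = pci_peak_mbps(32, 33.33)  # about 133 MB/s
pci_64_66 = pci_peak_mbps(64, 66.66)   # about 533 MB/s

print("32-bit @ 33MHz PCI peak: %.0f MB/s" % legacy_pci)
print("64-bit @ 66MHz PCI peak: %.0f MB/s" % pci_64_66)

# How much of the legacy bus peak each quoted figure represents.
for label, mbps in (("DMA, low estimate", 75),
                    ("DMA, high estimate", 100),
                    ("PIO ceiling", 15)):
    print("%-18s %3d MB/s = %2.0f%% of peak"
          % (label, mbps, 100 * mbps / legacy_pci))
```

The point is only that DMA can use a large share of what the legacy bus offers, while PIO leaves most of it idle and burns CPU doing so.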
How did you go about disabling dma?
I used ide=nodma as a kernel option in grub. I don't generally do much at the moment that requires high I/O, though I would like to fix this. I will try changing cables.
Without dma:
# hdparm -Tt /dev/hda
/dev/hda:
 Timing cached reads:   700 MB in  2.01 seconds = 348.48 MB/sec
 Timing buffered disk reads:  12 MB in  3.69 seconds =   3.25 MB/sec
With dma:
# hdparm -Tt /dev/hda
/dev/hda:
 Timing cached reads:   688 MB in  2.01 seconds = 342.34 MB/sec
 Timing buffered disk reads:  28 MB in  3.09 seconds =   9.05 MB/sec
edit grub.conf -
title CentOS (2.6.9-11.EL)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-11.EL ro root=/dev/md1 ide=nodma
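To compare the two hdparm runs without doing the division by hand, the reported rates can be pulled out with a small script. This is a minimal Python sketch: the sample text is copied from the output above, and parse_hdparm and the regular expression are made up for this example, so they may need adjusting for other hdparm versions.

```python
import re

# Matches lines such as:
#   Timing buffered disk reads:  12 MB in  3.69 seconds =   3.25 MB/sec
RATE_LINE = re.compile(r"Timing (cached|buffered disk) reads:.*?=\s*([\d.]+) MB/sec")

WITHOUT_DMA = """
/dev/hda:
 Timing cached reads:   700 MB in  2.01 seconds = 348.48 MB/sec
 Timing buffered disk reads:  12 MB in  3.69 seconds =   3.25 MB/sec
"""

WITH_DMA = """
/dev/hda:
 Timing cached reads:   688 MB in  2.01 seconds = 342.34 MB/sec
 Timing buffered disk reads:  28 MB in  3.09 seconds =   9.05 MB/sec
"""


def parse_hdparm(text):
    """Map each test name ('cached', 'buffered disk') to its reported MB/sec."""
    return {m.group(1): float(m.group(2)) for m in RATE_LINE.finditer(text)}


without_dma = parse_hdparm(WITHOUT_DMA)
with_dma = parse_hdparm(WITH_DMA)

# Cached reads only exercise memory, so the disk comparison uses the
# buffered disk figure.
speedup = with_dma["buffered disk"] / without_dma["buffered disk"]
print("Buffered disk reads are %.1fx faster with DMA" % speedup)
```

On these numbers DMA is a little under three times faster for buffered disk reads, while the cached-read figures barely move.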