I decided, after the last discussion of smartd and S.M.A.R.T. disks, to take a look in my /var/log/messages, and I'm seeing a fair bit of this:
Sep 10 20:11:23 mhrichter smartd[3361]: Device: /dev/sda, 4294967295 Offline uncorrectable sectors
Sep 10 20:41:23 mhrichter smartd[3361]: Device: /dev/hdb, 21 Currently unreadable (pending) sectors
Sep 10 20:41:24 mhrichter smartd[3361]: Device: /dev/sda, 4294967295 Currently unreadable (pending) sectors
Sep 10 20:41:24 mhrichter smartd[3361]: Device: /dev/sda, 4294967295 Offline uncorrectable sectors
Sep 10 21:11:23 mhrichter smartd[3361]: Device: /dev/hdb, 21 Currently unreadable (pending) sectors
Sep 10 21:11:23 mhrichter smartd[3361]: Device: /dev/sda, 4294967295 Currently unreadable (pending) sectors
Sep 10 21:11:23 mhrichter smartd[3361]: Device: /dev/sda, 4294967295 Offline uncorrectable sectors
Clearly there is a minor problem on /dev/hdb, which doesn't really surprise me, nor is it particularly worrisome (because I don't use that drive much).
However, the other one I find more than a little curious.
/dev/sda is a Seagate 300GB SATA drive that's coming up on two years old next month, but the number of "Currently unreadable (pending) sectors" or "Offline uncorrectable sectors," depending on which one you believe, is interesting - 4294967295 is FFFFFFFF in hex, and I'm running a 64-bit machine.
Google is not particularly informative on this subject - anyone know more than general suggestions about dd, badblocks, etc.? This is my boot and primary system disk (has been for some time), but the error message is essentially meaningless (to me, right now).
Thanks.
mhr
PS: In the last couple of months there was a discussion of how to make the disk less active, starting with someone reporting that their disk drive activity light blinked every 30 seconds or something like that. I tried to find it again, but I couldn't pin down what to search for - what was the solution?
MHR wrote:
Google is not particularly informative on this subject - anyone know more than general suggestions about dd, badblocks, etc.? This is my boot and primary system disk (has been for some time), but the error message is essentially meaningless (to me, right now).
Download the manufacturer's tools and run a diagnostics on it, it will tell you the truth about what's going on.
I wouldn't trust any generic OS tools over the manufacturer's tools, there was a discussion on this topic on this list I think not too long ago. The biggest gotcha with the vendor tools though is they are usually limited in the types of disk controllers they support.
nate
On Wed, Sep 10, 2008 at 10:14 PM, nate centos@linuxpowered.net wrote:
Download the manufacturer's tools and run a diagnostics on it, it will tell you the truth about what's going on.
I wouldn't trust any generic OS tools over the manufacturer's tools, there was a discussion on this topic on this list I think not too long ago. The biggest gotcha with the vendor tools though is they are usually limited in the types of disk controllers they support.
I was going to laugh this off 'cuz how many manufacturers support Linux, but I was pleasantly surprised, twice, when I found that a) Seagate does and b) the seatools for Linux produced no errors on the long test.
It also told me lots of interesting information that I don't recall at the moment, not the least of which was that the drive does not support DST (the on-board diagnostics test), which I thought was odd.
Based on some of the other responses, I think I'll run smartctl to see what it says, but that still doesn't really answer the question about the number (4294967295, which happens to be FFFFFFFF in hex). A 300GB disk has only about 586 million 512-byte sectors in total - how could 4.3 billion of them be bad?
I'm thinking it's more likely a 32-bit v. 64-bit issue, but I haven't finished looking at that yet.
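A quick sanity check on that number, as a minimal Python sketch. It assumes 512-byte sectors and a decimal 300 GB capacity (neither is stated in the thread). It only shows that the logged value is the largest unsigned 32-bit integer, that the same bit pattern read as a signed 32-bit integer is -1, and that it is larger than any plausible sector count for this drive. That is consistent with a sentinel or misread value rather than a real count, but it does not prove where the value comes from.

```python
# Sanity check on the value smartd logged for /dev/sda.
# Assumptions (not from the thread): 512-byte sectors, 300 GB = 300 * 10**9 bytes.

REPORTED = 4294967295        # value from the smartd log lines
SECTOR_BYTES = 512           # assumed sector size
DISK_BYTES = 300 * 10**9     # assumed decimal capacity of the 300GB drive

# How many sectors the whole disk would hold under those assumptions.
total_sectors = DISK_BYTES // SECTOR_BYTES

# Reinterpret the same 32 bits as a signed integer.
as_signed_32 = int.from_bytes(REPORTED.to_bytes(4, "little"), "little", signed=True)

print(hex(REPORTED))               # 0xffffffff
print(REPORTED == 2**32 - 1)       # True
print(as_signed_32)                # -1
print(total_sectors)               # 585937500
print(REPORTED > total_sectors)    # True
```

A count that is exactly 2**32 - 1 (or -1 when read as signed) on a disk with well under a billion sectors looks much more like a placeholder or a misinterpreted field than a real tally of bad sectors.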
One other thing that I find interesting: the drives that are showing smart errors are /dev/hdb and /dev/sda. In order from oldest to newest, my drives are:
/dev/hdb - Maxtor 120GB PATA
/dev/hda - Maxtor 160GB PATA
/dev/sda - Seagate 300GB SATA
/dev/sdb - WD 320GB SATA
The older of each of the PATA and SATA drives are the ones showing the errors....
Thanks.
mhr
MHR wrote:
I was going to laugh this off 'cuz how many manufacturers support Linux, but I was pleasantly surprised, twice, when I found that a) Seagate does and b) the seatools for Linux produced no errors on the long test.
I wasn't aware there was a seatools for Linux, I meant to refer to the bootable versions of the tools that run outside of any OS.
But perhaps the vendor tools have improved and can reliably detect faults from within an OS, it's been several years since I've had to use them.
nate
On Thu, Sep 11, 2008 at 11:15 AM, nate centos@linuxpowered.net wrote:
I wasn't aware there was a seatools for Linux, I meant to refer to the bootable versions of the tools that run outside of any OS.
You have to dig in to find them, but yep, they are there!
But perhaps the vendor tools have improved and can reliably detect faults from within an OS, it's been several years since I've had to use them.
I'll probably get the DOS tool, too. It usually /is/ more reliable to test a drive that isn't running, esp. the boot drive....
Thanks.
mhr
On Thu, 2008-09-11 at 11:03 -0700, MHR wrote:
On Wed, Sep 10, 2008 at 10:14 PM, nate centos@linuxpowered.net wrote:
Download the manufacturer's tools and run a diagnostics on it, it will tell you the truth about what's going on.
I wouldn't trust any generic OS tools over the manufacturer's tools,
<snip>
I was going to laugh this off 'cuz how many manufacturers support Linux, but I was pleasantly surprised, twice, when I found that a) Seagate does and b) the seatools for Linux produced no errors on the long test.
IIRC, the seatools just run the smart tools that come on CentOS/Linux. Not the same as those on the DOS tools version. It's been several months, but barring memory failures (mine, not the computer's ;-) I ended up downloading the DOS ones so that I could do the repair and run the "real magilla".
It also told me lots of interesting information that I don't recall at the moment, not the least of which was that the drive does not support DST (the on-board diagnostics test), which I thought was odd.
Try the DOS version. I bet the lack of that support is in the standard *IX smart tools, not the drive.
<snip>
One other thing that I find interesting: the drives that are showing smart errors are /dev/hdb and /dev/sda. In order from oldest to newest, my drives are:
/dev/hdb - Maxtor 120GB PATA
/dev/hda - Maxtor 160GB PATA
/dev/sda - Seagate 300GB SATA
/dev/sdb - WD 320GB SATA
The older of each of the PATA and SATA drives are the ones showing the errors....
If all drives left the factory in great shape, it is natural that the older ones would show an error first. Often just a "weak" spot or two that passed mfg tests and finally failed as they aged. That's why I don't worry about them (I don't have data center servers to the world here at home) as long as the repair utilities run successfully and then no more show up for a long time. If they start coming in frequent bursts, then it's time to act.
BTW, most warranty replacements are "reconditioned" drives that have nothing more than diagnostics run and bad sectors reassigned. As long as total capacity still meets advertised and the mechanics/electrics and media (high %) are still good, they'll ship them.
Thanks.
mhr
<snip sig stuff>
HTH
On Thu, Sep 11, 2008 at 2:05 PM, William L. Maltby CentOS4Bill@triad.rr.com wrote:
IIRC, the seatools just run the smart tools that come on CentOS/Linux. Not the same as those on the DOS tools version. It's been several months, but barring memory failures (mine, not the computer's ;-) I ended up downloading the DOS ones so that I could do the repair and run the "real magilla".
I'm not so sure about that, but I'd have to check. It was the Seatools program, not smartctl (at least not directly). And it's "megilla," ya goysiher kopf!
Try the DOS version. I bet the lack of that support is in the standard *IX smart tools, not the drive.
I don't think so - it only commented on these from the Seagate, not the WD, and it explicitly states that the DST is not supported on the drive (although that is /just/ ambiguous enough...).
If all drives left the factory in great shape, it is natural that the older ones would show an error first. Often just a "weak" spot or two that passed mfg tests and finally failed as they aged. That's why I don't worry about them (I don't have data center servers to the world here at home) as long as the repair utilities run successfully and then no more show up for a long time. If they start coming in frequent bursts, then it's time to act.
Well, yeah, of course, but why would my Max 160 be error free and the Seagate have 4 billion when the latter is (a year or so) newer? (Rhetorical question!)
BTW, most warranty replacements are "reconditioned" drives that have nothing more than diagnostics run and bad sectors reassigned. As long as total capacity still meets advertised and the mechanics/electrics and media (high %) are still good, they'll ship them.
I've noticed that - really annoying, but then, what're ya gonna do when there's no will to enact laws requiring manufacturers to provide quality products to begin with, and then replace them appropriately under warranty?
Ciao.
mhr
On Wed, Sep 10, 2008 at 9:41 PM, MHR mhullrich@gmail.com wrote:
I decided, after the last discussion of smartd and S.M.A.R.T. disks, to take a look in my /var/log/messages, and I'm seeing a fair bit of this:
Sep 10 20:11:23 mhrichter smartd[3361]: Device: /dev/sda, 4294967295 Offline uncorrectable sectors
Sep 10 20:41:23 mhrichter smartd[3361]: Device: /dev/hdb, 21 Currently unreadable (pending) sectors
(snip)
Google is not particularly informative on this subject - anyone know more than general suggestions about dd, badblocks, etc.? This is my boot and primary system disk (has been for some time), but the error message is essentially meaningless (to me, right now).
You should start thinking of replacing the disk. There is a discussion in the forum:
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=15880&forum=3...
I am one of the people there who were getting the same error and replaced the disk.
Akemi / toracat
On Thursday 11 September 2008 08:02:23 Akemi Yagi wrote:
You should start thinking of replacing the disk. There is a discussion in the forum:
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=15880&forum=3...
I am one of the people there who were getting the same error and replaced the disk.
I had similar messages on this laptop. Acer accepted liability and replaced the disk.
Anne
On Thu, Sep 11, 2008 at 12:26 AM, Anne Wilson cannewilson@googlemail.com wrote:
I had similar messages on this laptop. Acer accepted liability and replaced the disk.
I'm pretty sure the Seagate warranty is no longer in force - most of them are a year. Maxtor's are, for sure - I've had to go through that once before, and they were quite cooperative, too, but that was a few years back (before Seagate bought them).
mhr
MHR wrote:
On Thu, Sep 11, 2008 at 12:26 AM, Anne Wilson cannewilson@googlemail.com wrote:
I had similar messages on this laptop. Acer accepted liability and replaced the disk.
I'm pretty sure the Seagate warranty is no longer in force - most of them are a year. Maxtor's are, for sure - I've had to go through that once before, and they were quite cooperative, too, but that was a few years back (before Seagate bought them).
Depending on the drive and how it was sold, Seagate drives can have a 3 or even 5 year warranty.
OTOH, major OEM stuff sold embedded in a packaged system is the responsibility of the OEM warranty (HP, Dell, etc.). With 'whitebox' OEM stuff bought as parts at computer stores, you're the OEM, and you have some level of warranty from Seagate, but I forget what it is specifically; it's likely to be 1 year.
On Thu, Sep 11, 2008 at 11:07:25AM -0700, MHR enlightened us:
On Thu, Sep 11, 2008 at 12:26 AM, Anne Wilson cannewilson@googlemail.com wrote:
I had similar messages on this laptop. Acer accepted liability and replaced the disk.
I'm pretty sure the Seagate warranty is no longer in force - most of them are a year. Maxtor's are, for sure - I've had to go through that once before, and they were quite cooperative, too, but that was a few years back (before Seagate bought them).
Seagate has a 5 year warranty on its drives. You might check again.
Matt
Matt Hyclak wrote:
On Thu, Sep 11, 2008 at 11:07:25AM -0700, MHR enlightened us:
On Thu, Sep 11, 2008 at 12:26 AM, Anne Wilson cannewilson@googlemail.com wrote:
I had similar messages on this laptop. Acer accepted liability and replaced the disk.
I'm pretty sure the Seagate warranty is no longer in force - most of them are a year. Maxtor's are, for sure - I've had to go through that once before, and they were quite cooperative, too, but that was a few years back (before Seagate bought them).
Seagate has a 5 year warranty on its drives. You might check again.
Just put the serial number in here: http://support.seagate.com/customer/warranty_validation.jsp
On Thu, 11 Sep 2008, MHR wrote:
On Thu, Sep 11, 2008 at 12:26 AM, Anne Wilson cannewilson@googlemail.com wrote:
I had similar messages on this laptop. Acer accepted liability and replaced the disk.
I'm pretty sure the Seagate warranty is no longer in force - most of them are a year.
???? you patronize the wrong vendors -- 5 year Seagate warranty and low defects are my experience.
-- Russ herrold
MHR wrote:
On Thu, Sep 11, 2008 at 12:26 AM, Anne Wilson cannewilson@googlemail.com wrote:
I had similar messages on this laptop. Acer accepted liability and replaced the disk.
I'm pretty sure the Seagate warranty is no longer in force - most of them are a year. Maxtor's are, for sure - I've had to go through that once before, and they were quite cooperative, too, but that was a few years back (before Seagate bought them).
mhr
I recently returned a disk that was > 2 years old to the Seagate agents here in SA. They swapped it out without any hassles - as far as I remember, they said then that the warranty was 5 years.
ChrisG
On Thu, Sep 11, 2008 at 12:02 AM, Akemi Yagi amyagi@gmail.com wrote:
You should start thinking of replacing the disk.
I am, thanks.
There is a discussion in the forum:
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=15880&forum=3...
I am one of the people there who were getting the same error and replaced the disk.
Scary stuff, to some extent. I should probably point out that the Maxtor 120GB PATA drive (the one with errors I believe are real) had a power connector problem for a while that may have damaged it, but I haven't seen anything funny with it since, and that was back when I changed the CPU/MB in March, 2007 and then the power supply about a month later when that burned out altogether.
I'm watching it now!
mhr
On Wed, 2008-09-10 at 21:41 -0700, MHR wrote:
I decided, after the last discussion of smartd and S.M.A.R.T. disks, to take a look in my /var/log/messages, and I'm seeing a fair bit of this:
Sep 10 20:11:23 mhrichter smartd[3361]: Device: /dev/sda, 4294967295 Offline uncorrectable sectors
Sep 10 20:41:23 mhrichter smartd[3361]: Device: /dev/hdb, 21 Currently unreadable (pending) sectors
<snip>
Google is not particularly informative on this subject - anyone know more than general suggestions about dd, badblocks, etc.? This is my boot and primary system disk (has been for some time), but the error message is essentially meaningless (to me, right now).
A google using
manufacturer smart site:centos.org
should lead to a couple good threads on this list. I'd cite the recent related, but I'm short of time ATM.
As long as you only see one, or very few, errors and very limited growth in the number, no worry IMO. However, to confirm this, use smartctl to get a full check and logging done. Then use it to review the logs.
I've got one that has had 2 errors for more than 6 months now. Used the manufacturer tools, it got repaired, only one occurrence since.
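To watch for "very limited growth in the number" without eyeballing the log, something like the following Python sketch could tally the smartd messages per device. It is only an illustration: the function names `summarize` and `report` are made up here, the regular expression is written against the log lines quoted in the original post (other smartd versions may format them differently), and treating 4294967295 as "not a real count" is an assumption, not something smartd documents.

```python
import re
from collections import defaultdict

# Matches smartd lines of the form quoted in the original post, e.g.
# "Sep 10 20:41:23 mhrichter smartd[3361]: Device: /dev/hdb, 21 Currently unreadable (pending) sectors"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s\S+\ssmartd\[\d+\]:\s"
    r"Device:\s(?P<dev>\S+),\s(?P<count>\d+)\s(?P<what>.+?)\s*$"
)

SENTINEL = 2**32 - 1  # 4294967295: assumed here to be a placeholder, not a count


def summarize(lines):
    """Return {(device, message): [counts in log order]} for matching lines."""
    history = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            history[(m["dev"], m["what"])].append(int(m["count"]))
    return dict(history)


def report(history):
    """Label each series as suspect, growing, or showing no growth."""
    out = {}
    for key, counts in history.items():
        if all(c == SENTINEL for c in counts):
            out[key] = "suspect value (0xFFFFFFFF)"
        elif counts[-1] > counts[0]:
            out[key] = f"growing: {counts[0]} -> {counts[-1]}"
        else:
            out[key] = f"no growth, last value {counts[-1]}"
    return out


# Sample input copied from the log excerpt in the original post.
sample = [
    "Sep 10 20:11:23 mhrichter smartd[3361]: Device: /dev/sda, 4294967295 Offline uncorrectable sectors",
    "Sep 10 20:41:23 mhrichter smartd[3361]: Device: /dev/hdb, 21 Currently unreadable (pending) sectors",
    "Sep 10 21:11:23 mhrichter smartd[3361]: Device: /dev/hdb, 21 Currently unreadable (pending) sectors",
]

for (dev, what), verdict in report(summarize(sample)).items():
    print(f"{dev}: {what}: {verdict}")
```

To run it against a real log, pass an open file (for example `open("/var/log/messages")`, which usually needs root) to `summarize` instead of `sample`. A series that keeps growing between runs is the "frequent bursts" case where it is time to act.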
Thanks.
mhr
<snip>