[CentOS] Question re RHEL 5.3

Sun Nov 2 09:03:31 UTC 2008

On Sat, Nov 1, 2008 at 6:31 AM, Kai Schaetzl <maillists at conactive.com> wrote:
> Mhr wrote on Wed, 29 Oct 2008 17:59:40 -0700:
>
>> The one problem I've seen and posted here was w.r.t. smartd error
>> reports showing 2^32 - 1 errors on one of the disks (probably my
>> system disk) every few minutes.
>
> What does this have to do with "SATA problems/drive handling"?

Possibly because my system drive is a SATA disk?  (FTR, the drive does
not appear to be the slightest bit unstable and it runs just fine.  In
fact, I recently modified the system so that it now runs on three
SATA-2 drives exclusively.  For whatever reason, the WD drives do not
report any errors - see also below.)

> And could you please use a decent subject next time?

When I select the subject, I usually do.  This was a reply to a
thread, so I didn't pick the subject.  There's no need to be testy....

> Regarding your problem: Have you run a smartctl self-test since then, and did
> you go to smartmontools.sf.net and read up on smartmon?

Yes and not until now, in that order.  The smartctl self-test shows the
same problem, IIRC, but the SeaTools test found nothing wrong.

> This may just be a problem with smartd not being able to handle the error
> codes/number of errors from that disk. If you look at smartmontools.sf.net
> and read the man page you'll see that vendors are quite inconsistent in what
> they report and how, and a reversal of byte ordering every now and then
> seems to be common. Not to mention that the smartmon shipping with CentOS
> naturally doesn't include the latest code.
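To illustrate the byte-ordering point, here is a minimal sketch (my own, not taken from smartmontools) of how a single 6-byte SMART raw field can decode to wildly different numbers depending on the byte order the tool assumes. The byte values are made up for illustration; nothing here claims this is what my drive actually reports.

```python
# Hypothetical illustration only: the raw bytes below are invented, not read
# from any real drive. SMART attribute raw values are 6-byte fields.
RAW_BYTES = bytes([0x01, 0x00, 0x00, 0x00, 0x00, 0x00])

def decode_raw(raw: bytes, byteorder: str) -> int:
    """Interpret a raw SMART field as an unsigned integer in the given byte order."""
    return int.from_bytes(raw, byteorder=byteorder, signed=False)

little = decode_raw(RAW_BYTES, "little")  # the lone 0x01 is the low byte
big = decode_raw(RAW_BYTES, "big")        # the lone 0x01 is the high byte

if __name__ == "__main__":
    print(f"little-endian: {little}")
    print(f"big-endian:    {big}")
```

The same bytes read as 1 one way and 1099511627776 (2^40) the other, so a tool that guesses the wrong order for a given vendor can turn a harmless count into an absurd one.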

All good information, thank you.  I did not see anything specific to
the issue I am seeing, which is that every half hour, smartd reports
the following:

Nov  2 01:56:11 mhrichter smartd[3121]: Device: /dev/sda, 4294967295 Currently unreadable (pending) sectors
Nov  2 01:56:11 mhrichter smartd[3121]: Device: /dev/sda, 4294967295 Offline uncorrectable sectors

In each case, it also sends a warning email to root, which is kind of
annoying since these do not appear to be legitimate error conditions.
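The reported count is itself a clue: 4294967295 is exactly 2^32 - 1, i.e. all 32 bits set, which is also what -1 looks like when a signed 32-bit value is read as unsigned. The sketch below just checks that arithmetic. It is an assumption on my part that the drive or smartd is producing an "all ones"/-1 style value here rather than a real sector count; the code does not show what smartd does internally.

```python
# 4294967295 is the value smartd logged for both pending and offline-uncorrectable sectors.
LOGGED_COUNT = 4294967295

def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    # If the sign bit is set, subtract 2**32 to get the negative equivalent.
    return value - (1 << 32) if value & 0x80000000 else value

if __name__ == "__main__":
    print(LOGGED_COUNT == 2**32 - 1)  # True
    print(hex(LOGGED_COUNT))          # 0xffffffff
    print(as_signed32(LOGGED_COUNT))  # -1
```

A genuine count of four billion bad sectors on one disk is implausible, whereas an all-ones placeholder is a common way to signal "no data" or an error, which fits the drive otherwise behaving normally.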

Someone mentioned that this is a recurring problem with Seagate drives.
More info, please?

Thanks.

mhr