Hi all, I have finally migrated my mail server to another box, temporarily. The server has been having trouble with its SATA disks (Maxtor and Western Digital).
I then upgraded the kernel to the latest one, 2.6.9-34.ELsmp, and now smartd can run.
When I run smartctl -d ata -a /dev/sda, it reports some errors, but I don't know what they mean.
This is the result: http://pastebin.com/624613 The problems it encountered were device timeouts. Please take a look at the pastebin for details.
Is the drive still OK?
Running smartctl on sdb gives no errors. Thanks,
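For anyone following along: part of the "is it still OK?" question can be read from smartctl's exit status, which the smartctl man page documents as a bitmask. The sketch below decodes it. explain_smartctl_status is a hypothetical helper name, not something shipped with smartmontools, and the bit descriptions are paraphrased from the man page.

```shell
# Sketch only: decode smartctl's exit status, which is a bitmask.
# explain_smartctl_status is a hypothetical helper, not part of smartmontools.
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    [ $((status & 1)) -ne 0 ]   && echo "bit 0: command line did not parse"
    [ $((status & 2)) -ne 0 ]   && echo "bit 1: device could not be opened or identified"
    [ $((status & 4)) -ne 0 ]   && echo "bit 2: a SMART/ATA command failed or a checksum was bad"
    [ $((status & 8)) -ne 0 ]   && echo "bit 3: SMART status returned DISK FAILING"
    [ $((status & 16)) -ne 0 ]  && echo "bit 4: prefail attributes are at or below threshold"
    [ $((status & 32)) -ne 0 ]  && echo "bit 5: attributes were at or below threshold in the past"
    [ $((status & 64)) -ne 0 ]  && echo "bit 6: device error log contains records of errors"
    [ $((status & 128)) -ne 0 ] && echo "bit 7: self-test log contains records of errors"
    return 0
}

# Typical use on the affected box (as root, needs the real disk):
#   smartctl -d ata -a /dev/sda >/dev/null; explain_smartctl_status $?
```

Bit 6 on its own only says that the error log is not empty, so errors from long ago keep setting it; bits 3 and 4 are the ones that point at a drive that is failing now.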
On Mon, 27 Mar 2006, Fajar Priyanto wrote:
I have finally migrated my mail server to another box, temporarily. The server has been having trouble with its SATA disks (Maxtor and Western Digital).
I then upgraded the kernel to the latest one, 2.6.9-34.ELsmp, and now smartd can run.
When I run smartctl -d ata -a /dev/sda, it reports some errors, but I don't know what they mean.
This is the result: http://pastebin.com/624613 The problems it encountered were device timeouts. Please take a look at the pastebin for details.
Is the drive still OK?
Running smartctl on sdb gives no errors.
Disable DMA to see if that results in errors as well. But it seems sda has uncorrectable errors.
Kind regards, -- dag wieers, dag@wieers.com, http://dag.wieers.com/ -- [all I want is a warm bed and a kind word and unlimited power]
On Monday 27 March 2006 09:24 pm, Dag Wieers wrote:
Disable DMA to see if that results in errors as well. But it seems sda has uncorrectable errors.
Hi Dag, After some research on the net, I came across some postings saying that a SMART-enabled disk keeps every error it encounters in its log/memory. So in my case, it seems the errors occurred quite some time ago, compared to the disk's current power-on hours.
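To check whether the logged errors really are old, the power-on hour stamped on each error log entry can be compared with the disk's current power-on time. A rough sketch, assuming the error log prints lines of the form "Error N occurred at disk power-on lifetime: H hours" and that attribute 9 is reported in hours on this drive; some drives count it in minutes or other units (smartctl has a -v 9,minutes option for that case). Both function names are made up for illustration.

```shell
# Sketch only: both helpers are hypothetical and read smartctl output on stdin.

# Print the power-on hour recorded for each entry in `smartctl -l error` output.
error_hours() {
    awk '/occurred at disk power-on lifetime:/ {
        for (i = 1; i < NF; i++)
            if ($i == "lifetime:") print $(i + 1)
    }'
}

# Print the raw value of attribute 9 from `smartctl -A` output
# (assumes the usual ten-column table with the raw value last).
power_on_hours() {
    awk '$1 == 9 && $2 ~ /^Power_On/ { print $10 }'
}

# Typical use on the affected box (as root, needs the real disk):
#   smartctl -d ata -l error /dev/sda | error_hours
#   smartctl -d ata -A /dev/sda | power_on_hours
```

If the newest error hour is far below the current power-on figure, the log is describing history rather than something still happening.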
Nevertheless, I reformatted sda2 and also ran a double surface scan on it; after half a day, it turned out to be OK. Very strange. It is also currently running without any errors at all. I plan to monitor it for a couple of days.
Given this strange behaviour, I suspect the errors could have been caused by:
1. An electrical surge from an old UPS I used on the box
2. The temperature of the server room being too cold? It's about 18 degrees Celsius during the day and could drop to 14 degrees at night, I guess. Could that be too cold? I ask because the box is currently on my desk at a normal room temperature of around 28 degrees.
I don't know. But from my research, some websites suggest avoiding Maxtor and WD disks. http://forums.hardwareguys.com/ikonboard.cgi?act=ST&f=3&t=3947
On Tue, Mar 28, 2006 at 12:36:51AM +0700, Fajar Priyanto wrote:
- An electrical surge from an old UPS I used on the box
- The temperature of the server room being too cold? It's about 18 degrees Celsius during the day and could drop to 14 degrees at night, I guess. Could that be too cold? I ask because the box is currently on my desk at a normal room temperature of around 28 degrees.
Definitely not too cold. Actually, you can run the server at 0 degrees Celsius and it will be very happy.
I usually recommend 21 degrees as the ideal temperature for rooms where people need to work, and 18 when nobody needs to be in there all the time. For critical operations, 16 is even better.
One might think that 28 degrees is too hot, though.
-- Rodrigo Barbosa rodrigob@suespammers.org "Quid quid Latine dictum sit, altum viditur" "Be excellent to each other ..." - Bill & Ted (Wyld Stallyns)
18C is a good temperature, and most AC units have this as their lowest setting.
21C is OK and, compared with 18C, will save significantly on your electricity bill and on wear and tear on the AC units themselves.
P.
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
On Tue, 28 Mar 2006, Fajar Priyanto wrote:
On Monday 27 March 2006 09:24 pm, Dag Wieers wrote:
Disable DMA to see if that results in errors as well. But it seems sda has uncorrectable errors.
Hi Dag, After some research on the net, I came across some postings saying that a SMART-enabled disk keeps every error it encounters in its log/memory. So in my case, it seems the errors occurred quite some time ago, compared to the disk's current power-on hours.
That's correct. However, a disk is normally able to correct problems by relocating the affected sectors. In your case it was an uncorrectable error, which seems to imply that the relocation failed, and that is normally a very bad sign.
I'm not sure what caused it, but I wouldn't trust the disk. Of course, it might have been a SATA driver issue that is hopefully fixed now.
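The relocation described above shows up in the SMART attribute table. A minimal sketch, assuming the usual ten-column layout of smartctl -A output with the raw value in the last column, and assuming the drive reports attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable); not every vendor does. check_sector_attrs is a hypothetical name.

```shell
# Sketch only: check_sector_attrs is a hypothetical helper.
# Reads `smartctl -A` output on stdin and prints the sector-health
# attributes (IDs 5, 197, 198) whose raw value is above zero.
check_sector_attrs() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $2 ~ /_/ && NF >= 10 && $10 + 0 > 0 {
        printf "%s (id %d): raw value %s\n", $2, $1, $10
    }'
}

# Typical use on the affected box (as root, needs the real disk):
#   smartctl -d ata -A /dev/sda | check_sector_attrs
```

No output means none of the three counters is above zero. A non-zero pending or offline-uncorrectable count that does not clear after the sectors are rewritten would fit the "relocation failed" reading.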
Nevertheless, I reformatted sda2 and also ran a double surface scan on it; after half a day, it turned out to be OK. Very strange. It is also currently running without any errors at all. I plan to monitor it for a couple of days.
sda2 is a partition, so I hope you meant to say sda.
Kind regards, -- dag wieers, dag@wieers.com, http://dag.wieers.com/ -- [all I want is a warm bed and a kind word and unlimited power]