Hi guys, I don't mean to start a debate or anything; I just want to share an experience and ask out of curiosity.
A long time ago, we had an old MS W2K (NTFS) file server. Because no admin was available to manage it, the server would get powered off when the office closed and automatically powered on again in the morning. That went on for years and it was fine ^^
Recently, I set up a CentOS 5.5 file server with ext3. It went through two power blackouts, and I noticed that the filesystem got corrupted and the disk also has bad sectors.
Is it just pure random luck, software or hardware issue? What's your experience?
Thank you.
Hi,
ext3 is very reliable. I never had such issues (fsck after a power failure, yes... but no data loss), so I would say it's a hardware issue.
Greetings
On 08/12/2010 10:55 AM, Fajar Priyanto wrote:
Is it just pure random luck, software or hardware issue? What's your experience?
Thank you.
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
2010/8/12 Fajar Priyanto fajarpri@arinet.org:
Is it just pure random luck, software or hardware issue? What's your experience?
I think it is broken hardware, given the bad sectors.
Maybe your hard disk is broken or about to fail. Check the output from smartctl.
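As a rough illustration of what "check output from smartctl" can mean in practice, here is a minimal Python sketch that pulls a few sector-health counters out of the text printed by `smartctl -A`. The attribute names watched here are common ones but are vendor-dependent, and the sample text is made up for the example, not taken from any real drive.

```python
# Sketch: extract sector-health counters from `smartctl -A` text output.
# Attribute names vary by vendor; these three are common but not universal.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text, watched=WATCHED):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            raw = fields[9]
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found


# Made-up sample in the usual column layout (an assumption for illustration).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       150
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

counters = parse_smart_attributes(SAMPLE)
```

Non-zero and growing values for these counters usually point at the disk itself rather than the filesystem.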
-- Eero, RHCE
From: Fajar Priyanto fajarpri@arinet.org
Recently, I setup a Centos 5.5 file server with ext3 and got power blackout twice and I notice the filesystem got corrupted and also bad sectors. Is it just pure random luck, software or hardware issue? What's your experience?
I am pretty sure bad sectors = corruption. http://en.wikipedia.org/wiki/Bad_sector
JD
On 08/12/2010 01:55 AM, Fajar Priyanto wrote:
Is it just pure random luck, software or hardware issue? What's your experience?
I would say 'luck'. No common system is normally 100% safe against 'pull the plug' shutdowns. Also, it matters how much disk I/O the system is doing: a system that is idle will tolerate 'pull the plug' better than one actually doing something. Additionally, powering up and powering down is the hardest thing you can do to the *hardware*. Servers should be left running 24/7 - they last longer. Finally, if power failures are taking the machine down, buy a UPS and connect the monitoring cable. I like APC UPSs and apcupsd for monitoring it and automatically shutting the system down if needed.
You can improve ext3's resistance to corruption quite a bit if you use the 'data=journal,barrier=1' mount options. Barriers are actually one of the few cases where software RAID or LVM hurts you - they don't honor barriers (at least not in CentOS/RHEL - newer kernels have improved this somewhat). If you are using a hardware RAID card with onboard cache - make **SURE** it has battery backup installed, too, or else turn off the cache completely. If you are using LVM/software RAID you will also need to turn off the hard drives' *own* write caches as well. And yes - you are going to take some serious performance hits from doing all this. You are trading performance for reliability in the face of power failures. And use ext4 instead of ext3 (ext4 adds journal checksumming) if you can.
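For anyone wanting to try the mount options and the write-cache advice above, here is a small Python sketch. ext3 spells the journalling mode option `data=journal`, and `hdparm -W0` is the usual way to switch off a drive's own write cache. The device name and mount point are hypothetical, and the sketch assumes a non-root data volume; it only builds the fstab line and the command, it does not change anything on the system.

```python
def add_mount_options(fstab_line, extra=("data=journal", "barrier=1")):
    """Return the fstab line with `extra` merged into the options field."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fstab fields")
    options = fields[3].split(",")
    for opt in extra:
        key = opt.split("=", 1)[0]
        # Drop any existing setting of the same key, then append ours.
        options = [o for o in options if o.split("=", 1)[0] != key]
        options.append(opt)
    fields[3] = ",".join(options)
    return " ".join(fields)


def disable_write_cache_cmd(device):
    """argv that turns off the drive's own write cache via hdparm."""
    return ["hdparm", "-W0", device]


# Hypothetical data volume, for illustration only.
new_line = add_mount_options("/dev/sdb1 /srv/files ext3 defaults 1 2")
```

With the hypothetical input above, `new_line` carries `defaults,data=journal,barrier=1` in its fourth field. An existing `data=ordered` entry would be replaced rather than duplicated.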
Here is an article discussing making linux disk I/O safer: http://www.linux-mag.com/id/7773/
On Thu, 12 Aug 2010, Benjamin Franz wrote:
To: CentOS mailing list centos@centos.org
From: Benjamin Franz jfranz@freerun.com
Subject: Re: [CentOS] NTFS is more resilient than ext3? Or is it hardware issue?
Finally, if power failures are taking the machine down, buy a UPS and connect the monitoring cable. I like APC UPSs and apcupsd for monitoring it and automatically shutting the system if needed.
I'm using an APC Back-UPS 650 on my home-built server. It does the job well. When there's a dip in the mains voltage the UPS switches in and keeps things running. I have configured apcupsd to gracefully shut the machine down after a 5 second power outage.
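The exact apcupsd settings behind that 5 second shutdown are not given in the post. As a sketch only, the Python below models the shutdown rule that apcupsd documents for its `TIMEOUT`, `BATTERYLEVEL` and `MINUTES` directives: while on battery, a shutdown starts as soon as any one threshold is reached. Using `TIMEOUT 5` to get the behaviour described is an assumption, and the function is a model of the rule, not apcupsd's own code.

```python
def should_shut_down(seconds_on_battery, battery_percent, minutes_left,
                     timeout=5, battery_level=5, minutes=3):
    """Model of the documented rule, evaluated only while on battery.

    timeout       -- seconds on battery before shutdown (0 disables this rule)
    battery_level -- shut down at or below this charge percentage
    minutes       -- shut down at or below this estimated runtime
    """
    if timeout > 0 and seconds_on_battery >= timeout:
        return True
    if battery_percent <= battery_level:
        return True
    if minutes_left <= minutes:
        return True
    return False
```

A short `timeout` trades uptime during brief dips for a larger safety margin: the machine halts cleanly long before the battery is at risk.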
That APC UPS has been running for about 6 years now, still no problems with it.
I get postcards from APC occasionally, asking if I'd like to trade in my UPS for a newer one. Not now, thank you ;)
Kind Regards,
Keith Roberts
At Thu, 12 Aug 2010 16:55:29 +0800 CentOS mailing list centos@centos.org wrote:
Is it just pure random luck, software or hardware issue? What's your experience?
We (way back when while I was working at UMass) bought two Gateway desktop boxes (identical machines with identical Quantum SCSI disks). One got MS-Windows NT 4 installed on it, the other RedHat Linux. Within a month the RedHat box reported disk errors (nothing totally fatal, just bad sector I/O). We had the disk replaced with a Seagate SCSI disk and the machine was happy for years. Not a peep out of the NT box for like 7 months, then it basically died due to disk failure. We *suspected* that the disk probably was having trouble all along, but NT was totally 'oblivious' to the errors...
On Thursday, August 12, 2010 04:55:29 am Fajar Priyanto wrote:
Back long time ago, we have an old file MS W2K (NTFS) server where due no admin was available to manage it, the server would get power off when the office closed, and auto power on again in the morning. That thing happened for years and it was fine ^^
Recently, I setup a Centos 5.5 file server with ext3 and got power blackout twice and I notice the filesystem got corrupted and also bad sectors.
Is the Centos 5.5 box the same hardware that ran W2K? If not, then you can't really compare the systems.
Having said that, I have seen pull the plug blackouts on busy servers, NTFS and otherwise, lose data and have hard bad sectors.
The reason is that if the hard disk is in the process of writing a sector, and its power falls out from under it, especially if the 12 volts falls before the 5 volts, you can get scribbles on the disk. These scribbles, especially with newer drives that pack data tighter than older drives, can overwrite ordinarily protected servo data; when this happens you lose sectors and sometimes whole tracks of data. The right thing is to run a long SMART test (smartctl is the right tool, but read the man page before using it) and see how many sectors the drive ends up remapping. The remapped data is probably lost, but the drive should still be usable if not too many sectors got scribbled.
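A minimal Python sketch of the long-test workflow mentioned above, using the standard smartmontools options `-t long` (start the extended self-test), `-l selftest` (show the self-test log) and `-A` (show the attribute table, where remapped sectors are counted). The device path is a placeholder, and the `run` helper is only a convenience wrapper; actually running the commands needs root and an installed smartctl.

```python
import subprocess


def long_test_commands(device):
    """Return the argv lists for a long SMART self-test and its follow-up checks."""
    return {
        "start": ["smartctl", "-t", "long", device],       # kick off the extended test
        "log": ["smartctl", "-l", "selftest", device],     # read results once it finishes
        "attributes": ["smartctl", "-A", device],          # remapped/pending sector counts
    }


def run(argv):
    """Run one command and return its text output (needs root and smartmontools)."""
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout


# Placeholder device name; substitute the real disk.
commands = long_test_commands("/dev/sdX")
```

The long test runs inside the drive and can take hours, so the log and attribute commands are meant to be run later, not right after the start command.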
I had a pair of 250GB Maxtor Maxline II drives get scribbled thanks to a power supply that was losing one of its two 12 volt supply rails; 12 volts is in high demand in modern machines. Both drives now fail the SMART long test, even though all sectors except the 150 or so per drive that got scribbled on are ok. The drives have been in use for several years since the scribble incident, and no additional sectors have been remapped. But I did partition them so that the tracks I knew had seek error issues (thanks to the servo data getting overwritten) are between active partitions.
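The partitioning trick described above can be sketched roughly as follows. This Python example is a simplification under stated assumptions: it works on logical sector numbers (LBAs) rather than physical tracks, the guard margin is an arbitrary illustrative value, and how the bad LBAs were found is left open. It is not the procedure used on those drives, just one way to compute partition boundaries that leave damaged areas in the gaps.

```python
def usable_ranges(total_sectors, bad_lbas, guard=2048, min_size=1):
    """Return inclusive (start, end) sector ranges that stay at least
    `guard` sectors away from every bad LBA."""
    ranges = []
    start = 0
    for lba in sorted(set(bad_lbas)):
        end = lba - guard - 1
        # Keep the stretch before this bad sector if it is large enough.
        if end - start + 1 >= min_size:
            ranges.append((start, end))
        # Resume after the guard zone that follows the bad sector.
        start = max(start, lba + guard + 1)
    if total_sectors - start >= min_size:
        ranges.append((start, total_sectors - 1))
    return ranges


# Illustrative numbers only.
parts = usable_ranges(10000, [3000, 3010], guard=100)
```

Each returned range is a candidate partition; clustered bad sectors collapse into a single gap because their guard zones overlap.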
The two disks were in a Windows XP mirrored set; a large part of the NTFS filesystem was corrupted due to the particular location on the disks that got scribbled (both disks got marked as faulted as well).
When a disk scribbles in this manner you are going to get corruption of some sort; the amount and kind of corruption will depend entirely on what got scribbled.
You really need a UPS to prevent this, with the server having communication with the UPS to at least halt all writes when the power falls. Even if the 5 and 12 volt rails fall at the exact same time (impossible to design for, since the fall time will be determined by the RC time constant of the load of the output, and that is variable with system activity) during a disk write you could easily get problems. Some drives are more tolerant of this type of fault than others, but I've seen examples of drives from all the major brands have hard sector errors due to power supply issues; WD, Seagate, Maxtor, Toshiba, Hitachi, you name it.
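To see why the fall time moves around with system activity, here is a very crude Python sketch. It assumes the rail behaves like a capacitor discharging into a resistive load, so the voltage decays as V0 * exp(-t / (R * C)); real regulated supplies are more complicated, and the component values are made up for illustration.

```python
import math


def fall_time(v_start, v_threshold, load_ohms, capacitance_farads):
    """Seconds for an RC-discharging rail to drop from v_start to v_threshold.

    Derived from v(t) = v_start * exp(-t / (R * C)).
    """
    return load_ohms * capacitance_farads * math.log(v_start / v_threshold)


# Same rail and capacitance, light versus heavy load (illustrative values).
light_load = fall_time(12.0, 10.8, load_ohms=4.0, capacitance_farads=0.01)
heavy_load = fall_time(12.0, 10.8, load_ohms=1.0, capacitance_farads=0.01)
```

A busier system draws more current, which corresponds to a smaller effective load resistance and a faster drop, so the two rails cannot be made to fall together under every load.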
I've seen it with all the major interface types, too, although enterprise class drives are far less likely to have the problem. Even then, one of the more damaged drives I've seen was a Seagate Cheetah 72GB U320 SCSI drive, which ended up with over 2000 bad sectors after a particularly nasty set of power undervolts from a failing power supply in a Dell server (the undervoltage was on the 5 volt rail in this particular case; an oscilloscope trace of the 5V line looked like the Mediterranean coastline). That drive ended up with sector 0 fried and all remaps taken, thus essentially a dead drive, even though the majority of it tests good. The test takes a very long time, though, thanks to all the seek errors the overwritten servo areas created.
So it is a hardware issue more than likely.