On 08/12/2010 01:55 AM, Fajar Priyanto wrote:
Hi guys, I don't mean to incite debate or something, just want to share experience and a little curiosity.
A long time ago, we had an old MS W2K (NTFS) file server. Because no admin was available to manage it, the server would get powered off when the office closed and automatically powered on again in the morning. That went on for years and it was fine ^^
Recently, I set up a CentOS 5.5 file server with ext3 and had two power blackouts. I noticed the filesystem got corrupted and there were also bad sectors.
Is it just pure random luck, software or hardware issue? What's your experience?
I would say 'luck'. No common system is 100% safe against 'pull the plug' shutdowns. It also matters how much disk I/O the system is doing: an idle system will tolerate 'pull the plug' better than one that is in the middle of writing. Additionally, powering up and powering down is the hardest thing you can do to the *hardware*. Servers should be left running 24/7 - they last longer. Finally, if power failures are taking the machine down, buy a UPS and connect the monitoring cable. I like APC UPSs, with apcupsd to monitor them and automatically shut the system down if needed.
You can improve ext3's resistance to corruption quite a bit if you use the 'data=journal,barrier=1' mount options. Barriers are actually one of the few cases where software RAID or LVM hurts you - they don't honor barriers (at least not in CentOS/RHEL - newer kernels have improved this somewhat). If you are using a hardware RAID card with onboard cache - make **SURE** it has battery backup installed, too, or else turn off the cache completely. If you are using LVM/software RAID you will also need to turn off the hard drives' *own* write caches. And yes - you are going to take some serious performance hits from doing all this. You are trading performance for reliability in the face of power failures. And use ext4 instead of ext3 if you can (ext4 adds journal checksumming).
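To check whether a box is actually mounted the way described above, something like the following rough sketch might help. It scans text in the format of /proc/mounts and lists ext3/ext4 filesystems that are missing 'data=journal' or appear to have barriers off. The function name `risky_ext_mounts` is just made up for illustration, and how the kernel prints these options varies between versions (some omit options that are at their default), so treat the output as a hint rather than proof.

```python
def risky_ext_mounts(mounts_text):
    """Return [(mount_point, [problems])] for ext3/ext4 entries.

    mounts_text is expected to look like /proc/mounts:
    device mount_point fstype options dump pass
    """
    findings = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext3/ext4.
        if len(fields) < 4 or fields[2] not in ("ext3", "ext4"):
            continue
        mount_point, fstype = fields[1], fields[2]
        opts = fields[3].split(",")

        problems = []
        if "data=journal" not in opts:
            problems.append("data=journal not set")
        if "barrier=0" in opts or "nobarrier" in opts:
            problems.append("barriers disabled")
        elif fstype == "ext3" and "barrier=1" not in opts:
            # Assumption: ext3 of this era leaves barriers off unless
            # barrier=1 is given, and reports barrier=1 when it is on.
            problems.append("barrier=1 not shown")

        if problems:
            findings.append((mount_point, problems))
    return findings
```

On a live system you would feed it the real table, e.g. `risky_ext_mounts(open("/proc/mounts").read())`. Note that it only looks at mount options; it cannot tell you whether LVM, software RAID, or the drive's own write cache is undoing the barriers underneath.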
Here is an article discussing making linux disk I/O safer: http://www.linux-mag.com/id/7773/