[CentOS] NTFS is more resilient than ext3? Or is it hardware issue?

Thu Aug 12 12:55:29 UTC 2010
Benjamin Franz <jfranz at freerun.com>

On 08/12/2010 01:55 AM, Fajar Priyanto wrote:
> Hi guys,
> I don't mean to incite debate or something, just want to share
> experience and a little curiosity.
>
> Back long time ago, we have an old file MS W2K (NTFS) server where due
> no admin was available to manage it, the server would get power off
> when the office closed, and auto power on again in the morning. That
> thing happened for years and it was fine ^^
>
> Recently, I setup a Centos 5.5 file server with ext3 and got power
> blackout twice and I notice the filesystem got corrupted and also bad
> sectors.
>
> Is it just pure random luck, software or hardware issue?
> What's your experience?
>    

I would say 'luck'. No common system is 100% safe against 'pull the
plug' shutdowns. It also matters how much disk I/O the system is doing:
an idle system will tolerate 'pull the plug' better than one that is
actively writing. Additionally, powering up and powering down is the
hardest thing you can do to the *hardware*. Servers should be left
running 24/7 - they last longer. Finally, if power failures are taking
the machine down, buy a UPS and connect the monitoring cable. I like APC
UPSs and apcupsd for monitoring them and automatically shutting the
system down if needed.
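
As a rough illustration of what the monitoring cable gives you, here is
a small Python sketch that reads the 'KEY : VALUE' text that apcupsd's
'apcaccess status' command prints and reports whether the UPS is running
on battery. The function names are made up for this example, and the
field names (STATUS, BCHARGE) and the ONBATT flag are what I would
expect apcaccess to report; check them against your own output.

```python
def parse_apcaccess(text):
    """Turn 'KEY : VALUE' lines into a dict (hypothetical helper)."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps in the value survive.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def on_battery(status):
    """True if the STATUS field contains the ONBATT flag (assumed name)."""
    return "ONBATT" in status.get("STATUS", "")


def battery_percent(status):
    """Battery charge as a float, assuming a value like '87.0 Percent'."""
    return float(status["BCHARGE"].split()[0])
```

You would feed it the captured output of 'apcaccess status'. apcupsd
already does the shutdown itself, so this is only useful for extra
logging or alerting.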

You can improve ext3's resistance to corruption quite a bit if you use
the 'data=journal,barrier=1' mount options. Barriers are actually one of
the few cases where software RAID or LVM hurts you - they don't honor
barriers (at least not in CentOS/RHEL - newer kernels have improved this
somewhat). If you are using a hardware RAID card with onboard cache -
make **SURE** it has battery backup installed, too, or else turn off the
cache completely. If you are using LVM/software RAID you will also need
to turn off the hard drives' *own* write caches. And yes - you
are going to take some serious performance hits from doing all this. You
are trading performance for reliability in the face of power failures.
And use ext4 instead of ext3 (ext4 adds journal checksumming) if you can.
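
If you want to check which filesystems are set up this way, here is a
minimal Python sketch that scans fstab-style text and lists the
ext3/ext4 entries that lack the journaling and barrier options (spelled
'data=journal' and 'barrier=1' in the ext3 documentation). The function
names are hypothetical, and it only looks at what is written in the
file, not at what the kernel actually applied at mount time.

```python
def missing_safety_options(fstab_text, wanted=("data=journal", "barrier=1")):
    """Return {mount_point: [missing options]} for ext3/ext4 entries."""
    report = {}
    for line in fstab_text.splitlines():
        # Drop comments and blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, mount_point, fstype, options = fields[:4]
        if fstype not in ("ext3", "ext4"):
            continue
        present = set(options.split(","))
        missing = [opt for opt in wanted if opt not in present]
        if missing:
            report[mount_point] = missing
    return report


def check_fstab(path="/etc/fstab"):
    """Convenience wrapper that reads the file at `path`."""
    with open(path) as handle:
        return missing_safety_options(handle.read())
```

Calling check_fstab() on a typical box returns something like a dict of
mount points mapped to the options each one still lacks; an empty dict
means every ext3/ext4 entry already lists both options.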

Here is an article that discusses making Linux disk I/O safer:
http://www.linux-mag.com/id/7773/

-- 
Benjamin Franz