[CentOS] NTFS is more resilient than ext3? Or is it hardware issue?

Thu Aug 12 13:57:03 UTC 2010
Lamar Owen <lowen at pari.edu>

On Thursday, August 12, 2010 04:55:29 am Fajar Priyanto wrote:
> Back long time ago, we have an old file MS W2K (NTFS) server where due
> no admin was available to manage it, the server would get power off
> when the office closed, and auto power on again in the morning. That
> thing happened for years and it was fine ^^

> Recently, I setup a Centos 5.5 file server with ext3 and got power
> blackout twice and I notice the filesystem got corrupted and also bad
> sectors.

Is the CentOS 5.5 box running on the same hardware that ran W2K?  If not, you can't really compare the two systems.

Having said that, I have seen pull-the-plug blackouts on busy servers, NTFS and otherwise, cause data loss and hard bad sectors.

The reason is that if the hard disk is in the middle of writing a sector when its power falls out from under it, especially if the 12 volt rail falls before the 5 volt rail, you can get scribbles on the disk.  These scribbles, especially on newer drives that pack data more tightly than older ones, can overwrite ordinarily protected servo data; when this happens you lose sectors and sometimes whole tracks of data.  The right thing to do is run a long SMART test (smartctl is the right tool, but read the man page before using it) and see how many sectors the drive ends up remapping.  The data in the remapped sectors is probably lost, but the drive should still be usable if not too many sectors got scribbled.
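As a rough sketch of the "see how many sectors got remapped" step: with smartmontools, `smartctl -t long <device>` starts the long self-test, `smartctl -l selftest <device>` shows its result, and `smartctl -A <device>` prints the attribute table.  The Python below pulls the raw Reallocated_Sector_Ct value out of that table.  The sample text, the function names and the assumption that the raw value is a plain integer are illustrative only; the numbers are made up and the table layout can differ between drives and smartctl versions.

```python
import subprocess

# Illustrative lines in the usual `smartctl -A` table layout; the values are made up.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       150
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def raw_attribute(report, name):
    """Return the raw value of the named SMART attribute, or None if it is not listed.

    Assumes the raw value is a plain integer in the tenth column.
    """
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == name:
            return int(fields[9])
    return None


def read_attributes(device):
    """Run `smartctl -A` on the device (needs root and smartmontools installed)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Uses the canned sample; swap in read_attributes("/dev/sda") on a real box.
    print(raw_attribute(SAMPLE, "Reallocated_Sector_Ct"))
```

Comparing this count before and after the long test, and again some weeks later, shows whether the damage was a one-off or the drive is still losing sectors.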

I had a pair of 250GB Maxtor Maxline II drives get scribbled thanks to a power supply that was losing one of its two 12 volt supply rails; 12 volts is in high demand in modern machines.  Both drives now fail the SMART long test, even though all sectors except the 150 or so per drive that got scribbled on are OK.  The drives have been in use for several years since the scribble incident, and no additional sectors have been remapped.  But I did partition them so that the tracks I knew had seek errors (thanks to the servo data getting overwritten) fall in the gaps between active partitions.
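The idea of leaving known-bad areas in the gaps between partitions can be sketched in a few lines of Python.  This is not how the partitions above were actually worked out; the function name, the guard margin and the example numbers are all hypothetical, and on a modern drive logical sector numbers only loosely correspond to physical tracks, so a generous margin would be sensible.

```python
def usable_extents(total_sectors, bad_ranges, guard):
    """Return inclusive (start, end) sector ranges that avoid every bad range.

    bad_ranges is a list of inclusive (first_bad, last_bad) sector pairs;
    guard is the number of extra sectors to leave unused on each side.
    """
    extents = []
    cursor = 0  # first sector not yet assigned to an extent or a gap
    for first_bad, last_bad in sorted(bad_ranges):
        stop = first_bad - guard  # first sector of the guarded gap
        if stop > cursor:
            extents.append((cursor, stop - 1))
        # Skip past the bad range and its trailing guard margin.
        cursor = max(cursor, last_bad + guard + 1)
    if cursor < total_sectors:
        extents.append((cursor, total_sectors - 1))
    return extents


if __name__ == "__main__":
    # Hypothetical 1000-sector disk with two damaged regions.
    for start, end in usable_extents(1000, [(100, 110), (500, 505)], guard=10):
        print(start, end)
```

Each returned extent is a candidate partition; the skipped regions simply stay unallocated so nothing ever seeks into them.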

The two disks were in a Windows XP mirrored set; a large part of the NTFS filesystem was corrupted due to the particular location on the disks that got scribbled (both disks got marked as faulted as well).

When a disk scribbles in this manner you are going to get corruption of some sort; the amount and kind of corruption will depend entirely on what got scribbled.

You really need a UPS to prevent this, with the server communicating with the UPS so that it can at least halt all writes when the power falls.  Even if the 5 and 12 volt rails fell at exactly the same time during a disk write (impossible to design for, since the fall time is determined by the RC time constant of the output and its load, and that varies with system activity), you could easily get problems.  Some drives are more tolerant of this type of fault than others, but I've seen drives from all the major brands develop hard sector errors due to power supply issues: WD, Seagate, Maxtor, Toshiba, Hitachi, you name it.
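To put rough numbers on the RC point: if a rail is treated as a capacitor discharging into a resistive load, its voltage follows V(t) = V0 * exp(-t / RC), so the time to sag to a threshold Vth is RC * ln(V0 / Vth).  The Python below is only a first-order sketch under that assumption (a real supply has regulation and hold-up behaviour that this ignores), and every component value in it is invented for illustration.

```python
import math


def time_to_threshold(v_nominal, v_threshold, resistance_ohms, capacitance_farads):
    """Seconds for a first-order RC discharge to fall from v_nominal to v_threshold."""
    tau = resistance_ohms * capacitance_farads  # RC time constant in seconds
    return tau * math.log(v_nominal / v_threshold)


if __name__ == "__main__":
    # Hypothetical values: each rail is considered "fallen" at 90% of nominal.
    t5 = time_to_threshold(5.0, 4.5, 1.0, 10000e-6)        # 5 V rail, steady 5 A load
    t12_busy = time_to_threshold(12.0, 10.8, 2.0, 4700e-6)  # 12 V rail, 6 A load
    t12_idle = time_to_threshold(12.0, 10.8, 6.0, 4700e-6)  # 12 V rail, 2 A load
    print("12 V falls first when busy:", t12_busy < t5)
    print("5 V falls first when idle: ", t5 < t12_idle)
```

With these made-up values the order in which the rails drop out flips with nothing more than a change in 12 volt load, which is why the fall order cannot be fixed at design time.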

I've seen it with all the major interface types, too, although enterprise class drives are far less likely to have the problem.  Even so, one of the more badly damaged drives I've seen was a Seagate Cheetah 72GB U320 SCSI drive, which ended up with over 2000 bad sectors after a particularly nasty set of undervoltage events from a failing power supply in a Dell server (the undervoltage was on the 5 volt rail in this particular case; an oscilloscope trace of the 5V line looked like the Mediterranean coastline).  That drive ended up with sector 0 fried and all remaps taken, and is thus essentially dead, even though the majority of it tests good.  The test takes a very long time, though, thanks to all the seek errors the overwritten servo areas created.

So, more than likely, it is a hardware issue.