[CentOS] 11TB ext4 filesystem - filesystem alternatives?

Sat Sep 29 12:27:06 UTC 2012
Lamar Owen <lowen at pari.edu>

On Friday, September 28, 2012 04:29:55 PM Keith Keller wrote:
> No filesystem can fully protect against power failures--that's not its
> job.  That's why higher-end RAID controllers have battery backups, and
> why important servers should be on a UPS.  If you are really paranoid,
> you can probably tweak the kernel (e.g., using sysctl) to flush disk
> writes more frequently, but then you might drag down performance with
> it.
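
For reference, the flush tuning mentioned above lives, on Linux, in the vm.dirty_* sysctls. Here is a minimal Python sketch that only reads the current values and changes nothing. The helper name, the `base` parameter, and the choice of which four knobs to show are my own assumptions for illustration:

```python
from pathlib import Path

# Writeback-related knobs exposed by the Linux kernel under /proc/sys/vm.
DIRTY_KNOBS = (
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
    "dirty_background_ratio",
    "dirty_ratio",
)


def read_dirty_settings(base="/proc/sys/vm"):
    """Return {knob: int value} for each knob file that exists under base."""
    settings = {}
    for name in DIRTY_KNOBS:
        path = Path(base) / name
        if path.is_file():
            settings[name] = int(path.read_text().strip())
    return settings


if __name__ == "__main__":
    # Read-only: prints the current values, does not modify anything.
    for knob, value in read_dirty_settings().items():
        print(f"vm.{knob} = {value}")
```

Lowering the two *_centisecs values makes the kernel write dirty pages out sooner, with the performance cost already noted in the quote.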

As far as UPS's are concerned, even those won't protect you from a BRS event.

BRS = Big Red Switch, aka EPO, or Emergency Power Off.  NEC Article 645 (IIRC) mandates this for Information Technology rooms that use the relaxed rules of that article (and virtually all IT rooms do so, in my experience).  The EPO is supposed to take *everything* down hard (including the DC to the UPS's, if the UPS is in the room, and shunt trip the breakers feeding the room so that the room is completely dead), and the fire suppression system is supposed to be tied in to it.  And the EPO has to be a push to activate, and it has to be accessible, and people have hit the switch before.

Caching controllers are only part of the equation; in a BRS event, the battery is likely to have let go of the cache contents by the time things are back up, depending upon what caused the BRS event.  This is a case where you should test with an actual server and see just how long the battery will hold the cache.

In the case of EMC Clariions, the write cache (there is only one, mirrored between the storage processors) on the storage processors is flushed to the 'vault' disks in an EPO event; there is a small UPS built in to the rack that keeps the vault disks up long enough to do this, and the SP's can then do an orderly shutdown.  Takes about 90 seconds with a medium sized write cache and fast vault drives.  Then, when the system boots back up, the vault contents are flushed out to the LUN's.
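
The flush time is basically cache size divided by sustained write rate to the vault. A rough Python sketch of that arithmetic follows; the 4 GiB and 50 MiB/s figures are hypothetical numbers picked only to make the calculation concrete, not EMC specifications:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def flush_seconds(cache_bytes, vault_write_bytes_per_sec):
    """Seconds needed to write the whole cache to the vault at a sustained rate."""
    if vault_write_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return cache_bytes / vault_write_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical inputs (assumptions, not measured values).
    cache = 4 * GIB
    rate = 50 * MIB
    print(f"{flush_seconds(cache, rate):.0f} s")  # prints "82 s"
```

Whatever the real numbers are, the built-in UPS has to hold the vault disks and SPs up for at least that long, plus the orderly shutdown.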

Now, to make this reliable, EMC has custom firmware loaded on their drives that doesn't do any write caching on the drive itself, and that is part of the design of their systems.  Drive enclosures (DAE, in EMC's terminology) other than the DAE with the OS and vault disks, can go down hard and the array won't lose data, thanks to the vault and the EMC software.  The EMC software periodically tests the battery backup units, and will disable the write cache (and flush it to disk) if the battery faults during the test.  It is amazing how much performance is due to good (and large) write caches; modern SATA drives owe much of their performance to their write caches.
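
The battery-test behaviour amounts to "write-back while the battery is trusted, write-through otherwise". A toy Python model of that policy is below. It is a sketch of the idea only, not EMC's software; the class and method names are made up, and re-enabling the cache after a later passing test is my assumption:

```python
class WriteCache:
    """Toy model: write-back while the battery is trusted, else write-through."""

    def __init__(self):
        self.dirty = {}         # block -> data held only in the cache
        self.disk = {}          # block -> data on stable storage
        self.write_back = True  # caching enabled until a battery test fails

    def write(self, block, data):
        if self.write_back:
            self.dirty[block] = data  # acknowledged from cache, flushed later
        else:
            self.disk[block] = data   # write-through: goes straight to disk

    def flush(self):
        """Move everything held in the cache to stable storage."""
        self.disk.update(self.dirty)
        self.dirty.clear()

    def battery_test(self, battery_ok):
        """Apply the result of a periodic battery test."""
        if battery_ok:
            self.write_back = True   # assumption: cache is re-enabled on a pass
        else:
            self.write_back = False  # stop caching new writes first...
            self.flush()             # ...then get the dirty data onto disk
```

After a failed test, nothing lives only in cache any more, so a hard power-off at that point cannot lose acknowledged writes; the price is the write performance the cache was buying.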

Now, if the sprinkler system is what caused the EPO, well, it may not matter how good the write cache vault is, depending on how wet things get... but that's part of the DR plan, or should be...