On Tue, Jul 17, 2012 at 8:27 AM, m.roth@5-cent.us wrote:
I always wondered why the default for nfs was ever sync in the first place. Why shouldn't it be the same as local use of the filesystem? The few things that care should be calling fsync at the right places anyway.
Well, the reason would be that LOCAL operations complete in times that are massively smaller (by factors of hundreds or thousands) than operations that take place via NFS on a normal network.
I would also think that, historically speaking, networks used to be noisier, and more prone to dropping things on the floor (watch out for the bitrot in the carpet, all those bits get into it, y'know...), and so it was for reliability of data.
How many apps really expect the status of every write() to mean they have a recoverable checkpoint?
What I mean is that nobody ever uses sync operations locally - writes are always buffered unless the app does an fsync, and data will sit in that buffer much longer than it does on the network.
But unless the system goes down, that data *will* get written.
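To make the buffering point concrete, here is a minimal sketch of what "doing an fsync at the right place" looks like from the application side. The function name write_durably and the file name in the usage note are made up for illustration; the calls themselves are standard Python. How far the fsync actually reaches on an NFS mount depends on the client and server, so treat this as a sketch of the local case.

```python
import os

def write_durably(path: str, data: bytes) -> None:
    """Append data and return only after fsync has completed (hypothetical helper)."""
    with open(path, "ab") as f:
        f.write(data)         # sits in Python's user-space buffer
        f.flush()             # handed to the kernel, still only in cache
        os.fsync(f.fileno())  # ask the kernel to push it to stable storage
```

Without the last two calls, write_durably("results.log", b"step 1 done\n") would return as soon as the data was buffered, which is the ordinary local behaviour described above.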
But the box with the spinning disks is the thing that will go down. Not much reason for a network to break - at least since people stopped using thin coax.
As I said in what I think was my previous post on this subject, I do have concerns about data safety when it might be the output of a job that's been running for days.
It is a rare application that can recover (or expects to) without losing any data from a random disk write. In fact it would be a foolish application that expects that, since it isn't guaranteed to be committed to disk locally without an fsync. Maybe things like link and rename that applications use as atomic checkpoints in the file system need it. These days wouldn't it be better to use one of the naturally-distributed and redundant databases (riak, cassandra, mongo, etc.) for big jobs instead of nfs filesystems anyway?
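For what it's worth, the rename-as-checkpoint idiom mentioned above usually looks something like the sketch below. The name atomic_checkpoint and the ".ckpt-" prefix are invented for the example. It assumes a POSIX system where rename within one directory is atomic; whether that holds across a server crash on a given NFS setup is exactly the question in this thread, so this is not a guarantee.

```python
import os
import tempfile

def atomic_checkpoint(path: str, data: bytes) -> None:
    """Replace path with data so readers see the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # contents are flushed before the rename
        os.replace(tmp, path)      # atomic swap on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)         # do not leave a stray temp file behind
        raise
    # fsync the directory so the rename itself is recorded (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A long-running job would call this every so often with its current state, so a crash costs at most the work done since the last checkpoint.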