On Mar 10, 2011, at 6:33 PM, Ross Walker rswwalker@gmail.com wrote:
On Mar 10, 2011, at 3:49 PM, John R Pierce pierce@hogranch.com wrote:
On 03/10/11 12:40 PM, Les Mikesell wrote:
I thought there were also problems in layers like LVM that keep the OS from knowing exactly what happened. And a lot of software that should fsync at certain points probably doesn't, because Linux has historically handled it badly.
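To make the fsync point concrete, here is a minimal Python sketch of the usual write-temp / fsync / rename / fsync-directory pattern an application uses to make a file update durable. It assumes a POSIX system (fsync on a directory file descriptor works on Linux, not on Windows); `durable_replace` and the `.tmp` suffix are hypothetical names chosen for illustration. Each fsync only asks the kernel to push data to stable storage; whether it really gets there still depends on every layer below (MD, LVM, the drive's cache) passing the resulting flush through.

```python
import os


def durable_replace(path, data: bytes):
    """Hypothetical helper: atomically replace `path` with `data`, durably.

    Assumes POSIX semantics for os.replace and for fsync on a directory fd.
    """
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"                      # hypothetical temp-file naming
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                               # os.write may write fewer bytes
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                              # ask for file contents on stable storage
    finally:
        os.close(fd)
    os.replace(tmp_path, path)                    # atomic rename over the old file
    dir_fd = os.open(directory, os.O_RDONLY)      # POSIX-only: open the directory itself
    try:
        os.fsync(dir_fd)                          # make the rename itself durable
    finally:
        os.close(dir_fd)
```

Skipping either fsync is exactly the kind of shortcut that "works" until a crash: without the first, the rename can survive while the new contents do not.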
That's another problem entirely. Both the MD and LVM layers of Linux tend to drop write barriers, which are supposed to ensure that key writes occur in the correct order. This is one reason we tend to run our mission-critical database servers on Solaris or AIX rather than Linux.
I think LVM respects barriers as of RHEL6.
The lack of barrier support is mitigated by a battery-backed write-back cache (BBWBC) as far as volatility is concerned. Barriers also preserve ordering, though, which BBWBC doesn't guarantee. Advanced RAID controllers should support FUA (Force Unit Access), which allows a properly written SCSI subsystem to preserve ordering: the subsystem flushes all pending data to disk first, then issues the FUA write, whose data goes directly to the media instead of sitting in the cache.
The barrier support was recently revised to only support FUA devices, I believe, because emulating barriers on non-FUA devices was too expensive (performance-wise) to kludge in, so if your device doesn't do FUA, its barriers are basically a no-op.
Let me correct myself: the drives need to support 'sync' (a cache flush). FUA is a nice option, as it removes the need for sync-write-sync, but for cheap drives that don't respect 'sync' a barrier is still a no-op, where before it used to do a drain-stop (painfully slow).
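The barrier mechanics described above can be illustrated with a toy model. This Python sketch is a hypothetical simulation, not kernel or driver code: `ToyDrive`, `barrier_write` and `journal_commit` are invented names, and it assumes a drive whose write cache is lost entirely on power failure. A barrier is emitted as a flush followed by an FUA write when the drive supports FUA, and as sync-write-sync otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Hypothetical disk with a volatile write-back cache (toy model only)."""
    supports_fua: bool = True
    honors_flush: bool = True                   # cheap drives may ignore 'sync'
    cache: dict = field(default_factory=dict)   # lba -> data, lost on power failure
    media: dict = field(default_factory=dict)   # lba -> data, survives power failure
    log: list = field(default_factory=list)     # commands issued by the host

    def write(self, lba, data, fua=False):
        if fua and self.supports_fua:
            self.log.append("WRITE+FUA")
            self.cache.pop(lba, None)           # drop any stale cached copy
            self.media[lba] = data              # FUA: completes only once on the media
        else:
            self.log.append("WRITE")
            self.cache[lba] = data              # acknowledged from the volatile cache

    def flush(self):
        self.log.append("FLUSH")
        if self.honors_flush:
            self.media.update(self.cache)       # destage everything cached so far
            self.cache.clear()

    def power_loss(self):
        self.cache.clear()                      # volatile cache contents vanish


def barrier_write(drive, lba, data):
    """Hypothetical helper: a write ordered after, and as durable as, earlier writes."""
    drive.flush()                               # pre-flush: earlier writes reach media first
    if drive.supports_fua:
        drive.write(lba, data, fua=True)        # flush + FUA write
    else:
        drive.write(lba, data)                  # sync-write-sync
        drive.flush()


def journal_commit(drive):
    """Write two journal blocks, then a commit record behind a barrier, then lose power."""
    drive.write(10, "journal block A")
    drive.write(11, "journal block B")
    barrier_write(drive, 12, "commit record")
    drive.power_loss()
    return drive


# Compare FUA, no FUA, and a drive that ignores 'sync'.
for fua, flush_ok in [(True, True), (False, True), (True, False)]:
    d = journal_commit(ToyDrive(supports_fua=fua, honors_flush=flush_ok))
    print(fua, flush_ok, d.log, sorted(d.media))
```

With FUA the barrier costs two commands instead of three, and both of the first two configurations leave all three blocks on the media. The last one models a drive that ignores 'sync': the flush does nothing, so the commit record lands on the media while the journal blocks it vouches for are lost, which is the ordering a barrier is meant to prevent.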
-Ross