On Wed, Jan 7, 2015 at 1:37 PM, Gary Greene ggreene@minervanetworks.com wrote:
The problem, Gordon, is that the layer I’m talking about is _below_ the logical layer that filesystems live at. It is in the block layer, at the mercy of drivers and firmware that the kernel has zero control over. In a perfect world the controller would do strictly what the kernel tells it, but that hasn’t been true for a while, given the large caches that drives and controllers now have.
In most cases this should never trigger. However, with some buggy drivers, or controllers that have buggy firmware, writes to disk can be seriously delayed, and in the worst case the data never makes it to the platter.
I'd have to shut one down and get into the BIOS config to see, but I think these default to write-through if they aren't battery backed; caching may not even be an option. This one might have a battery going bad, though.
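For what it's worth, it may be possible to see what the kernel thinks the cache mode is without a reboot. A minimal sketch, assuming a kernel that provides /sys/block/<dev>/queue/write_cache (added in Linux 4.7, so it will be absent on older kernels). This only shows what the device reports to the kernel; behind a RAID controller it may not match the controller's own BIOS setting. The function name write_cache_modes is made up for illustration:

```python
from pathlib import Path


def write_cache_modes(sysfs_root="/sys/block"):
    """Return {device: mode} for block devices that expose queue/write_cache.

    The mode string is whatever the kernel reports, typically
    "write back" or "write through".
    """
    modes = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # Not Linux, or sysfs is not mounted here.
        return modes
    for dev in sorted(root.iterdir()):
        attr = dev / "queue" / "write_cache"
        try:
            modes[dev.name] = attr.read_text().strip()
        except OSError:
            # Attribute missing (older kernel) or unreadable: skip this device.
            continue
    return modes


if __name__ == "__main__":
    for name, mode in write_cache_modes().items():
        print(f"{name}: {mode}")
```

An empty result just means the attribute isn't there, not that caching is off.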
I see a bunch of entries like:
  ioatdma 0000:00:08.0: Channel halted, chanerr = 2
  ioatdma 0000:00:08.0: Channel halted, chanerr = 0
in the logs and one of these:
  hrtimer: interrupt took 258633 ns
Not sure what those mean. We do have considerably more systems running Windows than Linux on this hardware, and I don't think anyone has noticed a systemic problem there.
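To get a feel for how often the ioatdma entries quoted above show up, and with which chanerr values, something like the following could be run over the logs. This is only a rough sketch: the regex is written against the two sample lines from this message, tally_chanerr is a made-up helper name, and it makes no attempt to interpret what the chanerr values mean.

```python
import re
from collections import Counter

# Pattern based only on the sample lines quoted in this thread.
CHANERR_RE = re.compile(
    r"ioatdma (\S+): Channel halted, chanerr = ([0-9a-fA-F]+)"
)


def tally_chanerr(lines):
    """Count (device, chanerr) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # findall copes with several entries run together on one line.
        for device, code in CHANERR_RE.findall(line):
            counts[(device, code)] += 1
    return counts


# Sample data: the two entries quoted in this message.
sample = [
    "ioatdma 0000:00:08.0: Channel halted, chanerr = 2",
    "ioatdma 0000:00:08.0: Channel halted, chanerr = 0",
]

if __name__ == "__main__":
    for (device, code), n in sorted(tally_chanerr(sample).items()):
        print(f"{device} chanerr={code}: {n}")
```

For real use, pass an open log file instead of the sample list, e.g. tally_chanerr(open(path)) with whatever log path applies on your system.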