On Jan 6, 2015, at 4:28 PM, Fran Garcia franchu.garcia@gmail.com wrote:
On Tue, Jan 6, 2015 at 6:12 PM, Les Mikesell wrote:
I've had a few systems with a lot of RAM and very busy filesystems come up with filesystem errors that took a manual 'fsck -y' after what should have been a clean reboot. This is particularly annoying on remote systems where I have to talk someone else through the recovery.
Is there some time limit on the cache write with a 'reboot' (no options) command or is ext4 that fragile?
I'd say there's no limit on the amount of time the kernel waits for the blocks to be written to disk. Writeback is driven by these parameters:
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
i.e., if the data cached in RAM is older than 30 s (vm.dirty_expire_centisecs) or larger than 10% of available RAM (vm.dirty_background_ratio), the kernel will try to flush it to disk. Depending on how much data needs to be flushed at poweroff/reboot time, this could have a significant effect on the time taken.
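To put numbers on those settings, here is a rough Python sketch that turns the values listed above into bytes and seconds. The function name and the `available_bytes` argument are made up for illustration. The kernel applies the ratios to available (free plus reclaimable) memory rather than total RAM, so feeding it the full 192 GB only gives an upper bound.

```python
def writeback_thresholds(available_bytes, background_ratio=10, dirty_ratio=20,
                         expire_centisecs=3000, writeback_centisecs=500):
    """Translate vm.dirty_* ratio settings into bytes and seconds.

    Hypothetical helper: assumes vm.dirty_bytes and vm.dirty_background_bytes
    are 0 (as in the listing above), so the ratio settings are the ones in use.
    """
    return {
        # Background flusher threads start writing once dirty data exceeds this.
        "background_bytes": available_bytes * background_ratio // 100,
        # Past this point, processes that dirty pages are made to write them out.
        "throttle_bytes": available_bytes * dirty_ratio // 100,
        # Dirty data older than this is eligible for writeback.
        "expire_seconds": expire_centisecs / 100,
        # How often the flusher threads wake up to look for expired data.
        "wakeup_seconds": writeback_centisecs / 100,
    }


GIB = 1024 ** 3
limits = writeback_thresholds(192 * GIB)  # upper bound: treats all RAM as available
print("background writeback starts near %.1f GiB" % (limits["background_bytes"] / GIB))
print("dirty data expires after %.0f s" % limits["expire_seconds"])
```

On a box that size the background threshold alone can be well over ten GiB of dirty data, which is why a busy filesystem may have a lot to write at reboot time.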
Regarding systems with lots of RAM, I've never seen such behaviour on the few 192 GB RAM servers I administer. Granted, your system could be tuned differently or have some other configuration.
TBH I'm not confident enough to give a definitive answer about the data not being fully flushed before reboot. I'd investigate:
- Whether this happens on every reboot or just on some.
- Whether your RAM is OK (the FS errors could come from that!).
- Whether your disks/SAN are caching writes. (Maybe they are, and the
  OS thinks the data has been flushed to disk when it hasn't.)
- Filesystem mount options that might interfere (nobarrier, commit, data...)
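For the mount-options item, a small Python sketch along these lines could list the options worth a second look. It parses text in the /proc/mounts layout (device, mount point, type, options). The function name, the sample text and the exact option spellings it matches are assumptions on my part, and a non-default option showing up is a reason to look closer, not proof of a problem.

```python
def mount_options_to_review(mounts_text, fstypes=("ext3", "ext4")):
    """Return (mountpoint, option) pairs that affect write ordering or flushing.

    Hypothetical helper: the option spellings checked here (nobarrier,
    barrier=0, data=writeback, commit=N) are assumed, so adjust as needed.
    """
    findings = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and filesystems we are not interested in.
        if len(fields) < 4 or fields[2] not in fstypes:
            continue
        mountpoint, options = fields[1], fields[3]
        for opt in options.split(","):
            if opt in ("nobarrier", "barrier=0", "data=writeback") or opt.startswith("commit="):
                findings.append((mountpoint, opt))
    return findings


# Made-up sample in /proc/mounts format; on a real system read /proc/mounts instead.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime,data=ordered 0 0
/dev/sdb1 /srv ext4 rw,noatime,nobarrier,commit=60 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
"""
for mountpoint, opt in mount_options_to_review(SAMPLE):
    print(mountpoint, opt)
```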
This has been discussed to death on various lists, including the LKML...
Almost every controller and drive out there now lies about what is and isn't flushed to disk, making it nigh on impossible for the kernel to reliably know 100% of the time that the data HAS been flushed to disk. This is part of the reason why it is always a Good Idea™ to have some sort of pause in the shutdown to ensure that it IS flushed.
This is also why server-grade gear uses battery-backed buffers, etc., which are supposed to allow drives to properly flush the data to disk. Even in these cases there is a slim chance that the data will not reach the platter before power off or reboot, especially in catastrophic cases.
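As a rough illustration of the kind of pause mentioned above, a pre-reboot step could call sync and then wait until the kernel's Dirty and Writeback counters in /proc/meminfo drain. This is only a sketch: the function names, timeout and threshold are hypothetical, and it only tells you what the kernel believes. It says nothing about data still sitting in a controller or drive cache.

```python
import os
import time


def dirty_kib(meminfo_text):
    """Sum the Dirty and Writeback counters (kB) from /proc/meminfo-style text."""
    total = 0
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        # Exact key match, so WritebackTmp and similar fields are ignored.
        if key in ("Dirty", "Writeback"):
            total += int(rest.split()[0])
    return total


def sync_and_wait(timeout=60.0, interval=1.0, threshold_kib=0):
    """Call sync, then poll until dirty data drops to the threshold or time runs out.

    Returns True if the counters drained in time, False otherwise.
    Linux-only, since it reads /proc/meminfo.
    """
    os.sync()
    deadline = time.monotonic() + timeout
    while True:
        with open("/proc/meminfo") as f:
            remaining = dirty_kib(f.read())
        if remaining <= threshold_kib:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

On a machine that is still taking writes the counters may never reach zero, so a small non-zero `threshold_kib` (or stopping services first) is probably more realistic.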
-- Gary L. Greene, Jr. Sr. Systems Administrator IT Operations Minerva Networks, Inc. Cell: +1 (650) 704-6633