On Tue, Jan 6, 2015 at 6:12 PM, Les Mikesell wrote:
> I've had a few systems with a lot of RAM and very busy filesystems
> come up with filesystem errors that took a manual 'fsck -y' after what
> should have been a clean reboot. This is particularly annoying on
> remote systems where I have to talk someone else through the recovery.
>
> Is there some time limit on the cache write with a 'reboot' (no
> options) command or is ext4 that fragile?

I'd say there's no limit on the amount of time the kernel waits for
the blocks to be written to disk. When dirty data gets flushed is
driven by these parameters:

vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500

i.e., if the data cached in RAM is older than 30s or larger than 10% of
available RAM, the kernel will try to flush it to disk. Depending on
how much data needs to be flushed at poweroff/reboot time, this could
have a significant effect on the time taken.

Regarding systems with lots of RAM, I've never seen such behaviour on
the few 192 GB RAM servers I administer. Granted, your system could be
tuned in a different way or have some other configuration.

TBH I'm not confident enough to give a definitive answer as to why the
data is not being fully flushed before reboot. I'd investigate:

- Whether this happens on every reboot or just on some.
- Whether your RAM is OK (the FS errors could come from that!).
- Whether your disks/SAN are caching writes. (Maybe they are, and the
  OS thinks the data has been flushed to disk when it hasn't.)
- Whether filesystem mount options might interfere (nobarrier, commit,
  data...)

HTH

~f
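
P.S. In case it helps with the first check, here is a minimal sketch for seeing how much data is still waiting to be written just before a reboot. It assumes a Linux box where /proc/meminfo exposes the Dirty and Writeback fields; the helper names (parse_meminfo, pending_writeback_kb) are made up for illustration and are not part of any tool. It only reports what the kernel says is pending, it does not prove the disks have actually committed anything.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total)
    are plain counts with no unit.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def pending_writeback_kb(info):
    """Dirty pages plus pages currently under writeback, in kB."""
    return info.get("Dirty", 0) + info.get("Writeback", 0)


if __name__ == "__main__":
    path = "/proc/meminfo"  # Linux only; skipped silently elsewhere
    if os.path.exists(path):
        with open(path) as f:
            info = parse_meminfo(f.read())
        print("pending writeback: %d kB" % pending_writeback_kb(info))
```

Running it a few times while the box is busy, and once more right before 'reboot', should show whether there is an unusually large backlog at the moment the machine goes down.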