[CentOS] reboot - is there a timeout on filesystem flush?

Wed Jan 7 00:28:26 UTC 2015
Fran Garcia <franchu.garcia at gmail.com>

On Tue, Jan 6, 2015 at 6:12 PM, Les Mikesell <> wrote:
> I've had a few systems with a lot of RAM and very busy filesystems
> come up with filesystem errors that took a manual 'fsck -y' after what
> should have been a clean reboot.  This is particularly annoying on
> remote systems where I have to talk someone else through the recovery.
>
> Is there some time limit on the cache write with a 'reboot' (no
> options) command or is ext4 that fragile?

I'd say there's no limit on the amount of time the kernel waits for the
blocks to be written to disk. Writeback is driven by these parameters:

vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500

i.e., if the data cached in RAM is older than 30s (dirty_expire_centisecs)
or larger than 10% of available RAM (dirty_background_ratio), the kernel
will try to flush it to disk. Depending on how much data needs to be
flushed at poweroff/reboot time, this could have a significant effect on
the time taken.
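
To put rough numbers on those thresholds, here is a minimal Python sketch.
The function name writeback_thresholds and the DEFAULTS dict are my own
(hypothetical) names, and treating all 192 GiB as "available" memory is an
assumption: the kernel applies the ratios to free plus reclaimable pages,
so real figures will be somewhat lower.

```python
GIB = 1024 ** 3

# The sysctl values quoted above.
DEFAULTS = {
    "vm.dirty_background_bytes": 0,
    "vm.dirty_background_ratio": 10,
    "vm.dirty_bytes": 0,
    "vm.dirty_expire_centisecs": 3000,
    "vm.dirty_ratio": 20,
    "vm.dirty_writeback_centisecs": 500,
}


def writeback_thresholds(available_bytes, sysctls):
    """Return (background_bytes, hard_limit_bytes, expire_seconds)."""

    def limit(bytes_key, ratio_key):
        # A non-zero *_bytes setting takes precedence over the *_ratio one.
        explicit = sysctls[bytes_key]
        if explicit:
            return explicit
        return available_bytes * sysctls[ratio_key] // 100

    background = limit("vm.dirty_background_bytes", "vm.dirty_background_ratio")
    hard_limit = limit("vm.dirty_bytes", "vm.dirty_ratio")
    expire_seconds = sysctls["vm.dirty_expire_centisecs"] / 100
    return background, hard_limit, expire_seconds


if __name__ == "__main__":
    # Assumption: the whole 192 GiB counts as available memory (upper bound).
    bg, hard, expire = writeback_thresholds(192 * GIB, DEFAULTS)
    print("background writeback starts at ~%.1f GiB dirty" % (bg / GIB))
    print("writers are throttled at ~%.1f GiB dirty" % (hard / GIB))
    print("dirty data becomes eligible for flushing after %.0f s" % expire)
```

The point is only that on a big-RAM box a lot of dirty data can be waiting
to be written out when the reboot starts.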

Regarding systems with lots of RAM, I've never seen such behaviour
on the few 192 GB RAM servers I administer. Granted, your system could
be tuned differently or have some other configuration.

TBH I'm not confident enough to give a definitive answer re the data not
being totally flushed before reboot. I'd investigate:

- Whether this happens on every reboot or just on some.
- Whether your RAM is OK (the FS errors could come from that!).
- Whether your disks/SAN are caching writes.  (Maybe they are and the
  OS thinks the data has been flushed to disk, but it hasn't.)
- Filesystem mount options that might interfere (nobarrier, commit, data...)
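
For the mount-options check, a small Python sketch along these lines could
list the options worth a second look. The function name
flagged_mount_options, the WATCHED tuple and the SAMPLE text are
hypothetical; the input is assumed to be in /proc/mounts format (device,
mount point, fs type, options, ...). It only reports the options, it does
not judge whether a given value is actually a problem.

```python
# Option names worth reviewing (the ones mentioned in the list above).
WATCHED = ("nobarrier", "barrier", "commit", "data")


def flagged_mount_options(mounts_text, fstypes=("ext3", "ext4")):
    """Map mount point -> watched options found, for the given fs types."""
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and filesystems we are not interested in.
        if len(fields) < 4 or fields[2] not in fstypes:
            continue
        mountpoint, options = fields[1], fields[3].split(",")
        # Compare only the option name, so "commit=60" matches "commit".
        hits = [opt for opt in options if opt.split("=", 1)[0] in WATCHED]
        if hits:
            found[mountpoint] = hits
    return found


# Made-up sample input, not taken from any real system.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime,data=ordered 0 0
/dev/sdb1 /srv ext4 rw,noatime,nobarrier,commit=60,data=writeback 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
"""

if __name__ == "__main__":
    for mountpoint, opts in flagged_mount_options(SAMPLE).items():
        print(mountpoint, ", ".join(opts))
```

On a real box you would feed it the contents of /proc/mounts instead of
SAMPLE, e.g. flagged_mount_options(open("/proc/mounts").read()).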


HTH

~f