Hi All,
I have an older CentOS 7.4 system that is used for computationally heavy work.
It has a 32G root filesystem, of which 33% is consumed.
Lately, one particular set of jobs (run through the SGE batch scheduler) seems
to cause a peculiar condition in which the root filesystem space is exhausted,
but I can't find any files that would tell me which process is causing the
problem. I'm guessing that these jobs are the trigger since the problem only
occurs while they are running.
I would like to identify the process that is causing this condition. My
standard approach is to find which files are using the otherwise empty space
and then find which process is using those files. It's not working: I haven't
been able to identify the files. They aren't there. Every attempt to measure
the disk usage of / by file shows that only a fraction of the space is in use
and that there should be space available.
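For reference, the kind of comparison I mean can be sketched as below. This is a minimal sketch that assumes GNU coreutils df and du; the function name fs_gap_kb is made up for illustration and is not an existing tool.

```shell
# Compare what the filesystem reports as used (df) with the total of the
# files that du can actually see. fs_gap_kb is a hypothetical helper name.
fs_gap_kb() {
    mount_point="${1:-/}"
    # Used 1K-blocks as reported by the filesystem (-P keeps one line per fs)
    used_kb=$(df -kP "$mount_point" | awk 'NR==2 {print $3}')
    # Total of visible files in 1K units, without crossing mount points (-x)
    seen_kb=$(du -skx "$mount_point" 2>/dev/null | awk '{print $1}')
    echo "df_used_kb=$used_kb du_visible_kb=$seen_kb gap_kb=$((used_kb - seen_kb))"
}
```

Running `fs_gap_kb /` as root during an episode prints the two totals and their difference. A large positive gap_kb means space is in use that no visible file accounts for.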
I know that space can be consumed by deleted files for which file handles
are still open. These are not visible to ls but are detectable using techniques like:
lsof | grep deleted
and for sizing
find /proc/*/fd -ls 2>/dev/null | grep '(deleted)' | \
sed 's!.*\(/proc[^ ]*\).*!\1!' | xargs wc -c | sort -nr
Applying this technique during one of these episodes shows nothing of interest.
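For anyone wanting to repeat the sizing check without reading every file through wc, here is a lighter variant. It is only a sketch: it assumes GNU stat and readlink and a mounted /proc, and the function name deleted_open_files is made up for illustration. It reports allocated bytes rather than apparent size, so sparse files are not overstated.

```shell
# List deleted-but-still-open files with their allocated bytes, largest first.
# deleted_open_files is a hypothetical helper name. Run as root to see the
# descriptors of all processes.
deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink; unlinked targets end in " (deleted)"
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)') ;;
            *) continue ;;
        esac
        # %b = blocks allocated, %B = bytes per block; -L follows the fd link
        alloc=$(stat -L -c '%b %B' "$fd" 2>/dev/null | awk '{print $1 * $2}')
        [ -n "$alloc" ] || continue
        printf '%s\t%s\t%s\n' "$alloc" "$fd" "$target"
    done | sort -nr
}
```

Each output line has the allocated bytes, the /proc/PID/fd/N path (which gives the owning PID), and the original file name.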
I'm definitely missing something. Are there any other techniques for
identifying other "invisible" files that I can look for?
Thanks in advance for any advice.