[CentOS] Question about optimal filesystem with many small files.

Sat Jul 11 08:33:43 UTC 2009
Alexander Georgiev <alexander.georgiev at gmail.com>

>
> Thanks, using directories as file names is a great idea, anyway I'm not sure if that would solve my performance issue, as the bottleneck is the disk and not mysql.

The situation you described initially suffers from only one issue:
too many files in a single directory. You are not the first to fight
this; see qmail's Maildir, squid, etc. The remedy is always the
same: split the files into a directory tree. For a sample
implementation, check out squid, BackupPC, etc.
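
To make the idea concrete, here is a minimal sketch in Python of one way to spread files over a directory tree. It is not how squid or BackupPC actually lay out their stores; the function names, the choice of MD5, and the two-level, two-character layout are all assumptions made for illustration.

```python
import hashlib
import os


def hashed_path(root, name, levels=2, width=2):
    """Return a path such as root/ab/cd/name, where 'ab' and 'cd' are
    taken from the hex MD5 digest of the file name (hypothetical layout)."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Slice the digest into `levels` chunks of `width` hex characters each.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


def store(root, name, data):
    """Write data (bytes) under the hashed path, creating directories as needed."""
    path = hashed_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

With two levels of two hex characters there are 256 * 256 = 65536 leaf directories, so 15M files would average roughly 230 files per leaf directory.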

>I just implemented the directories names based on the hash of the file and the performance is a bit slower than before. This is the output of atop (15 secs. avg.):
>
> PRC | sys   0.53s | user   5.43s | #proc    112 | #zombie    0 | #exit      0 |
> CPU | sys      4% | user     54% | irq       2% | idle    208% | wait    131% |
> cpu | sys      1% | user     24% | irq       1% | idle     54% | cpu001 w 20% |
> cpu | sys      2% | user     15% | irq       1% | idle     31% | cpu002 w 52% |
> cpu | sys      1% | user      8% | irq       0% | idle     52% | cpu003 w 38% |
> cpu | sys      1% | user      7% | irq       0% | idle     71% | cpu000 w 21% |
> CPL | avg1  10.58 | avg5    6.92 | avg15   4.66 | csw    19112 | intr   19135 |
> MEM | tot    2.0G | free   49.8M | cache 157.4M | buff  116.8M | slab  122.7M |
> SWP | tot    1.9G | free    1.2G |              | vmcom   2.2G | vmlim   2.9G |

I am under the impression that you are swapping. Out of 2GB of RAM,
you have just 157MB of cache and 116MB of buffers. What is eating the RAM?
Why do you have about 0.7GB of swap in use? You need more memory for the
file system cache.
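
The arithmetic behind that question, as a small Python sketch. The figures are copied from the MEM and SWP lines of the atop output above; treating "2.0G" as 2048 MB is an assumption, and the function name is made up for illustration.

```python
def unaccounted_mb(total_mb, free_mb, cache_mb, buff_mb, slab_mb):
    """Memory not explained by free, page cache, buffers, or slab, in MB.
    Most of the remainder is normally process memory."""
    return total_mb - (free_mb + cache_mb + buff_mb + slab_mb)


# MEM line: tot 2.0G, free 49.8M, cache 157.4M, buff 116.8M, slab 122.7M
rest = unaccounted_mb(2.0 * 1024, 49.8, 157.4, 116.8, 122.7)

# SWP line: tot 1.9G, free 1.2G
swap_used_gb = 1.9 - 1.2

print(f"unaccounted RAM: {rest:.1f} MB, swap in use: {swap_used_gb:.1f} GB")
```

So roughly 1.6GB of RAM is held by something other than the file system cache, and about 0.7GB has already been pushed out to swap.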

> PAG | scan   1536 | stall      0 |              | swin       9 | swout      0 |
> DSK |         sdb | busy     91% | read     884 | write    524 | avio    6 ms |
> DSK |         sda | busy     12% | read     201 | write    340 | avio    2 ms |
> NET | transport   | tcpi    8551 | tcpo    8204 | udpi     702 | udpo     718 |
> NET | network     | ipi     9264 | ipo     8946 | ipfrw      0 | deliv   9264 |
> NET | eth0     5% | pcki    6859 | pcko    6541 | si 5526 Kbps | so  466 Kbps |
> NET | lo     ---- | pcki    2405 | pcko    2405 | si  397 Kbps | so  397 Kbps |
>
>
> in sdb is the cache and in sda is all other stuff, including the mysql db files. Check that I have a lot of disk reads in sdb, but I'm really getting one file from disk for each 10 written, so my guess is that all other reads are directory listings. As I'm using the hash as directory names, (I think) this makes the linux cache slower, as the files are distributed in a more homogeneous and randomly way among the directories.
>
I think that the Linux file system cache is smart enough for this type of load.
How many files per directory do you have?
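
One way to answer that is to count the entries directly. A minimal Python sketch follows; the function names are hypothetical, and keep in mind that walking the whole tree will itself generate reads on an already busy disk.

```python
import os
from collections import Counter


def files_per_directory(root):
    """Map each directory under root to the number of regular entries
    os.walk reports as files in it (subdirectories are not counted)."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        counts[dirpath] = len(filenames)
    return counts


def fullest(root, n=10):
    """Return the n directories holding the most files, largest first."""
    return files_per_directory(root).most_common(n)
```

Calling `fullest()` on the cache mount point shows whether any single directory is still holding a disproportionate share of the files.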

> The app is running a bit slower than using the file name for directory name, although I expect (not really sure) that it will be better as the number of files on disk grows (currently there are only 600k files from 15M). My current performance is around 50 file i/o per second.
>

Something is wrong, and we have to figure it out. Where did the RAM go?