[CentOS] Question about optimal filesystem with many small files.

Sat Jul 11 07:55:50 UTC 2009
oooooooooooo ooooooooooooo <hhh735 at hotmail.com>

Thanks, using directories as file names is a great idea. However, I'm not sure it would solve my performance issue, since the bottleneck is the disk and not MySQL. I just implemented directory names based on the hash of the file, and performance is a bit slower than before. This is the output of atop (15-second average):

PRC | sys   0.53s | user   5.43s | #proc    112 | #zombie    0 | #exit      0 |
CPU | sys      4% | user     54% | irq       2% | idle    208% | wait    131% |
cpu | sys      1% | user     24% | irq       1% | idle     54% | cpu001 w 20% |
cpu | sys      2% | user     15% | irq       1% | idle     31% | cpu002 w 52% |
cpu | sys      1% | user      8% | irq       0% | idle     52% | cpu003 w 38% |
cpu | sys      1% | user      7% | irq       0% | idle     71% | cpu000 w 21% |
CPL | avg1  10.58 | avg5    6.92 | avg15   4.66 | csw    19112 | intr   19135 |
MEM | tot    2.0G | free   49.8M | cache 157.4M | buff  116.8M | slab  122.7M |
SWP | tot    1.9G | free    1.2G |              | vmcom   2.2G | vmlim   2.9G |
PAG | scan   1536 | stall      0 |              | swin       9 | swout      0 |
DSK |         sdb | busy     91% | read     884 | write    524 | avio    6 ms |
DSK |         sda | busy     12% | read     201 | write    340 | avio    2 ms |
NET | transport   | tcpi    8551 | tcpo    8204 | udpi     702 | udpo     718 |
NET | network     | ipi     9264 | ipo     8946 | ipfrw      0 | deliv   9264 |
NET | eth0     5% | pcki    6859 | pcko    6541 | si 5526 Kbps | so  466 Kbps |
NET | lo     ---- | pcki    2405 | pcko    2405 | si  397 Kbps | so  397 Kbps |


sdb holds the cache and sda holds everything else, including the MySQL db files. Note that there are a lot of disk reads on sdb, but I'm really only getting one file from disk for every 10 written, so my guess is that all the other reads are directory listings. Since I'm using the hash for directory names, (I think) this makes the Linux cache less effective, as the files are spread more evenly and randomly among the directories.
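For reference, here is a minimal sketch of the kind of hash-based directory layout described above. The hash function (MD5), the two directory levels of two hex characters each, and the helper name hashed_path are assumptions for illustration only; the actual scheme used in the app may differ.

```python
import hashlib
import os


def hashed_path(root, file_name, levels=2, width=2):
    """Return a path for file_name under root, spread over hashed subdirectories.

    Assumed layout (hypothetical): <root>/<h0h1>/<h2h3>/<file_name>, where
    h0..h3 are the leading hex characters of the MD5 digest of the file name.
    """
    digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()
    # Take `levels` consecutive slices of `width` hex characters each.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, file_name)


if __name__ == "__main__":
    # The same name always maps to the same directory, but names that sort
    # next to each other usually land in unrelated directories.
    for name in ("image0001.jpg", "image0002.jpg"):
        print(hashed_path("/cache", name))
```

Because neighbouring file names scatter across unrelated directories, consecutive requests touch many different directory blocks, which would be consistent with the extra reads seen on sdb.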

The app is running a bit slower than when I used the file name for the directory name, although I expect (not really sure) that it will do better as the number of files on disk grows (currently there are only 600k files out of 15M). My current performance is around 50 file I/Os per second.
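As a rough back-of-the-envelope check on how directory size changes between 600k and 15M files, the following sketch estimates the average number of files per leaf directory. It assumes a perfectly uniform hash and a hypothetical layout of two levels with two hex characters each (65536 leaf directories); neither figure comes from the actual setup.

```python
def avg_files_per_dir(total_files, levels=2, width=2):
    """Average files per leaf directory, assuming a uniform hash.

    Each level has 16**width possible names (hex characters), so there are
    (16**width)**levels leaf directories in total.
    """
    leaf_dirs = (16 ** width) ** levels
    return total_files / leaf_dirs


if __name__ == "__main__":
    for total in (600_000, 15_000_000):
        print(total, round(avg_files_per_dir(total), 1))
    # 600000 9.2
    # 15000000 228.9
```

Under these assumptions the directories are nearly empty today (about 9 files each), so most of the current cost is in walking many small directories; the layout should only start to pay off as each directory fills up.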



