[CentOS] Disk usage for small files in ext3 in CentOS 5

Wed Mar 11 22:58:05 UTC 2009
William L. Maltby <CentOS4Bill at triad.rr.com>

On Wed, 2009-03-11 at 17:29 -0400, Filipe Brandenburger wrote:
> Hello,
> 
> I noticed something unusual today.
> 
> If I "du" a small file (couple of bytes) in CentOS 5, it tells me the
> file is using 8kb, while I was expecting 4kb which is the block size
> I'm using.
> 
> I tried this on several CentOS 5 machines, both x86_64 and i386:
> 
> $ echo test >test.txt
> $ ls -l test.txt
> -rw-rw-r-- 1 filbranden filbranden 5 Mar 11 17:24 test.txt
> $ du -h test.txt
> 8.0K	test.txt
> 
> If I do the same on a CentOS 4 machine:
> 
> $ echo test >test.txt
> $ ls -l test.txt
> -rw-rw-r--  1 filbranden filbranden 5 Mar 11 17:25 test.txt
> $ du -h test.txt
> 4.0K	test.txt
> 
> On all machines I tested, both CentOS 4 and CentOS 5:
> 
> # tune2fs -l /dev/xxxxx
> ...
> Block size:               4096
> Fragment size:            4096
> 
> I could not find any differences that would explain the behaviour.
> Have you seen this before? Can you reproduce it on your systems? Do
> you know how to get the CentOS 4 behaviour?
> 
> More on the point: I'm migrating some data from CentOS 4 to CentOS 5,
> it's around 70GB of millions of small files. I would like it to still
> take 70GB, not 140GB. For now, I'm working around this issue by using
> "-T small" to mke2fs, I'm not sure if it's going to have the effect I
> want, and I'm not sure about any other impact (performance?) it might
> have on my filesystem.

I'm a gambler, so I'll bet on this: very large disks? If so, it may be
that one of the tunables specifies two blocks per "fragment", or that
the bytes-per-inode ratio is set to more than 4K. In the past I've been
able to affect things like this by tuning the number of inodes up or
down when making the file system. Generally, though, I'm reducing the
number, since a lot of space can be gained that way: normally there
will be one inode per block, IIRC. Since my desktop FS doesn't see much
growth and many of its files are large, this is safe for me. YMMV.

The full output of "tune2fs -l" (the inode and block counts in
particular) might give some hints.
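
A minimal sketch of pulling those hints out of the listing, assuming the
field names that "tune2fs -l" from e2fsprogs normally prints ("Inode
count", "Block count", "Block size", "Inode size"). The helper name
fs_hints is made up for illustration. It reads the listing on stdin and
prints the effective bytes-per-inode ratio, so the CentOS 4 and CentOS 5
file systems can be compared side by side.

```shell
# fs_hints: hypothetical helper, not part of e2fsprogs.
# Reads `tune2fs -l` output on stdin and summarises the inode layout.
fs_hints() {
    awk -F': *' '
        /^Inode count:/ { inodes = $2 }
        /^Block count:/ { blocks = $2 }
        /^Block size:/  { bsize  = $2 }
        /^Inode size:/  { isize  = $2 }
        END {
            # Total bytes in the file system divided by the number of inodes.
            if (inodes > 0 && bsize > 0)
                printf "bytes-per-inode: %d\n", blocks * bsize / inodes
            # Only printed if the listing contains an "Inode size" field.
            if (isize != "")
                printf "inode size: %d\n", isize
        }'
}

# Usage (as root; device name left as in the original post):
#   tune2fs -l /dev/xxxxx | fs_hints
```

If the two releases report different ratios or inode sizes, that is a
place to start looking. It does not by itself prove where the extra 4K
per file goes.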

Also, running mke2fs with the "-n" parameter will tell you what it would
do if you were to (re)make the file system, without actually doing it.
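
A rough sketch of such a dry run, assuming the usual mke2fs summary
lines ("Block size=...", "Fragment size=...", "N inodes, M blocks"). The
wrapper name dry_run is made up for illustration. The "-T small" and
"-i" options are the standard mke2fs ones, but what "-T small" selects
depends on the local mke2fs.conf.

```shell
# dry_run: hypothetical wrapper around `mke2fs -n`.
# -n asks mke2fs to report what it would do without creating anything;
# this only keeps the geometry lines from that report.
dry_run() {
    mke2fs -n "$@" | grep -E 'Block size|Fragment size|inodes'
}

# Usage (as root; device name left as in the original post):
#   dry_run /dev/xxxxx              # defaults
#   dry_run -T small /dev/xxxxx     # the workaround mentioned above
#   dry_run -i 4096 /dev/xxxxx      # explicit bytes-per-inode ratio
```

Comparing the three outputs shows whether "-T small" changes the block
size as well as the inode count, which bears on the performance question
raised above.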

> <snip sig stuff>

HTH
-- 
Bill