On Thursday 20 September 2007, Al Sparks wrote:
Why? What's different between NTFS and ext2/3 that defragging is needed in one but not the other?
=== Al
And this is the right question to ask...
Anyway - the answer about defragging, if you really care to understand it, is pretty lengthy.
FAT used to be horrible. It would always simply take the first available cluster and use that to store data. This resulted in a lot of fragmentation.
NTFS is already much better. It still benefits from defragging, but it doesn't make that much of a difference anymore as long as your partition doesn't get close to being full. It tries to allocate contiguous blocks and will even add some buffer at the end of a file to allow for growth.
ext2/3 is similar to NTFS in its fragmentation resistance. It has, however, two more advantages. First, Linux uses swap devices, and things like mmapped files are still movable; in Windows, the swap file and some other files are not movable. The second advantage is reserved space. By default, each ext2/3 filesystem has 5% of its space reserved for root. ext2/3 simply assumes you will never get past 95% full, so the data is laid out accordingly. Since you know you have at least 5% free disk blocks, you can leave a little bit more unallocated space at the end of each file... It's not much, but it adds up over time.
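If you want to see or restore that reserve on your own filesystem, tune2fs handles both. A minimal sketch, assuming /dev/sda1 is the ext2/3 partition in question (adjust the device name):

  # show the reserved block count
  tune2fs -l /dev/sda1 | grep -i 'reserved block count'

  # put the reserve back to the default 5% if someone lowered it
  tune2fs -m 5 /dev/sda1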
The worst possible scenario I've found for ext3 so far is cvs. With every checkin, cvs has to rewrite the whole repository file. It does so by writing a completely new file and then moving it into place over the old one, which means the filesystem has to allocate new space every time.
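You can watch the result on individual repository files with filefrag; the path below is made up, so point it at one of your own ,v files:

  # one extent means the file is fully contiguous
  filefrag /var/lib/cvs/project/somefile.c,v
  # prints something like: somefile.c,v: 17 extents found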
For a long time, I balanced stuff between servers, removed outdated code and so on. A bi-monthly fsck would show about 1-2% fragmentation at about 75% filesystem usage. Then a few large projects were imported, filesystem usage went up to 98% (someone did a tune2fs -m 0), and then the problems really started. I'm just about to go home now - 2am. I spent the last few hours reorganizing the cvs filesystem. A filesystem check showed 61% fragmentation! I moved old code off to a secondary server, then copied everything off, recreated the filesystem, and copied the data back.
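For reference, that fragmentation figure is the "non-contiguous" percentage e2fsck prints in its summary line; a read-only check (best done on an unmounted filesystem) is enough to see it. The device name below is a placeholder:

  # force a read-only check; the summary ends with "(X.Y% non-contiguous)"
  fsck.ext3 -fn /dev/sdb1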
The results were impressive. My I/O subsystem tops out at about 1800 I/O operations per second. Before the reorg, that translated to about 1.1 MB/sec of throughput, measured in iostat with a few cvs processes running at the same time. Afterwards it was still about 1800 I/Os, but throughput rose to a much more useful 24 MB/sec...
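Those numbers are simply what iostat reports for the device under load; something like this, sampling every 5 seconds:

  # extended per-device stats in kB; r/s + w/s is the request rate,
  # the kB/s columns are the throughput
  iostat -dxk 5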
Anyway - bullet points:

* There is no good way to measure fragmentation (on a filesystem level) other than fsck.
* Try filefrag to check for fragmentation on a per-file basis.
* There is no online block-level defragger for ext2/3.
* There is an offline block-level defragger for ext2, e2defrag. An ext3 filesystem would have to be converted to ext2 and back to ext3 after the defrag.
* There are some file-level defragmentation tools. They basically work by copying files around, which helps on filesystems that had high utilization for a while, got fragmented, and are now mostly empty again. I tried some of them on my cvs server, but none gave me good results.
* If fsck shows high fragmentation (>5% in my opinion), make sure the filesystem doesn't get that full, and if you really want to defrag, copy the data off and back on. It's the best way to do it (a sketch of that follows below).
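For that last point, a minimal sketch of the copy-off-and-back approach, with made-up device and mount names; it assumes a quiet maintenance window and that you verify the copy before wiping anything:

  # stop whatever writes to the filesystem, then copy the data off
  mkdir -p /backup/cvs
  cp -a /cvs/. /backup/cvs/

  # take it offline and recreate the filesystem (-j makes it ext3)
  umount /cvs
  mke2fs -j /dev/sdb1
  mount /dev/sdb1 /cvs

  # copy the data back; -a preserves ownership, permissions and timestamps
  cp -a /backup/cvs/. /cvs/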
And now I'm off to bed.
Peter.