[CentOS] Disk Elevator

Aleksandar Milivojevic alex at milivojevic.org
Mon Jan 8 19:50:07 UTC 2007


Quoting "Ross S. W. Walker" <rwalker at medallion.com>:

> BTW that parity chunk still needs to be in memory to avoid the read on
> it, no? In that case wouldn't a stride of 64 help in that case? And if
> the stride leaves out the parity chunk then will not successive
> read-aheads cause a continuous wrap of the stripe which will negate the
> effect of the stride by not having the complete stripe cached?

Hm, not really.  The parity chunk is never handed over to the OS.
It's internal to the hardware RAID controller.  The OS doesn't know
anything about it; it doesn't even know that the "disk" it is
accessing is actually a RAID5 array.

Back to your example of a 4-disk RAID5, 64k chunks, 4k file system blocks.
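
To spell out where the 48 and 64 below come from, here's a quick
back-of-the-envelope calculation (Python, purely illustrative):

    # Geometry from the example: 4-disk RAID5, 64k chunks, 4k fs blocks.
    chunk_kb, block_kb, disks = 64, 4, 4
    data_chunks_per_stripe = disks - 1        # one chunk per stripe is parity

    blocks_per_chunk = chunk_kb // block_kb   # 64/4 = 16 fs blocks per chunk
    stride_48 = data_chunks_per_stripe * blocks_per_chunk  # 3 * 16 = 48
    stride_64 = 4 * blocks_per_chunk          # 4 data chunks = 64 blocks, one
                                              # chunk more than a stripe holds
    print(stride_48, stride_64)               # 48 64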

If you set stride to 48, the OS hands the controller 3 chunks worth of
data, aligned with the stripes.  The controller calculates parity and
writes out 4 chunks (3 data, 1 parity).

If you set stride to 64, the OS hands the controller 4 chunks worth of
data.  In the best case, the first or last three will be aligned with
a stripe.  The controller calculates parity over those 3 and writes out
4 chunks (3 data, 1 parity).  For the remaining data chunk, it needs to
read 2 chunks from disk, calculate parity, and write 2 chunks (1 data,
1 parity).  In the worst case, neither the first nor the last 3 chunks
will be aligned with a stripe.  The controller reads 1 chunk,
calculates parity, writes out 3 chunks (2 data, 1 parity), then does
the same thing again for the remaining 2 chunks of data.
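
Here's a toy model of the above (my own sketch, not any controller's
actual algorithm) that counts chunk reads and writes per stripe for a
4-chunk write at each possible alignment:

    # RAID5 stripe = 3 data chunks + 1 parity chunk, as in the example.
    # Count controller chunk I/O for a 4-chunk write starting `offset`
    # data chunks past a stripe boundary (offset 0 = aligned).
    DATA_PER_STRIPE = 3

    def stripe_costs(offset, length=4):
        """Yield (reads, writes) for each stripe the write touches."""
        start, end = offset, offset + length
        for stripe in range(start // DATA_PER_STRIPE,
                            (end - 1) // DATA_PER_STRIPE + 1):
            lo, hi = stripe * DATA_PER_STRIPE, (stripe + 1) * DATA_PER_STRIPE
            k = min(end, hi) - max(start, lo)  # data chunks written here
            if k == DATA_PER_STRIPE:
                yield 0, k + 1   # full stripe: no reads, 3 data + 1 parity
            elif k == 2:
                yield 1, k + 1   # read the 1 missing data chunk, recompute
            else:                # k == 1
                yield 2, k + 1   # read old data + old parity (read-modify-write)

    for off in range(DATA_PER_STRIPE):
        per_stripe = list(stripe_costs(off))
        totals = tuple(map(sum, zip(*per_stripe)))
        print(f"offset {off}: per stripe {per_stripe}, total {totals}")

    # offset 0: per stripe [(0, 4), (2, 2)], total (2, 6)   <- best case
    # offset 1: per stripe [(1, 3), (1, 3)], total (2, 6)   <- worst case
    # offset 2: per stripe [(2, 2), (0, 4)], total (2, 6)

Interestingly, the totals come out the same either way; the aligned
case's advantage is that the full-stripe write can complete without
waiting on any read first.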

Anyhow, for large sequential reads and writes there's really not a big
performance benefit (if any).  The OS will tend to combine and
rearrange reads and writes so they are sequential, and the hardware
RAID controller will do the same using its cache.  I've tested this
once with a good RAID controller, and bonnie++ (which benchmarks this
kind of access) gave almost the same numbers with and without the
stride option.

If disk access is random (read a block here, write a block there),
there might be some benefit (although the cache in the hardware RAID
controller might kick in and save the day here too).  It all depends
on the particular RAID controller, the workload, and the amount and
type (write back vs. write through) of cache on the controller.

I'd say in most cases the stride option has very little effect if you
have a large battery-backed write-back cache (and a good RAID
controller, that is).  If you are using software RAID, or have a small
and/or write-through cache, the stride option might have some effect.
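
For the software RAID case, here's a hypothetical helper (assuming a
reasonably recent e2fsprogs, where -E stride= means fs blocks per RAID
chunk and stripe-width= covers the data portion of the stripe, i.e. the
48 figure discussed above; /dev/mdX is a placeholder):

    # Hypothetical helper: derive ext2/ext3 alignment hints for a
    # software RAID5 array from its geometry.
    def mke2fs_hints(chunk_kb=64, block_kb=4, disks=4):
        stride = chunk_kb // block_kb           # fs blocks per RAID chunk
        stripe_width = stride * (disks - 1)     # RAID5: 1 parity chunk/stripe
        return (f"mke2fs -b {block_kb * 1024} "
                f"-E stride={stride},stripe-width={stripe_width} /dev/mdX")

    print(mke2fs_hints())
    # mke2fs -b 4096 -E stride=16,stripe-width=48 /dev/mdX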
