Hi,
On Thu, Sep 24, 2009 at 21:18, Philip Gwyn <liste@artware.qc.ca> wrote:
> The problem seems like a disk problem. I grow to suspect that SATA isn't ready for the big time. I also grow to dislike RAID5.
> Questions :
> - Anyone have a clue or other on how to track down my bottle neck?
You can use the command "iostat -kx 1 /dev/sd?" to get more information about what is happening. In particular, it shows %util, the percentage of time each drive is busy, which you can correlate with rkB/s and wkB/s to see how much data is being read from or written to that specific drive. It also reports the average request size (so you can tell whether you have many small operations or a few big ones), the queue size, the service time and the wait time. See "man iostat" for more details. iostat is not installed by default on CentOS 5, but it is available from the base repositories: just run "yum install sysstat" if you don't have it yet.
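If you capture that output to a file, a small script can pick out the busy drives for you. This is only a rough sketch: the function name, the 90% threshold and the sample numbers are made up for illustration, and the column layout is assumed to match the header line that iostat prints (it varies between sysstat versions, which is why the columns are looked up by name).

```python
def busy_devices(iostat_text, threshold=90.0):
    """Return {device: %util} for devices at or above `threshold`.

    Expects the text of an `iostat -kx` report. Columns are located by
    name from the "Device" header line rather than by fixed position.
    """
    busy = {}
    columns = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            columns = fields  # remember the header of the device table
            continue
        if columns and "%util" in columns and len(fields) == len(columns):
            util = float(fields[columns.index("%util")])
            if util >= threshold:
                busy[fields[0]] = util
    return busy


# Illustrative sample only; these numbers are invented, not measured.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.00    0.00    1.00   45.00    0.00   52.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    12.00  5.00 80.00    40.00   368.00     9.60     1.20   14.00   4.00  34.00
sdb               0.00    15.00 90.00 95.00   720.00   440.00    12.54    25.00  135.00   5.40  99.90
"""

if __name__ == "__main__":
    print(busy_devices(SAMPLE))
```

A drive that stays near 100% while the others sit mostly idle is a hint that the load is not being spread evenly across the array.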
If you are using RAID-5, you might want to check whether the chunk size you are using is a good fit. You specify it when you create a new array, using the "-c" option to mdadm; I don't think you can change it after the array is created. The default is 64kB, which sounds sane enough, but you might want to check whether yours was created with that value.
The problem is basically this: an operation that is larger than the chunk size requires work on several disks (possibly all of them), so each of those disks has to seek to a specific position to complete it, and while they are doing that they cannot work on any other request. If you have high usage and random access, the disks will spend a lot of time seeking. If that is the case, you might want to increase the chunk size so that most operations can be fulfilled by a single disk, leaving the others free to work on other requests at the same time.
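To make the chunk-size trade-off concrete, here is a simplified model of how many disks a single request keeps busy. It assumes consecutive chunks land on different disks and ignores parity placement and parity updates entirely; the function names and the 3-data-disk default (think of a 4-disk RAID-5) are hypothetical, chosen only for the example.

```python
def chunks_spanned(offset_kb, length_kb, chunk_kb=64):
    """Number of consecutive chunks covered by a request.

    offset_kb is where the request starts within the array and
    length_kb (must be > 0) is how much data it covers.
    """
    first = offset_kb // chunk_kb
    last = (offset_kb + length_kb - 1) // chunk_kb
    return last - first + 1


def disks_touched(offset_kb, length_kb, chunk_kb=64, data_disks=3):
    """Rough count of disks that must seek to serve one request.

    Simplification: each consecutive chunk sits on a different disk,
    so the count is capped by the number of data disks. Parity is
    ignored.
    """
    return min(chunks_spanned(offset_kb, length_kb, chunk_kb), data_disks)


if __name__ == "__main__":
    # A 256kB request with 64kB chunks ties up every data disk...
    print(disks_touched(0, 256, chunk_kb=64))
    # ...while the same request with 256kB chunks stays on one disk.
    print(disks_touched(0, 256, chunk_kb=256))
```

Comparing the average request size reported by iostat against the chunk size gives a rough idea of which of these two situations you are in.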
On the other hand, if specific areas of your filesystem are hit more often and they always fall on the same disk, that disk will be used more than the others, so your performance will effectively be limited by that one disk instead of being multiplied by the number of disks through striped access. In that case it might make sense to reduce the chunk size in order to spread the access more evenly across the disks.
I read some time ago that ext2/ext3 allocates blocks in a way that can create this kind of uneven distribution when you stripe across certain numbers of disks. I don't know exactly how that works, but you might want to look into it. I remember that when you create the ext2/ext3 filesystem you can use an option such as "stride=..." to give it a hint about the disk layout, so that the filesystem can offset those blocks enough to spread the load across the disks. I could never figure out exactly what "stride=..." number would make sense, though; the documentation is rather scarce in this area. Check the mke2fs manpage anyway if you have a disk that is "hotter" than the others and you think that might be the problem. You can also experiment with other filesystems, such as XFS, which is available in the extras repository.
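For what it's worth, the mke2fs manpage describes stride as the number of filesystem blocks written to one disk before moving on to the next, which works out to the RAID chunk size divided by the filesystem block size. A minimal sketch of that arithmetic follows; the function name is made up, and the 4kB block size is an assumption you should check against your own filesystem.

```python
def ext3_stride(chunk_kb, block_kb=4):
    """Stride in filesystem blocks: RAID chunk size / filesystem block size."""
    if chunk_kb <= 0 or block_kb <= 0:
        raise ValueError("sizes must be positive")
    if chunk_kb % block_kb:
        raise ValueError("chunk size should be a multiple of the block size")
    return chunk_kb // block_kb


if __name__ == "__main__":
    # 64kB chunks with 4kB blocks give stride=16. The value would then be
    # passed to mke2fs as an extended option such as "stride=16"; check
    # your mke2fs manpage for the exact flag your version accepts.
    print(ext3_stride(64))
```

So with the default 64kB chunk and 4kB blocks the hint would be stride=16, under the assumptions above.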
And of course, make sure "cat /proc/mdstat" shows everything is OK, so that you know you aren't running a degraded array before you start investigating its performance.
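If you want to check that from a script rather than by eye, something along these lines could work. It is only a sketch: it assumes the usual mdstat status line of the form "[4/4] [UUUU]", where an underscore marks a missing member, and the function name and sample text are invented for the example.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status line shows missing members."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = re.match(r"^(md\d+)\s*:", line)
        if name:
            current = name.group(1)  # start of a new array entry
            continue
        # Status looks like "[4/3] [UUU_]": devices expected / active,
        # then one U per working member and _ per missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
    return degraded


# Illustrative sample only; device names and sizes are invented.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid5 sdg1[2] sdf1[1] sde1[0]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
"""

if __name__ == "__main__":
    # In real use you would read the text from /proc/mdstat instead.
    print(degraded_arrays(SAMPLE_MDSTAT))
```

The same line of /proc/mdstat also shows the chunk size the array was created with ("64k chunk" in the sample).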
I'm sure there is performance tuning that can be done with, e.g., hdparm, tweaking values under /proc and /sys, or changing the kernel I/O scheduler, but I'm not really experienced with that, so I can't advise you there. I'm sure others have that experience and will be able to give you pointers. In that case you might want to ask on the main list instead of the -virt one.
HTH, Filipe