We're having a weird disk I/O problem on a CentOS 5.4 server connected to external SAS storage through an LSI Logic MegaRAID SAS 1078 controller.
The server is used as a Samba file server.
Every time we copy a large file to the storage-based file system, the disk utilization see-saws: it jumps to 100%, drops to several seconds of complete inactivity, climbs back to 100%, and so on. Here is a snippet from iostat -kx 1:
Device:   rrqm/s    wrqm/s   r/s      w/s      rkB/s      wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sdb1        0.00 133811.00  0.00  1889.00      0.00  513660.00    543.84    126.24  65.00   0.47  89.40
sdb1        0.00    138.61  0.00   109.90      0.00   29845.54    543.14      2.54  54.32   0.37   4.06
sdb1        0.00      0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
sdb1        0.00      0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
sdb1        0.00      0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
sdb1        0.00 134680.00  0.00  1920.00      0.00  526524.00    548.46    126.06  64.57   0.47  90.00
sdb1        0.00    142.00  0.00    74.00      0.00   20740.00    560.54      1.25  45.14   0.47   3.50
sdb1        0.00      0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
sdb1        0.00      0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
sdb1        0.00      0.00  1.00     0.00      4.00       0.00      8.00      0.01  14.00  14.00   1.40
sdb1        0.00 116129.00  1.00  1576.00      4.00  434816.00    551.45    125.47  75.38   0.57  90.30
sdb1        0.00  17301.98  0.00   412.87      0.00  106506.93    515.93     24.59  75.40   0.48  19.80
sdb1        0.00      0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
sdb1        0.00      0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
sdb1        0.00      0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
It happens whether I copy a file over the network using Samba or copy/create a file locally.
It looks like the disk takes on more data than it can handle, then chokes and stalls for a few seconds until some buffer empties and it can accept a bit more again.
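If that is what is going on, I guess the dirty page cache should balloon and then drain in step with the stalls; something like this (run next to iostat, standard /proc files) should show it:

watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'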
It happens on two identical servers, so I'd rule out faulty hardware as the cause and look for a misconfiguration issue instead.
Are there any guidelines or docs for heavy I/O tuning? Are there any known issues with this RAID controller?
Any help will be appreciated.
Fer
Fernando Gleiser wrote:
We're having a weird disk I/O problem on a CentOS 5.4 server connected to external SAS storage through an LSI Logic MegaRAID SAS 1078 controller.
Not sure I know what the issue is, but telling us how many disks you have, what RPM they are, and what RAID level you're using would probably help.
It sounds like perhaps you have a bunch of 7200RPM disks in a RAID setup where the data:parity ratio is way out of whack (i.e. a high number of data disks per parity disk), which will result in very poor write performance.
nate
----- Original Message ----
From: nate centos@linuxpowered.net
Not sure I know what the issue is, but telling us how many disks you have, what RPM they are, and what RAID level you're using would probably help.
It sounds like perhaps you have a bunch of 7200RPM disks in a RAID setup where the data:parity ratio is way out of whack (i.e. a high number of data disks per parity disk), which will result in very poor write performance.
Yes, it's a bunch of 12 7200RPM disks organized as 1 hot spare, 2 parity disks, and 9 data disks in a RAID 5 configuration. Is 9:2 a "high ratio"?
Thanks for your help
Fer
Fernando Gleiser wrote:
Yes, it's a bunch of 12 7200RPM disks organized as 1 hot spare, 2 parity disks, and 9 data disks in a RAID 5 configuration. Is 9:2 a "high ratio"?
Perhaps it's actually RAID 6, as I've never heard of RAID 5 with two parity disks (two parity disks means dual parity, which is RAID 6).
RAID 6 performance can vary dramatically between controllers. If it were me, unless you get other responses shortly, I would test other RAID configurations and see how the performance compares:
- RAID 1+0
- RAID 5+0 (striped RAID 5 arrays; in your case perhaps 3+1 * 4 with no hot spares, at least for testing)
- RAID 5+0 (5+1 * 2)
RAID 1+0 should come first, though; even if you don't end up using it, it's good to get a baseline with the fastest configuration.
I would expect the RAID card to support RAID 50, but not all do. If it doesn't, one option may be to do the striping with LVM at the OS level.
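Something along these lines, for example (the device names, size and stripe size below are placeholders, not from your setup):

# two RAID 5 LUNs exported by the controller, striped together with LVM
pvcreate /dev/sdb /dev/sdc
vgcreate data_vg /dev/sdb /dev/sdc
# -i 2 = stripe across both PVs, -I 256 = 256KB LVM stripe size
lvcreate -i 2 -I 256 -L 500G -n data_lv data_vg
mkfs.ext3 /dev/data_vg/data_lv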
nate
nate wrote:
Perhaps it's actually RAID 6, as I've never heard of RAID 5 with two parity disks (two parity disks means dual parity, which is RAID 6).
Forgot to mention: my own personal preference, on my high-end SAN at least, is RAID 5 with a 3:1 data-to-parity ratio, or at most 5:1 or 6:1; I really never go higher than that unless activity is very low.
The RAID controllers on my array are the fastest in the industry, and despite that, in the near future I am migrating to a 2:1 parity ratio to get (even) better performance; that brings me to within about 3-4% of RAID 1+0 performance for typical workloads.
nate
On Tue, Feb 9, 2010 at 4:33 PM, Fernando Gleiser fergleiser@yahoo.com wrote:
----- Original Message ----
From: nate centos@linuxpowered.net
Not sure I know what the issue is, but telling us how many disks you have, what RPM they are, and what RAID level you're using would probably help.
It sounds like perhaps you have a bunch of 7200RPM disks in a RAID setup where the data:parity ratio is way out of whack (i.e. a high number of data disks per parity disk), which will result in very poor write performance.
Yes, it's a bunch of 12 7200RPM disks organized as 1 hot spare, 2 parity disks, and 9 data disks in a RAID 5 configuration. Is 9:2 a "high ratio"?
A bit. Your RAID array is set up for a read-mostly workload.
Here is a simple rule. Given a HW RAID controller with write-back cache, assume each write will span the whole stripe width (the controller tries to cache full-stripe writes). In that case the write IOPS will equal the IOPS of your slowest disk in the set, because the next write can't start until the previous one has finished.
Of course with RAID5/RAID6 the write performance can be much, much worse if the write falls short of the whole stripe width, because the controller then has to read the rest of the stripe (in order to calculate parity) and write the whole stripe back out. Your data sounds sequential, though, so this shouldn't happen much, maybe only for the first or last stripes, so the simple rule above is a good guide.
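To put rough numbers on it (ballpark figures assumed for illustration, not measurements from your box): a 7200RPM disk manages on the order of 75-100 IOPS, so by the rule above the array sustains roughly that many full-stripe writes per second once the controller cache is full. With 9 data disks and, say, a 128KB stripe unit, a full stripe is 9 * 128KB = 1152KB, so 75 full-stripe writes/sec works out to around 85MB/s of sustained sequential writes; partial-stripe writes that trigger the read-modify-write cycle can land well below that.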
For software RAID5/RAID6, which doesn't have a write-back cache to coalesce a full stripe, make sure the file system knows the stripe width and hope it does the right thing :)
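For example (just the general idea; the numbers are made up and depend on your chunk size and data-disk count):

# ext3 on an md RAID5 with a 64KB chunk and 4KB blocks: stride = 64/4 = 16
mkfs.ext3 -E stride=16 /dev/md0
# XFS can be told the geometry explicitly: stripe unit and width in data disks
# mkfs.xfs -d su=64k,sw=9 /dev/md0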
-Ross
On 2010-02-09 18:15, Fernando Gleiser wrote:
Every time we copy a large file to the storage-based file system, the disk utilization see-saws: it jumps to 100%, drops to several seconds of complete inactivity, climbs back to 100%, and so on. Here is a snippet from iostat -kx 1:
Device:   rrqm/s    wrqm/s   r/s      w/s      rkB/s      wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sdb1        0.00 133811.00  0.00  1889.00      0.00  513660.00    543.84    126.24  65.00   0.47  89.40
The iostat output looks good to me for the RAID setup you have. I'd look for the problem in a different place:
Note the output of cat /proc/sys/vm/dirty_background_ratio, then try echo 1 > /proc/sys/vm/dirty_background_ratio and see whether it helps.
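If it does help, you can make it stick across reboots, e.g.:

# same effect as the echo above
sysctl -w vm.dirty_background_ratio=1
# make it permanent
echo "vm.dirty_background_ratio = 1" >> /etc/sysctl.conf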
Andrzej
On Feb 11, 2010, at 2:46 AM, Andrzej Szymanski szymans@agh.edu.pl wrote:
On 2010-02-09 18:15, Fernando Gleiser wrote:
Every time we copy a large file to the storage-based file system, the disk utilization see-saws: it jumps to 100%, drops to several seconds of complete inactivity, climbs back to 100%, and so on. Here is a snippet from iostat -kx 1:
Device:   rrqm/s    wrqm/s   r/s      w/s      rkB/s      wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sdb1        0.00 133811.00  0.00  1889.00      0.00  513660.00    543.84    126.24  65.00   0.47  89.40
The iostat output looks good to me for the RAID setup you have. I'd look for the problem in a different place:
Note the output of cat /proc/sys/vm/dirty_background_ratio, then try echo 1 > /proc/sys/vm/dirty_background_ratio and see whether it helps.
Excellent suggestion. On machines with lots of memory the default dirty background ratio is way too big and needs to be tuned down, both for data integrity in the event of a system failure and for the performance of the underlying storage configuration.
Take into account the RAID setup, the write-back cache size, and the time it takes to flush it to disk, and pick a dirty background ratio somewhere in between.
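As a rough illustration (the memory size and flush rate here are assumptions, plug in your own numbers): on a box with 16GB of RAM, the default dirty_background_ratio (10 on these kernels, if I remember right) lets roughly 1.6GB of dirty pages accumulate before background writeback even starts. If the array sustains around 100MB/s, that is about 16 seconds of flushing in one burst, which looks exactly like the see-saw described. Dropping the ratio to 1 caps the backlog at about 160MB, only a second or two of flushing.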
-Ross
----- Original Message ----
From: Ross Walker rswwalker@gmail.com
To: CentOS mailing list centos@centos.org
Cc: CentOS mailing list centos@centos.org
Sent: Thu, February 11, 2010 12:30:43 PM
Subject: Re: [CentOS] disk I/O problems with LSI Logic RAID controller
On Feb 11, 2010, at 2:46 AM, Andrzej Szymanski wrote:
The iostat output looks good to me for the RAID setup you have. I'd look for the problem in a different place:
Note the output of cat /proc/sys/vm/dirty_background_ratio, then try echo 1 > /proc/sys/vm/dirty_background_ratio and see whether it helps.
Excellent suggestion. On machines with lots of memory the default dirty background ratio is way too big and needs to be tuned down, both for data integrity in the event of a system failure and for the performance of the underlying storage configuration.
Take into account the RAID setup, the write-back cache size, and the time it takes to flush it to disk, and pick a dirty background ratio somewhere in between.
You nailed it. I tweaked the dirty_background_ratio and changed the I/O scheduler to deadline, and now it works much better. It still see-saws a bit, but the utilization never drops to zero.
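Roughly, the changes were along these lines (from memory, device name as in the iostat output above):

echo deadline > /sys/block/sdb/queue/scheduler
sysctl -w vm.dirty_background_ratio=1
# plus elevator=deadline on the kernel line in /boot/grub/grub.conf
# and vm.dirty_background_ratio = 1 in /etc/sysctl.conf to survive reboots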
Thank you for your help.
Fer