[CentOS] disk I/O problems with LSI Logic RAID controller

Tue Feb 9 17:15:26 UTC 2010
Fernando Gleiser <fergleiser at yahoo.com>

We're having a weird disk I/O problem on a CentOS 5.4 server connected to external SAS storage through an LSI Logic MegaRAID SAS 1078 controller.

The server is used as a Samba file server.

Every time we copy a large file to the file system on that storage, disk utilization see-saws: it climbs close to 100%, drops to several seconds of complete inactivity, then climbs again, and so forth.
Here is a snippet from iostat -kx 1:

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb1              0.00 133811.00  0.00 1889.00     0.00 513660.00   543.84   126.24   65.00   0.47  89.40
sdb1              0.00   138.61  0.00 109.90     0.00 29845.54   543.14     2.54   54.32   0.37   4.06
sdb1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb1              0.00 134680.00  0.00 1920.00     0.00 526524.00   548.46   126.06   64.57   0.47  90.00
sdb1              0.00   142.00  0.00 74.00     0.00 20740.00   560.54     1.25   45.14   0.47   3.50
sdb1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb1              0.00     0.00  1.00  0.00     4.00     0.00     8.00     0.01   14.00  14.00   1.40
sdb1              0.00 116129.00  1.00 1576.00     4.00 434816.00   551.45   125.47   75.38   0.57  90.30
sdb1              0.00 17301.98  0.00 412.87     0.00 106506.93   515.93    24.59   75.40   0.48  19.80
sdb1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
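
To put numbers on the burst/idle pattern in the samples above, here is a minimal Python sketch. It only uses the wkB/s column copied from those 15 one-second samples; the helper name summarize and the choice to count a second as "idle" when wkB/s is exactly zero are my own assumptions for illustration, not anything iostat reports.

```python
# wkB/s column copied from the 15 one-second iostat samples above.
WKB_PER_SEC = [
    513660.00, 29845.54, 0.00, 0.00, 0.00,
    526524.00, 20740.00, 0.00, 0.00, 0.00,
    434816.00, 106506.93, 0.00, 0.00, 0.00,
]


def summarize(samples):
    """Return (busy_seconds, idle_seconds, peak_kb_s, mean_kb_s).

    A sample counts as idle when no kilobytes were written in that
    second (an assumption made for this sketch).
    """
    idle = sum(1 for s in samples if s == 0)
    busy = len(samples) - idle
    peak = max(samples)
    mean = sum(samples) / len(samples)
    return busy, idle, peak, mean


if __name__ == "__main__":
    busy, idle, peak, mean = summarize(WKB_PER_SEC)
    print("busy: %d s, idle: %d s" % (busy, idle))
    print("peak: %.0f kB/s, mean: %.0f kB/s" % (peak, mean))
```

On these samples the device writes in 6 of the 15 seconds and the mean write rate is roughly a fifth of the peak, which matches the see-saw described above: short bursts followed by about three seconds with no writes at all.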


It happens both when I copy a file over the network using Samba and when I copy or create a file locally.

It looks like the disk accepts more data than it can handle, chokes on it, and stalls for a few seconds until some buffer empties and it can accept a bit more data again.

It happens on two identical servers, so I'd rule out faulty hardware as the cause and look for a misconfiguration instead.

Are there any guidelines or docs for heavy I/O tuning? Are there any known issues with this RAID controller?

Any help will be appreciated.



Fer