[CentOS] Help - iSCSI and SAMBA?

Mon Jan 30 19:59:43 UTC 2006
Scott Sharkey <ssharkey at linuxunlimited.com>

Hi All,

I have a client trying to use a Promise Tech iSCSI array to share
2.8TB via Samba.  I have CentOS 4.2 with all updates installed on an
IBM server.  The installation and setup were pretty straightforward.
The Promise box is using Gigabit Ethernet, and is the only device on
that net (I think they are using a cross-over cable - I didn't set up
the hardware).  We're experiencing periodic "stoppages" with some of
the Samba users on this server.  Their Windows clients, which have
drives mapped to the IBM server (which has the iSCSI partitions
"mounted"), periodically "pause" for about 30-60 seconds.  The
machines are NOT locked up, as we can take screenshots, move the
mouse, etc, but disk I/O seems "stuck".  When it happens, anywhere
from 3 to 12 people are affected (but not the other ~80 users).

There are no network errors on the iSCSI interfaces, the switches, or
the other network interfaces.  The kernel is not swapping (though the
symptoms SEEM a lot like a process getting swapped to disk).  CPU
usage is not spiking in correlation with the events, as far as we can
tell.  It DOES appear from iostat that the percentage of time spent
servicing I/O requests is saturated.  I will paste the iostat info
from one of the events below.
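For reference, that data came from sysstat's extended iostat output.
A capture along these lines should reproduce it (the one-second
interval and log file name are my choices, not necessarily what was
actually run):

    # Extended per-device stats (-x) with a timestamp on each
    # report (-t), sampled once per second and appended to a log
    # so samples can be lined up against a reported freeze.
    iostat -x -t 1 >> /var/log/iostat-capture.log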

Has anyone else seen such behavior, and/or do you have suggestions
for troubleshooting or otherwise correcting it?  Thanks.

-Scott

------------------------------------------------------------------------
This was sent by one of the techs working on the problem:

I think I located the problem.  Collecting iostat data during the last 
lockup yielded the following information.

Time: 03:20:38 PM

Device:    rrqm/s wrqm/s    r/s    w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda          0.00   0.00   3.09   0.00   24.74    0.00   12.37    0.00     8.00     0.00   0.00   0.00   0.00
sdb          0.00   0.00  85.57   0.00  684.54    0.00  342.27    0.00     8.00     1.03  12.06  12.04 102.99

Time: 03:20:39 PM

Device:    rrqm/s wrqm/s    r/s    w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda          0.00  13.27   0.00  10.20    0.00  187.76    0.00   93.88    18.40     0.00   0.10   0.10   0.10
sdb          0.00   0.00  82.65   0.00  661.22    0.00  330.61    0.00     8.00     1.02  12.23  12.33 101.94

This clearly shows that device utilization (%util, the percentage of
time the device was busy servicing I/O requests) on sdb - presumably
the iSCSI LUN - has reached the saturation point of 100%.  The
numbers are consistent: roughly 85 reads/s at a ~12 ms service time
comes to just over one second of device time per second of wall
clock, i.e. ~103%.  The utilization was at or above 100% the entire
time that the freeze occurred.  I am researching the issue now to see
if this is something we can resolve with kernel tweaks or otherwise.
Any input regarding the issue is appreciated, thanks.
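A couple of things on my list to try - these are guesses, not
confirmed fixes, and they assume the stock CentOS 4 (2.6.9) kernel
exposes the usual block-layer knobs for sdb:

    # See how many requests the block layer will queue for the
    # iSCSI LUN; lowering nr_requests can shorten worst-case
    # queueing delay at the cost of some throughput.
    cat /sys/block/sdb/queue/nr_requests
    echo 64 > /sys/block/sdb/queue/nr_requests

As far as I know, on this kernel the I/O scheduler is chosen at boot
rather than per device, so trying the deadline elevator would mean
appending it to the kernel line in /boot/grub/grub.conf and
rebooting, something like:

    kernel /vmlinuz-2.6.9-22.EL ro root=<your root device> elevator=deadline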