[CentOS] waiting IOs...

Wed Sep 9 16:23:46 UTC 2009
John Doe <jdmls at yahoo.com>

Hi,

We have a storage server: an HP DL360 G5 with an MSA20 (12 disks in RAID 6) on a SmartArray 6400.
10 directories are exported through NFS to 10 clients (rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3).
The server is apparently not doing much, but we see very high I/O wait.

dstat shows very little activity, but a high 'wai' (I/O wait):

# dstat 
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0  88  12   0   0| 413k   98k|   0     0 |   0     0 | 188   132 
  0   1  46  53   0   0| 716k   48k|  19k  420k|   0     0 |1345   476 
  0   1  49  50   0   1| 492k   32k|  12k  181k|   0     0 |1269   482 
  0   1  63  37   0   0| 316k  159k|  58k  278k|   0     0 |1789  1562 
  0   0  74  26   0   0|  84k  512k|1937B 6680B|   0     0 |1200   106 
  0   1  44  55   0   1| 612k   80k|  14k  221k|   0     0 |1378   538 
  1   1  52  47   0   0| 628k    0 |  17k  318k|   0     0 |1327   520 
  0   1  50  49   0   0| 484k   60k|  14k  178k|   0     0 |1303   494 
  0   0  87  13   0   0| 124k    0 |7745B  116k|   0     0 |1083   139 
  0   1  59  41   0   0| 316k   60k|4828B   67k|   0     0 |1179   346 
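
To cross-check the 'wai' column without dstat, here is a minimal Python sketch that computes the iowait share from two samples of the aggregate "cpu" line in /proc/stat. It assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for illustration.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into jiffy counters."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep user..steal only; later fields (guest time) are already
    # counted inside 'user' on kernels that report them.
    return [int(x) for x in fields[1:9]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    if total == 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (the aggregate 'cpu' line)."""
    with open(path) as f:
        return f.readline()
```

Calling read_cpu_line() twice a few seconds apart and passing both results to iowait_percent() should give a figure comparable to the 'wai' column above.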

top shows that one nfsd is usually in state 'D' (uninterruptible sleep, typically waiting on I/O).

# top -i    (sorted by cpu usage)
top - 18:11:28 up 207 days,  7:13,  2 users,  load average: 0.99, 1.07, 1.00
Tasks: 124 total,   1 running, 123 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 54.3%id, 45.3%wa,  0.2%hi,  0.0%si,  0.0%st
Mem:   3089252k total,  3068112k used,    21140k free,   928468k buffers
Swap:  2008116k total,      164k used,  2007952k free,   293716k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16571 root      15   0 12708 1076  788 R    1  0.0   0:00.02 top
 2580 root      15   0     0    0    0 D    0  0.0   2:36.70 nfsd
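
To see which tasks sit in state 'D' at a given moment without watching top, something like the following Python sketch could be used. It assumes a Linux /proc where each /proc/<pid>/stat record starts with "pid (comm) state"; the helper names are hypothetical.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat record into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def tasks_in_state(state="D", proc="/proc"):
    """Return [(pid, comm), ...] for every task currently in the given state."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, st = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were reading it
        if st == state:
            found.append((pid, comm))
    return found
```

Running tasks_in_state("D") in a loop during a high-'wai' period would show whether it is always the same nfsd thread that is blocked.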

# cat /proc/net/rpc/nfsd
rc 8872 34768207 38630969
fh 142 0 0 0 0
io 2432226534 884662242
th 32 394 4851.311 2437.416 370.949 238.432 542.241 4.942 2.239 1.000 0.427 0.541
ra 64 3876274 5025 3724 2551 2030 2036 1506 1607 1219 1154 1136249
net 73410453 73261524 0 0
rpc 73408119 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 33 9503937 1315066 11670859 7139862 0 5033349 28129122 3729031 0 0 0 487614 0 1116215 0 0 2054329 21225 66 0 2351744
proc4 2 0 0
proc4ops 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
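
For readability, here is a rough Python sketch that pulls a few counters out of the output above. The field meanings are an assumption based on my understanding of this file on kernels of this era ("th": thread count, then the number of times all threads were busy; "net": total, UDP, TCP packets; "io": bytes read, bytes written) and should be checked against the kernel in use. The sample lines are copied from the output above.

```python
SAMPLE = """\
io 2432226534 884662242
th 32 394 4851.311 2437.416 370.949 238.432 542.241 4.942 2.239 1.000 0.427 0.541
net 73410453 73261524 0 0
"""


def parse_nfsd_stats(text):
    """Return {label: [fields, ...]} for each line of /proc/net/rpc/nfsd."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            stats[parts[0]] = parts[1:]
    return stats


def summarize(stats):
    """Pick out the counters most relevant to thread and transport load.

    Field positions are assumed, not verified against this kernel.
    """
    th, net, io = stats["th"], stats["net"], stats["io"]
    return {
        "threads": int(th[0]),
        "all_threads_busy": int(th[1]),
        "udp_packets": int(net[1]),
        "tcp_packets": int(net[2]),
        "bytes_read": int(io[0]),
        "bytes_written": int(io[1]),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_nfsd_stats(SAMPLE)).items():
        print(key, value)
```

If that reading of the fields is right, the figures above correspond to 32 nfsd threads, all of them busy at the same time on 394 occasions, with all traffic over UDP.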

Do you think NFS is the problem here?
If so, is there something wrong with our config?
Is 10 directories x 10 clients too much, even with almost no traffic?

Thx,
JD