[CentOS] C6 server responding extremely slow on ssh interactive

Fri Jan 30 09:21:52 UTC 2015
Patrick Bervoets <patrick.bervoets at psc-elsene.be>

On 29-01-15 at 21:21, Gordon Messmer wrote:
>
> I haven't seen delays anywhere near that long before, even with heavy swapping.  But I guess I'd look at that sort of thing first.
>
> Run "iostat -x 2" and see if your disks are being fully utilized during the pauses.  Run "top" and see if there's anything useful there.  Check swap use with "free".  Try decreasing swappiness with "echo 10 >/proc/sys/vm/swappiness"

iostat random sample
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            3,77    0,00    1,45    0,00    0,00   94,78

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,50    0,00   11,00     0,00   136,00    12,36     0,00    0,00   0,00   0,00
sdb               0,00     0,00    0,00   11,50     0,00   148,00    12,87     0,00    0,09   0,09   0,10
sdc               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-0              0,00     0,00    0,00    4,00     0,00    32,00     8,00     0,00    0,00   0,00   0,00
dm-1              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-2              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-3              0,00     0,00    0,00   11,50     0,00   148,00    12,87     0,00    0,13   0,13   0,15
dm-4              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-5              0,00     0,00    0,00    7,50     0,00   104,00    13,87     0,00    0,07   0,07   0,05

atop
ATOP -               2015/01/30  10:18:14 ---------          10s elapsed
PRC | sys    3.87s | user  14.93s | #proc    197 | #zombie    0 | #exit      0 |
CPU | sys      30% | user    119% | irq       1% | idle    533% | wait      0% |
cpu | sys       2% | user     21% | irq       0% | idle     56% | cpu000 w  0% |
cpu | sys       3% | user     19% | irq       0% | idle     59% | cpu001 w  0% |
cpu | sys       8% | user     15% | irq       0% | idle     62% | cpu003 w  0% |
cpu | sys       3% | user     13% | irq       0% | idle     73% | cpu002 w  0% |
cpu | sys       3% | user     14% | irq       0% | idle     70% | cpu006 w  0% |
cpu | sys       4% | user     15% | irq       0% | idle     66% | cpu005 w  0% |
cpu | sys       2% | user     11% | irq       0% | idle     77% | cpu007 w  0% |
cpu | sys       5% | user     11% | irq       0% | idle     73% | cpu004 w  0% |
CPL | avg1    1.92 | avg5    1.97 | avg15   1.61 | csw   229508 | intr  191786 |
MEM | tot    47.1G | free   15.9G | cache 519.3M | buff  109.3M | slab  353.3M |
SWP | tot     7.8G | free    7.3G |              | vmcom  31.8G | vmlim  31.3G |
LVM | g_15k-lv_15k | busy      0% | read       1 | write     98 | avio 0.15 ms |
LVM | to-lv_oracle | busy      0% | read       0 | write     66 | avio 0.06 ms |
LVM | v_oracletest | busy      0% | read       0 | write     79 | avio 0.05 ms |
LVM | uito-lv_root | busy      0% | read       0 | write      1 | avio 3.00 ms |
DSK |          sdb | busy      0% | read       1 | write     98 | avio 0.16 ms |
DSK |          sda | busy      0% | read       0 | write    146 | avio 0.08 ms |
NET | transport    | tcpi      12 | tcpo      12 | udpi       0 | udpo       0 |
NET | network      | ipi       13 | ipo       12 | ipfrw      0 | deliv     12 |
NET | vnet0     8% | pcki    2273 | pcko    2581 | si  850 Kbps | so  458 Kbps |
NET | vnet1     4% | pcki    2186 | pcko    2075 | si  391 Kbps | so  422 Kbps |
NET | eth0      0% | pcki    1330 | pcko    1432 | si  159 Kbps | so  537 Kbps |
NET | br0     ---- | pcki      43 | pcko      22 | si    1 Kbps | so    4 Kbps |

   PID  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK ST EXC S  CPU CMD
  1960   2.37s   9.23s     0K     0K     8K  2520K --   - S 101% qemu-kvm
  1990   0.69s   5.65s     0K     0K     0K  1196K --   - S  55% qemu-kvm
  1975   0.50s   0.00s     0K     0K     0K     0K --   - S   4% kvm-pit-wq
  2009   0.20s   0.00s     0K     0K     0K     0K --   - S   2% kvm-pit-wq
23321   0.05s   0.02s     0K     0K     0K     0K --   - R   1% atop
18384   0.05s   0.01s     0K     0K     0K     0K --   - S   1% atop
  1719   0.00s   0.01s     0K     0K     0K     0K --   - S   0% hpasmlited
  1746   0.00s   0.01s     0K     0K     0K     0K --   - S   0% hp-asrd
    35   0.01s   0.00s     0K     0K     0K     0K --   - D   0% events/0
10707   0.00s   0.00s     0K     0K     0K     0K --   - S   0% arping
10740   0.00s   0.00s     0K     0K     0K     0K --   - S   0% arping
    58   0.00s   0.00s     0K     0K     0K     0K --   - S   0% kblockd/0
18425   0.00s   0.00s     0K     0K     0K     0K --   - S   0% flush-253:0

free
             total       used       free     shared    buffers     cached
Mem:         48218      31895      16323          0        108        519
-/+ buffers/cache:      31267      16951
Swap:         7951        476       7475

But I had the same pauses when free reported zero swap in use.
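
To check whether the pauses line up with swap use, the figure that free reports could also be logged from a script. A minimal sketch in Python, assuming the usual /proc/meminfo layout with SwapTotal and SwapFree given in kB; the function name swap_used_kib is only illustrative:

```python
def swap_used_kib(meminfo_text):
    """Return swap in use (kB) from text in /proc/meminfo format."""
    fields = {}
    for line in meminfo_text.splitlines():
        # Lines look like "SwapTotal:       8142844 kB"
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    # Assumption: both keys are present, as on a stock Linux kernel.
    return fields["SwapTotal"] - fields["SwapFree"]
```

On the server this could be called as swap_used_kib(open("/proc/meminfo").read()) and printed with a timestamp every few seconds, so the values can be compared with the moments the session hangs.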

If swap is the problem, would it matter whether a command is run through ssh (ssh @ "command") or in an interactive shell?

When running atop in a shell, I observed pauses of more than 10 seconds between screen updates, yet each screen atop displayed carried a timestamp only 10 seconds after the previous one. So the displayed time drifted further and further behind the real time.
Meanwhile, a date command sent at the same time returned the correct date.

So it seems like the screens are buffered and are being displayed with a delay.
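
One way to tell a stalled process apart from delayed display would be to record timestamps on the server itself and look for gaps afterwards. A rough sketch in Python, not a definitive tool: the function names, the 0.5 second tolerance, and the sample values mentioned afterwards are my own assumptions.

```python
import time


def sample_clock(count, interval, clock=time.monotonic, sleep=time.sleep):
    """Collect `count` clock readings, sleeping `interval` seconds after each."""
    readings = []
    for _ in range(count):
        readings.append(clock())
        sleep(interval)
    return readings


def find_stalls(timestamps, interval, tolerance=0.5):
    """Return (index, gap) pairs where two consecutive readings lie more
    than interval + tolerance seconds apart."""
    stalls = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > interval + tolerance:
            stalls.append((i, gap))
    return stalls
```

For example, find_stalls(sample_clock(60, 1.0), 1.0) run on the server during a hang would list any gaps. If it reports none while the terminal still freezes, the process itself kept running on schedule and the delay sits somewhere in the output path (terminal, ssh, network), which would match the buffered-screen impression above. If it reports gaps of several seconds, the process itself was not being scheduled.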