[CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

Wed Jun 1 19:47:56 UTC 2016
Kelly Lesperance <klesperance at blackberry.com>

I did some additional testing. I stopped Kafka on the host and kicked off a RAID check, and it ran at the expected speed overnight. When I started Kafka again this morning, the check's speed immediately dropped to ~2000K/sec.
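
The check speed quoted above is the figure the kernel reports in /proc/mdstat as a "speed=...K/sec" field while a check or resync is in progress. A minimal sketch for pulling out just that field follows. The function name md_check_speed is made up for illustration, and it assumes the usual mdstat progress-line layout:

```shell
# Hypothetical helper: print the check/resync speed (e.g. "2000K/sec")
# found in mdstat-format text on stdin. Prints nothing if no check or
# resync is in progress.
md_check_speed() {
    sed -n 's|.*speed=\([0-9][0-9]*K/sec\).*|\1|p'
}

# Usage on a live host:
#   md_check_speed < /proc/mdstat
```

Sampling this every few seconds before and after starting Kafka would show the drop described above without having to read the full mdstat output each time.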

I then enabled the write-back cache on the drives (hdparm -W1 /dev/sd*). The RAID check is now running between 100000K/sec and 200000K/sec, and has been for several hours (it fluctuates, but stays within that range). Write-back cache is NOT enabled on the drives in the hosts we haven't upgraded yet, but their speeds are similar: I kicked off a RAID check on one of our CentOS 6 hosts as well, and it runs between 150000K/sec and 200000K/sec.
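
To compare the write-cache setting between upgraded and non-upgraded hosts, hdparm -W with no value queries the current state instead of changing it. The sketch below condenses that output to one line per drive. The function name write_cache_state is hypothetical, and it assumes hdparm's usual "/dev/sdX:" header followed by a "write-caching =  N (on|off)" line:

```shell
# Hypothetical helper: reduce `hdparm -W <devices>` output on stdin to
# "<device> <0|1>" lines, where 1 means write caching is enabled.
write_cache_state() {
    awk '/^\/dev\// { sub(/:$/, "", $1); dev = $1 }
         /write-caching/ { print dev, $3 }'
}

# Usage on a live host (query only; -W1 / -W0 would change the setting):
#   hdparm -W /dev/sd? | write_cache_state
```

Running the same query on a CentOS 6 host and a CentOS 7 host would confirm whether the drives really differ in cache state, or only in how the kernel drives them.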

Kelly

On 2016-05-27, 9:21 AM, "Kelly Lesperance" <klesperance at blackberry.com> wrote:

>All of our Kafka clusters are fairly write-heavy.  The cluster in question is our second-heaviest – we haven’t yet upgraded the heaviest, due to the issues we’ve been experiencing in this one. 
>
>Here is an iostat example from a host within the same cluster, but without the RAID check running:
>
>[root@r2k1 ~] # iostat -xdmc 1 10
>Linux 3.10.0-327.13.1.el7.x86_64 (r2k1) 	05/27/16 	_x86_64_	(32 CPU)
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           8.87    0.02    1.28    0.21    0.00   89.62
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.02     0.55    0.15   27.06     0.03    11.40   859.89     1.02   37.40   36.13   37.41   6.86  18.65
>sdf               0.02     0.48    0.15   26.99     0.03    11.40   862.17     0.15    5.56   40.94    5.37   7.27  19.73
>sdk               0.03     0.58    0.22   27.10     0.03    11.40   857.01     1.60   58.49   36.20   58.67   7.17  19.58
>sdb               0.02     0.52    0.15   27.43     0.03    11.40   848.37     0.02    0.78   42.84    0.55   7.07  19.50
>sdj               0.02     0.55    0.15   27.11     0.03    11.40   858.28     0.62   22.70   41.97   22.59   7.43  20.27
>sdg               0.03     0.68    0.22   27.76     0.03    11.40   836.98     0.76   27.10   34.36   27.04   7.33  20.51
>sde               0.03     0.48    0.22   26.99     0.03    11.40   860.43     0.33   12.07   33.16   11.90   7.34  19.98
>sda               0.03     0.52    0.22   27.43     0.03    11.40   846.65     0.57   20.48   36.42   20.35   7.34  20.31
>sdh               0.02     0.68    0.15   27.76     0.03    11.40   838.63     0.47   16.66   40.96   16.53   7.20  20.09
>sdc               0.03     0.55    0.22   27.06     0.03    11.40   858.19     0.74   27.30   36.96   27.22   7.55  20.58
>sdi               0.03     0.53    0.22   27.13     0.03    11.40   856.04     1.60   58.50   27.43   58.75   5.21  14.24
>sdl               0.02     0.56    0.15   27.11     0.03    11.40   858.27     1.12   41.09   27.89   41.16   5.00  13.63
>md127             0.00     0.00    2.53  161.84     0.36    68.39   856.56     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          13.11    0.00    1.82    1.07    0.00   84.01
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.00     0.00    0.00   81.00     0.00    38.48   972.95    51.00  219.06    0.00  219.06   6.37  51.60
>sdf               0.00     1.00    0.00   73.00     0.00    33.70   945.33    55.02  235.86    0.00  235.86   7.12  52.00
>sdk               0.00     1.00    0.00   56.00     0.00    25.70   939.73    60.45  223.79    0.00  223.79   9.29  52.00
>sdb               0.00     2.00    0.00   70.00     0.00    34.48  1008.70    58.88  292.81    0.00  292.81   7.37  51.60
>sdj               0.00     3.00    0.00   62.00     0.00    29.87   986.60    59.32  243.48    0.00  243.48   8.26  51.20
>sdg               0.00     1.00    0.00   49.00     0.00    23.43   979.45    60.37  234.98    0.00  234.98  10.53  51.60
>sde               0.00     1.00    0.00   61.00     0.00    27.95   938.38    58.17  239.57    0.00  239.57   8.52  52.00
>sda               0.00     2.00    0.00   56.00     0.00    27.48  1004.88    56.27  202.88    0.00  202.88   9.27  51.90
>sdh               0.00     1.00    0.00   70.00     0.00    33.57   982.19    59.00  277.84    0.00  277.84   7.43  52.00
>sdc               0.00     0.00    0.00   64.00     0.00    30.06   961.89    58.20  268.30    0.00  268.30   8.08  51.70
>sdi               0.00     3.00    0.00  116.00     0.00    55.62   981.94    44.54  199.72    0.00  199.72   4.56  52.90
>sdl               0.00     1.00    0.00  128.00     0.00    60.31   964.88    43.91  215.94    0.00  215.94   4.11  52.60
>md127             0.00     0.00    0.00 1143.00     0.00   538.90   965.59     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          15.70    0.00    1.97    0.44    0.00   81.89
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.00     0.00    0.00  119.00     0.00    56.39   970.42    42.84  639.45    0.00  639.45   6.66  79.20
>sdf               0.00     1.00    0.00  129.00     0.00    61.21   971.84    48.89  672.04    0.00  672.04   6.34  81.80
>sdk               0.00     0.00    0.00  152.00     0.00    72.62   978.53    61.02  716.76    0.00  716.76   5.74  87.20
>sdb               0.00     1.00    0.00  133.00     0.00    62.86   967.88    54.10  695.35    0.00  695.35   6.45  85.80
>sdj               0.00     0.00    0.00  146.00     0.00    68.36   958.85    69.22  767.12    0.00  767.12   6.85 100.00
>sdg               0.00     0.00    0.00  146.00     0.00    69.87   980.11    77.99  789.53    0.00  789.53   6.85 100.00
>sde               0.00     1.00    0.00  141.00     0.00    66.96   972.60    56.21  707.61    0.00  707.61   6.21  87.60
>sda               0.00     1.00    0.00  147.00     0.00    69.86   973.22    62.21  728.76    0.00  728.76   6.32  92.90
>sdh               0.00     0.00    0.00  134.00     0.00    62.61   956.90    55.79  711.49    0.00  711.49   6.63  88.90
>sdc               0.00     0.00    0.00  136.00     0.00    64.81   975.94    61.46  753.57    0.00  753.57   6.93  94.20
>sdi               0.00     0.00    0.00   93.00     0.00    42.67   939.61    17.60  419.10    0.00  419.10   4.63  43.10
>sdl               0.00     0.00    0.00   80.00     0.00    38.02   973.20    11.00  340.79    0.00  340.79   4.25  34.00
>md127             0.00     0.00    0.00   87.00     0.00    40.99   964.97     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          12.11    0.00    1.35    0.00    0.00   86.54
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.01   15.00    0.00   15.00  15.00   1.50
>sdf               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.01   11.00    0.00   11.00  11.00   1.10
>sdk               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.01   11.00    0.00   11.00  11.00   1.10
>sdb               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.01    7.00    0.00    7.00   7.00   0.70
>sdj               0.00     0.00    0.00    2.00     0.00     0.06    64.50     0.01  733.50    0.00  733.50   7.50   1.50
>sdg               0.00     0.00    0.00   10.00     0.00     2.88   588.90     0.55 1212.80    0.00 1212.80  15.50  15.50
>sde               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.01   12.00    0.00   12.00  12.00   1.20
>sda               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.01   11.00    0.00   11.00  11.00   1.10
>sdh               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.02   20.00    0.00   20.00  20.00   2.00
>sdc               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.02   17.00    0.00   17.00  17.00   1.70
>sdi               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.01   12.00    0.00   12.00  12.00   1.20
>sdl               0.00     0.00    0.00    1.00     0.00     0.00     1.00     0.02   17.00    0.00   17.00  17.00   1.70
>md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          15.22    0.00    1.50    0.00    0.00   83.28
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          16.96    0.09    1.63    0.16    0.00   81.16
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.00     0.00    0.00    8.00     0.00     0.66   168.25     0.09   11.50    0.00   11.50   8.75   7.00
>sdf               0.00     0.00    0.00    5.00     0.00     0.52   213.20     0.08   16.20    0.00   16.20  16.20   8.10
>sdk               0.00     0.00    0.00    3.00     0.00     0.50   342.00     0.06   20.33    0.00   20.33  20.33   6.10
>sdb               0.00     0.00    0.00    3.00     0.00     0.50   342.00     0.05   16.67    0.00   16.67  16.67   5.00
>sdj               0.00     0.00    0.00    4.00     0.00     0.98   500.50     0.06   14.50    0.00   14.50  11.00   4.40
>sdg               0.00     1.00    0.00    4.00     0.00     0.63   322.50     0.14   36.00    0.00   36.00  32.75  13.10
>sde               0.00     0.00    0.00    5.00     0.00     0.52   213.20     0.07   13.60    0.00   13.60  13.60   6.80
>sda               0.00     0.00    0.00    3.00     0.00     0.50   342.00     0.05   15.67    0.00   15.67  15.67   4.70
>sdh               0.00     1.00    0.00    4.00     0.00     0.63   322.50     0.06   14.50    0.00   14.50  11.50   4.60
>sdc               0.00     0.00    0.00    8.00     0.00     0.66   168.25     0.11   13.25    0.00   13.25  10.62   8.50
>sdi               0.00     0.00    0.00    4.00     0.00     0.98   500.50     0.06   15.50    0.00   15.50  12.00   4.80
>sdl               0.00     0.00    0.00    3.00     0.00     0.50   342.00     0.04   13.67    0.00   13.67  13.67   4.10
>md127             0.00     0.00    0.00   17.00     0.00     3.78   455.53     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          14.08    0.00    1.50    0.00    0.00   84.42
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          14.89    0.00    1.98    0.00    0.00   83.13
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.00     0.00    0.00   90.00     0.00    41.31   940.01    27.25  302.80    0.00  302.80   7.07  63.60
>sdf               0.00     0.00    0.00   87.00     0.00    41.35   973.44    22.73  261.30    0.00  261.30   6.92  60.20
>sdk               0.00     2.00    0.00   97.00     0.00    42.08   888.42    39.86  410.94    0.00  410.94   8.10  78.60
>sdb               0.00     0.00    0.00   87.00     0.00    41.07   966.82    24.39  280.30    0.00  280.30   7.14  62.10
>sdj               0.00     1.00    0.00   91.00     0.00    41.94   943.92    36.37  399.62    0.00  399.62   8.44  76.80
>sdg               0.00     0.00    0.00   86.00     0.00    40.67   968.48    31.76  369.33    0.00  369.33   8.81  75.80
>sde               0.00     0.00    0.00   87.00     0.00    41.35   973.44    30.80  354.05    0.00  354.05   9.01  78.40
>sda               0.00     0.00    0.00   87.00     0.00    41.07   966.82    32.61  374.80    0.00  374.80   8.57  74.60
>sdh               0.00     0.00    0.00   86.00     0.00    40.67   968.48    29.52  343.23    0.00  343.23   8.56  73.60
>sdc               0.00     0.00    0.00   89.00     0.00    40.81   939.07    32.80  360.15    0.00  360.15   8.91  79.30
>sdi               0.00     1.00    0.00   91.00     0.00    41.94   943.92    19.60  215.34    0.00  215.34   5.62  51.10
>sdl               0.00     2.00    0.00   97.00     0.00    42.08   888.42    19.59  201.93    0.00  201.93   4.69  45.50
>md127             0.00     0.00    0.00  535.00     0.00   248.42   950.95     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          11.08    0.00    1.41    0.00    0.00   87.51
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.00     5.00    0.00   42.00     0.00     0.38    18.55     2.25   53.52    0.00   53.52   4.93  20.70
>sdf               0.00     0.00    0.00   35.00     0.00     0.21    12.43     1.62   46.17    0.00   46.17   5.29  18.50
>sdk               0.00    23.00    0.00   42.00     0.00     0.44    21.40     1.99   47.29    0.00   47.29   4.64  19.50
>sdb               0.00     9.00    0.00   58.00     0.00     0.34    12.02     2.77   47.78    0.00   47.78   4.12  23.90
>sdj               0.00     1.00    0.00   39.00     0.00     0.24    12.79     1.79   45.97    0.00   45.97   5.21  20.30
>sdg               0.00    11.00    0.00   66.00     0.00     0.40    12.45     3.60   54.47    0.00   54.47   3.42  22.60
>sde               0.00     0.00    0.00   35.00     0.00     0.21    12.43     2.13   61.00    0.00   61.00   8.89  31.10
>sda               0.00     9.00    0.00   58.00     0.00     0.34    12.02     2.48   42.81    0.00   42.81   3.71  21.50
>sdh               0.00    11.00    0.00   66.00     0.00     0.40    12.45     4.81   72.83    0.00   72.83   3.80  25.10
>sdc               0.00     5.00    0.00   43.00     0.00     0.88    41.93     1.99   63.81    0.00   63.81   5.00  21.50
>sdi               0.00     1.00    0.00   39.00     0.00     0.24    12.79     1.31   33.69    0.00   33.69   4.03  15.70
>sdl               0.00    23.00    0.00   42.00     0.00     0.44    21.40     1.23   29.33    0.00   29.33   3.71  15.60
>md127             0.00     0.00    0.00  313.00     0.00     2.01    13.14     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          16.16    0.03    1.66    0.00    0.00   82.15
>
>Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>md127             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>
>On 2016-05-26, 11:50 PM, "Gordon Messmer" <gordon.messmer at gmail.com> wrote:
>
>>On 05/25/2016 09:54 AM, Kelly Lesperance wrote:
>>> What we're seeing is that when the weekly raid-check script executes, performance nose dives, and I/O wait skyrockets. The raid check starts out fairly fast (20000K/sec - the limit that's been set), but then quickly drops down to about 4000K/Sec. dev.raid.speed sysctls are at the defaults:
>>
>>It looks like some pretty heavy writes are going on at the time. I'm not 
>>sure what you mean by "nose dives", but I'd expect *some* performance 
>>impact of running a read-intensive process like a RAID check at the same 
>>time you're running a write-intensive process.
>>
>>Do the same write-heavy processes run on the other clusters, where you 
>>aren't seeing performance issues?
>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>             9.24    0.00    1.32   20.02    0.00   69.42
>>>
>>> Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
>>> sda              50.00       512.00     20408.00        512      20408
>>> sdb              50.00       512.00     20408.00        512      20408
>>> sdc              48.00       512.00     19984.00        512      19984
>>> sdd              48.00       512.00     19984.00        512      19984
>>> sdf              50.00       704.00     19968.00        704      19968
>>> sdg              47.00       512.00     19968.00        512      19968
>>> sdh              47.00       512.00     19968.00        512      19968
>>> sde              50.00       704.00     19968.00        704      19968
>>> sdj              48.00       512.00     19972.00        512      19972
>>> sdi              48.00       512.00     19972.00        512      19972
>>> sdk              48.00       512.00     19980.00        512      19980
>>> sdl              48.00       512.00     19980.00        512      19980
>>> md127           241.00         0.00    120280.00          0     120280