I made the mistake of looking at disk IO numbers in two different ways -- now I'm confused, because they give inconsistent answers.
First way was using 'vmstat 10'. This gave me (apologies for wrapped lines):
r b    swpd    free   buff   cache si so bi  bo   in   cs us sy id wa st
2 0 2162944 4071928 162444 4218456  0  0  0 286 1103  528  3  2 95  0  0
1 0 2162944 4071976 162448 4218440  0  0  0 301 1102  548  2  4 95  0  0
2 0 2162944 4074488 162456 4218448  0  0  0 252 1097  501  1  4 96  0  0
2 0 2162944 4081572 162480 4218508  0  0  0 430 1145 1006  2  3 95  0  0
2 0 2162944 4079340 162488 4218508  0  0  0 354 1148  604  2  3 95  0  0
1 0 2162944 4082604 162492 4218512  0  0  0 258 1105  446  1  4 96  0  0
1 0 2162944 4084052 162500 4218520  0  0  0 300 1101  482  1  4 95  0  0
1 0 2162944 4080652 162500 4218536  0  0  0 393 1118  585  1  3 95  0  0
1 0 2162944 4081160 162500 4218536  0  0  0 304 1100  462  0  4 95  0  0
1 0 2162944 4075636 162508 4218536  0  0  0 214 1132  397  0  4 96  0  0
3 0 2162944 4081640 162516 4218540  0  0  0 332 1111  554  2  3 94  0  0
1 0 2162944 4075104 162516 4218552  0  0  0 382 1179  566  2  3 95  0  0
The "bo" column, block out, is described in the man page as being blocks per second. I believe the blocks are 512 bytes.
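To make that conversion concrete, here is a minimal sketch that pulls the "bo" field (the 10th column) out of vmstat-style rows and multiplies it by a block size. The 512-byte value is only my assumption from above, and `bo_to_bytes` and `BLOCK_SIZE` are names made up for this illustration.

```shell
#!/bin/bash
# ASSUMPTION: 512-byte blocks, as guessed in the post; not a confirmed value.
BLOCK_SIZE=512

# Read vmstat-style rows on stdin; print "bo bytes_per_sec" for each data row.
bo_to_bytes() {
    # Header lines are skipped: only rows whose first field is numeric count.
    awk -v bs="$BLOCK_SIZE" '$1 ~ /^[0-9]+$/ { printf "%d %d\n", $10, $10 * bs }'
}

# Two sample rows taken from the vmstat output above.
printf '%s\n' \
    'r b swpd free buff cache si so bi bo in cs us sy id wa st' \
    '2 0 2162944 4071928 162444 4218456 0 0 0 286 1103 528 3 2 95 0 0' \
    '1 0 2162944 4071976 162448 4218440 0 0 0 301 1102 548 2 4 95 0 0' \
    | bo_to_bytes
```

Under that assumption the first two samples come out at 146,432 and 154,112 bytes per second.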
Okay; but then I used SNMP to fetch 1.3.6.1.4.1.2021.11.57.0 (ssIORawSent). That's an incrementing counter of blocks sent. I'm fetching it every 10 seconds, same as before.
3,204,124,952   1,603    820,736
3,204,139,960   1,500    768,000
3,204,155,848   1,588    813,056
3,204,164,600     875    448,000
3,204,184,536   1,993  1,020,416
3,204,194,184     964    493,568
3,204,204,040     896    458,752
3,204,218,696   1,465    750,080
3,204,235,224   1,652    845,824
The first column is the counter. The second column is the difference from the previous reading divided by the actual number of seconds elapsed, i.e. blocks per second. (Dividing by the elapsed time corrects for any imprecision in the sleep, though when I monitored that, it was hitting the exact second consistently.) The third column is bytes per second, based on a 512-byte block.
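As a sanity check on that arithmetic, here is a small sketch that recomputes the second and third columns from two consecutive counter readings. The function name `rate` is made up for this example, the 10-second interval is taken from the sampling period, and the 512-byte block is still only my assumption.

```shell
#!/bin/bash
# rate PREV CUR SECS -> prints "blocks_per_sec bytes_per_sec"
rate() {
    local prev=$1 cur=$2 secs=$3
    # Integer division, the same as in the monitoring script.
    local bsec=$(( (cur - prev) / secs ))
    # ASSUMPTION: 512-byte blocks.
    echo "$bsec $(( bsec * 512 ))"
}

# First and second counter readings from the output above, 10 seconds apart.
rate 3204124952 3204139960 10
```

This prints 1500 and 768000, which matches the second row of the output, so the subtraction and division at least seem to be doing what I intended.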
You'll note that the two blocks-per-second figures don't agree: vmstat reports roughly 210 to 430, while the SNMP counter works out to roughly 875 to 2,000.
The two sets of samples overlap in time, and the readings before and after these windows are similar.
So what's up with that?
(Here's my monitoring code that produced the second set of figures, in case I did something dumb-ass:
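#!/bin/bash
set -e

HOST=prcapp01
secs=10
lbc=0   # previous block count
lts=0   # previous timestamp

echo "blockcount bl/sec bytes/sec"
while true; do
    bc=$( snmpget -v 2c -c xxx "$HOST" 1.3.6.1.4.1.2021.11.57.0 | cut -d' ' -f4 )
    ts=$( date +%s )
    # Arithmetic test; [[ $lbc > 0 ]] compares the values as strings.
    if (( lbc > 0 )); then
        # Plain assignments: a bare (( expr )) that evaluates to 0
        # returns a failure status, which set -e treats as fatal.
        bsec=$(( ( bc - lbc ) / ( ts - lts ) ))
        bytes=$(( bsec * 512 ))
        #echo $lbc $bc $bsec $bytes $lts $ts
        printf "%'12d %'7d %'10d\n" "$bc" "$bsec" "$bytes"
    fi
    lbc="$bc"
    lts="$ts"
    sleep "$secs"
done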
I have obfuscated the read-only community name.)