 
            Simon Banton wrote:
At 09:14 -0400 26/9/07, Ross S. W. Walker wrote:
Could you try the benchmarks with the 'deadline' scheduler?
OK, these are all with RHEL5, driver 2.26.06.002-2.6.18, RAID 1:
elevator=deadline: Sequential reads: | 2007/09/26-16:19:30 | START | 3065 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) (-p u) | 2007/09/26-16:20:00 | STAT | 3065 | v1.2.8 | /dev/sdb | Total read throughput: 45353642.7B/s (43.25MB/s), IOPS 11072.7/s.
That's a lot better, where it should be for those drives.
Sequential writes: | 2007/09/26-16:20:00 | START | 3082 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) (-p u) | 2007/09/26-16:20:30 | STAT | 3082 | v1.2.8 | /dev/sdb | Total write throughput: 53781186.2B/s (51.29MB/s), IOPS 13130.2/s.
Yup, with the write-back you'll see better write throughput then read at this block size.
Random reads: | 2007/09/26-16:20:30 | START | 3091 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) (-D 100:0) | 2007/09/26-16:21:00 | STAT | 3091 | v1.2.8 | /dev/sdb | Total read throughput: 545587.2B/s (0.52MB/s), IOPS 133.2/s.
Same, random io would really be affected here.
Random writes: | 2007/09/26-16:21:00 | START | 3098 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) (-D 0:100) | 2007/09/26-16:21:44 | STAT | 3098 | v1.2.8 | /dev/sdb | Total write throughput: 795852.8B/s (0.76MB/s), IOPS 194.3/s.
Same here.
Here are the others for comparison.
elevator=noop: Sequential reads: | 2007/09/26-16:24:02 | START | 3167 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) (-p u) | 2007/09/26-16:24:32 | STAT | 3167 | v1.2.8 | /dev/sdb | Total read throughput: 45467374.9B/s (43.36MB/s), IOPS 11100.4/s.
About the same as deadline, but you'll probably be better off with deadline as deadline will attempt to merge requests from separate sources to the same volume while noop will just send it as it gets it.
Sequential writes: | 2007/09/26-16:24:32 | START | 3176 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) (-p u) | 2007/09/26-16:25:02 | STAT | 3176 | v1.2.8 | /dev/sdb | Total write throughput: 53825672.5B/s (51.33MB/s), IOPS 13141.0/s.
Same for the others.
Random reads: | 2007/09/26-16:25:03 | START | 3193 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) (-D 100:0) | 2007/09/26-16:25:32 | STAT | 3193 | v1.2.8 | /dev/sdb | Total read throughput: 540954.5B/s (0.52MB/s), IOPS 132.1/s. Random writes: | 2007/09/26-16:25:32 | START | 3202 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) (-D 0:100) | 2007/09/26-16:26:16 | STAT | 3202 | v1.2.8 | /dev/sdb | Total write throughput: 795989.3B/s (0.76MB/s), IOPS 194.3/s.
elevator=anticipatory: Sequential reads: | 2007/09/26-16:37:04 | START | 3277 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) (-p u) | 2007/09/26-16:37:34 | STAT | 3277 | v1.2.8 | /dev/sdb | Total read throughput: 45414126.9B/s (43.31MB/s), IOPS 11087.4/s.
While anticipatory appears to be an adequate choice here it will cause performance issues from multiple writers as it keeps trying to anticipate those reads. For a server deadline is still the best.
Sequential writes: | 2007/09/26-16:37:35 | START | 3284 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) (-p u) | 2007/09/26-16:38:04 | STAT | 3284 | v1.2.8 | /dev/sdb | Total write throughput: 53895168.0B/s (51.40MB/s), IOPS 13158.0/s. Random reads: | 2007/09/26-16:38:04 | START | 3293 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) (-D 100:0) | 2007/09/26-16:38:34 | STAT | 3293 | v1.2.8 | /dev/sdb | Total read throughput: 467080.5B/s (0.45MB/s), IOPS 114.0/s. Random writes: | 2007/09/26-16:38:34 | START | 3300 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) (-D 0:100) | 2007/09/26-16:39:18 | STAT | 3300 | v1.2.8 | /dev/sdb | Total write throughput: 793122.1B/s (0.76MB/s), IOPS 193.6/s.
elevator=cfq (just to re-check): Sequential reads: | 2007/09/26-16:42:18 | START | 3353 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) (-p u) | 2007/09/26-16:42:48 | STAT | 3353 | v1.2.8 | /dev/sdb | Total read throughput: 2463470.9B/s (2.35MB/s), IOPS 601.4/s.
CFQ is intended for single disk workstations and it's io limits are based on that, so it actually acts as an io govenor on RAID setups.
Only use 'cfq' on single disk workstations.
Use 'deadline' on RAID setups and servers.
Sequential writes: | 2007/09/26-16:42:48 | START | 3360 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) (-p u) | 2007/09/26-16:43:18 | STAT | 3360 | v1.2.8 | /dev/sdb | Total write throughput: 54572782.9B/s (52.04MB/s), IOPS 13323.4/s. Random reads: | 2007/09/26-16:43:19 | START | 3369 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) (-D 100:0) | 2007/09/26-16:43:48 | STAT | 3369 | v1.2.8 | /dev/sdb | Total read throughput: 267652.4B/s (0.26MB/s), IOPS 65.3/s. Random writes: | 2007/09/26-16:43:48 | START | 3376 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) (-D 0:100) | 2007/09/26-16:44:31 | STAT | 3376 | v1.2.8 | /dev/sdb | Total write throughput: 793122.1B/s (0.76MB/s), IOPS 193.6/s.
Certainly cfq is severely cramping the reads, it appears.
Yes, as I mentioned above it allocatess IO per-executing thread based on the typical single disk io pattern and therefore limits the bandwidth going to disk per thread as a fraction of a single disk's performance.
-Ross
______________________________________________________________________ This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this e-mail, and any attachments thereto, is strictly prohibited. If you have received this e-mail in error, please immediately notify the sender and permanently delete the original and any copy or printout thereof.