[CentOS] 3Ware 9550SX and latency/system responsiveness

Wed Sep 26 16:01:18 UTC 2007
Ross S. W. Walker <rwalker at medallion.com>

Simon Banton wrote:
> 
> At 09:14 -0400 26/9/07, Ross S. W. Walker wrote:
> >Could you try the benchmarks with the 'deadline' scheduler?
> 
> OK, these are all with RHEL5, driver 2.26.06.002-2.6.18, RAID 1:
> 
> elevator=deadline:
> Sequential reads:
> | 2007/09/26-16:19:30 | START | 3065 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
> (-p u)
> | 2007/09/26-16:20:00 | STAT  | 3065 | v1.2.8 | /dev/sdb | Total read 
> throughput: 45353642.7B/s (43.25MB/s), IOPS 11072.7/s.

That's a lot better, where it should be for those drives.
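
For anyone wanting to reproduce this, the scheduler can be checked and
switched per-device at runtime through sysfs on a 2.6.18 kernel, no
reboot needed (assuming /dev/sdb is the 3ware unit, as in these tests):

  # Show the available schedulers; the active one appears in brackets
  cat /sys/block/sdb/queue/scheduler

  # Switch to deadline on the fly
  echo deadline > /sys/block/sdb/queue/scheduler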

> Sequential writes:
> | 2007/09/26-16:20:00 | START | 3082 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
> (-p u)
> | 2007/09/26-16:20:30 | STAT  | 3082 | v1.2.8 | /dev/sdb | Total 
> write throughput: 53781186.2B/s (51.29MB/s), IOPS 13130.2/s.

Yup, with the write-back cache you'll see better write throughput than
read at this block size.
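
If anyone wants to confirm the unit's cache setting, the 3ware CLI can
show and toggle it; a sketch, assuming controller 0 and unit 0 (adjust
/c0/u0 to match what 'tw_cli show' reports for your setup):

  # Show the unit's settings, including write cache state
  tw_cli /c0/u0 show

  # Enable the write-back cache (only safe with a BBU or a UPS)
  tw_cli /c0/u0 set cache=on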

> Random reads:
> | 2007/09/26-16:20:30 | START | 3091 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
> (-D 100:0)
> | 2007/09/26-16:21:00 | STAT  | 3091 | v1.2.8 | /dev/sdb | Total read 
> throughput: 545587.2B/s (0.52MB/s), IOPS 133.2/s.

Same; random I/O really takes the hit here, as expected.
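
Rough back-of-the-envelope, assuming typical 7200 RPM SATA drives
(estimates, not the specs of these particular disks):

  ~8.5 ms average seek + ~4.2 ms half-rotation ~= 12.7 ms per random read
  1000 / 12.7 ~= 79 IOPS per spindle

A RAID 1 unit can service reads from either mirror, so something in
the 130-160 IOPS range for 4k random reads is about right.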

> Random writes:
> | 2007/09/26-16:21:00 | START | 3098 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
> (-D 0:100)
> | 2007/09/26-16:21:44 | STAT  | 3098 | v1.2.8 | /dev/sdb | Total 
> write throughput: 795852.8B/s (0.76MB/s), IOPS 194.3/s.

Same here.

> Here are the others for comparison.
> 
> elevator=noop:
> Sequential reads:
> | 2007/09/26-16:24:02 | START | 3167 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
> (-p u)
> | 2007/09/26-16:24:32 | STAT  | 3167 | v1.2.8 | /dev/sdb | Total read 
> throughput: 45467374.9B/s (43.36MB/s), IOPS 11100.4/s.

About the same as deadline, but you'll probably still be better off
with deadline: it sorts and merges requests from separate sources to
the same volume, while noop just passes them along in the order it
gets them.
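
Deadline's behaviour can also be tuned through sysfs if needed; a
quick sketch of the main knobs on 2.6.18 (the defaults are usually
fine):

  # Max time (ms) a read or write may wait before its deadline
  # forces it to be serviced
  cat /sys/block/sdb/queue/iosched/read_expire
  cat /sys/block/sdb/queue/iosched/write_expire

  # How many read batches may run before pending writes get a turn
  cat /sys/block/sdb/queue/iosched/writes_starved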

> Sequential writes:
> | 2007/09/26-16:24:32 | START | 3176 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
> (-p u)
> | 2007/09/26-16:25:02 | STAT  | 3176 | v1.2.8 | /dev/sdb | Total 
> write throughput: 53825672.5B/s (51.33MB/s), IOPS 13141.0/s.

Same for the others.

> Random reads:
> | 2007/09/26-16:25:03 | START | 3193 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
> (-D 100:0)
> | 2007/09/26-16:25:32 | STAT  | 3193 | v1.2.8 | /dev/sdb | Total read 
> throughput: 540954.5B/s (0.52MB/s), IOPS 132.1/s.
> Random writes:
> | 2007/09/26-16:25:32 | START | 3202 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
> (-D 0:100)
> | 2007/09/26-16:26:16 | STAT  | 3202 | v1.2.8 | /dev/sdb | Total 
> write throughput: 795989.3B/s (0.76MB/s), IOPS 194.3/s.
> 
> elevator=anticipatory:
> Sequential reads:
> | 2007/09/26-16:37:04 | START | 3277 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
> (-p u)
> | 2007/09/26-16:37:34 | STAT  | 3277 | v1.2.8 | /dev/sdb | Total read 
> throughput: 45414126.9B/s (43.31MB/s), IOPS 11087.4/s.

While anticipatory looks adequate here, it will cause trouble with
multiple writers, since it keeps idling the queue to anticipate
follow-up reads. For a server, deadline is still the best choice.
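
The anticipation window is visible (and tunable) through sysfs; a
sketch, again assuming /dev/sdb:

  # ms the scheduler idles after a read, waiting for a nearby
  # follow-up read from the same process
  cat /sys/block/sdb/queue/iosched/antic_expire

  # Setting it to 0 effectively disables the anticipation
  echo 0 > /sys/block/sdb/queue/iosched/antic_expire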

> Sequential writes:
> | 2007/09/26-16:37:35 | START | 3284 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
> (-p u)
> | 2007/09/26-16:38:04 | STAT  | 3284 | v1.2.8 | /dev/sdb | Total 
> write throughput: 53895168.0B/s (51.40MB/s), IOPS 13158.0/s.
> Random reads:
> | 2007/09/26-16:38:04 | START | 3293 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
> (-D 100:0)
> | 2007/09/26-16:38:34 | STAT  | 3293 | v1.2.8 | /dev/sdb | Total read 
> throughput: 467080.5B/s (0.45MB/s), IOPS 114.0/s.
> Random writes:
> | 2007/09/26-16:38:34 | START | 3300 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
> (-D 0:100)
> | 2007/09/26-16:39:18 | STAT  | 3300 | v1.2.8 | /dev/sdb | Total 
> write throughput: 793122.1B/s (0.76MB/s), IOPS 193.6/s.
> 
> elevator=cfq (just to re-check):
> Sequential reads:
> | 2007/09/26-16:42:18 | START | 3353 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
> (-p u)
> | 2007/09/26-16:42:48 | STAT  | 3353 | v1.2.8 | /dev/sdb | Total read 
> throughput: 2463470.9B/s (2.35MB/s), IOPS 601.4/s.

CFQ is intended for single-disk workstations and its I/O limits are
tuned on that basis, so on a RAID setup it effectively acts as an I/O
governor.

Only use 'cfq' on single disk workstations.

Use 'deadline' on RAID setups and servers.
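
To make that stick across reboots, the usual approach on RHEL5 is the
elevator= kernel parameter in /boot/grub/grub.conf, e.g.:

  kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/VolGroup00/LogVol00 elevator=deadline

(the vmlinuz path and root= are just placeholders; match your existing
kernel line). That sets the default for all block devices; per-device
overrides can still be done through sysfs as above.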

> Sequential writes:
> | 2007/09/26-16:42:48 | START | 3360 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
> (-p u)
> | 2007/09/26-16:43:18 | STAT  | 3360 | v1.2.8 | /dev/sdb | Total 
> write throughput: 54572782.9B/s (52.04MB/s), IOPS 13323.4/s.
> Random reads:
> | 2007/09/26-16:43:19 | START | 3369 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
> (-D 100:0)
> | 2007/09/26-16:43:48 | STAT  | 3369 | v1.2.8 | /dev/sdb | Total read 
> throughput: 267652.4B/s (0.26MB/s), IOPS 65.3/s.
> Random writes:
> | 2007/09/26-16:43:48 | START | 3376 | v1.2.8 | /dev/sdb | Start 
> args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
> (-D 0:100)
> | 2007/09/26-16:44:31 | STAT  | 3376 | v1.2.8 | /dev/sdb | Total 
> write throughput: 793122.1B/s (0.76MB/s), IOPS 193.6/s.
> 
> Certainly cfq is severely cramping the reads, it appears.

Yes, as I mentioned above, it allocates I/O per executing thread
based on a typical single-disk I/O pattern, and therefore caps each
thread's bandwidth at a fraction of a single disk's performance.
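
For what it's worth, if someone is stuck on cfq, the per-thread
idling can be reduced through its iosched tunables; a sketch
(slice_idle is the big one on RAID, where idling the queue wastes the
other spindles):

  # ms cfq idles waiting for more I/O from the current thread;
  # 0 disables the idling
  echo 0 > /sys/block/sdb/queue/iosched/slice_idle

But that's a band-aid; deadline is still the right answer here.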

-Ross
