Hello all,
We have a new setup with Xen on CentOS 5.3.
I run DRBD on LVM volumes to mirror data between the two servers.
Both servers are 1U NEC rack mounts with 8 GB RAM and 2x mirrored 1 TB Seagate SATA disks.
One is a dual-core Xeon, and the other a quad-core Xeon.
I have a gigabit crossover link between the two with an MTU of 9000 on each end.
I currently have 6 DRBD resources mirroring across that link.
The highest speed I can get through that link with DRBD is 11 MB/sec (megabytes).
But if I copy a 1 gig file over that link I get 110 MB/sec.
Why is DRBD so slow?
I am not using DRBD encryption because of the back-to-back link. Here is part of my DRBD config:
# cat /etc/drbd.conf
global { usage-count yes; }
common {
    protocol C;
    syncer { rate 80M; }
    net { allow-two-primaries; }
}
resource xenotrs {
    device    /dev/drbd6;
    disk      /dev/vg0/xenotrs;
    meta-disk internal;
    on baldur.somedomain.local { address 10.99.99.1:7793; }
    on thor.somedomain.local   { address 10.99.99.2:7793; }
}
Kind regards, Coert
I am reading up on this on the internet as well, but all the TCP settings and disk settings make me slightly nervous...
On Jul 22, 2009, at 5:59 AM, Coert Waagmeester <lgroups@waagmeester.co.za> wrote:
I am reading up on this on the internet as well, but all the TCP settings and disk settings make me slightly nervous...
Just get it going without those tuning options, run some benchmarks on it, see where it is not performing well, look at the drbd
Oops sent too early.
Look at the drbd stats then make the appropriate tuning changes for what you need.
Don't blindly tune because some website says it makes things faster; his workload and yours are two different workloads.
Always ask why.
-Ross
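For anyone following along, the DRBD stats Ross mentions are exposed in /proc/drbd; a minimal way to watch them while a benchmark runs (resource name taken from the config posted above) is something like:
# watch -n1 cat /proc/drbd          (connection state, disk states and resync speed)
# drbdadm cstate xenotrs            (connection state of one resource)
# drbdadm dstate xenotrs            (local/peer disk state of one resource)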
Hello all,
For completeness here is my current setup:
host1: Xeon Quad-Core, 8GB RAM, CentOS 5.3 64-bit
       2x 1TB Seagate SATA disks in software RAID level 1
       LVM on top of the RAID for the dom0 root FS and for all domU root FSes
host2: Xeon Dual-Core, 8GB RAM, CentOS 5.3 64-bit
       2x 1TB Seagate SATA disks in software RAID level 1
       LVM on top of the RAID for the dom0 root FS and for all domU root FSes
common: the hosts are connected to the local LAN and directly to each other with a CAT6 gigabit crossover.
I have 6 DRBD resources running for 5 domUs over the back-to-back link. DRBD version: drbd82-8.2.6-1.el5.centos
_______________________________________________________________________
Ok, here is what I have done:
_______________________________________________________________________
I have added the following to the DRBD config:
disk { no-disk-flushes;
       no-md-flushes; }
That made the resync go up to 50 MB/sec after I issued:
# drbdsetup /dev/drbdX syncer -r 110M
It used to stick around at 11 MB/sec.
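Worth noting: a rate set with drbdsetup is a runtime change and is lost once the resource is reconfigured or restarted; to keep it, the same value would go into drbd.conf and be re-applied with drbdadm, roughly like this:
syncer { rate 110M; }               (in the common or per-resource section of /etc/drbd.conf)
# drbdadm adjust all                (re-reads drbd.conf and applies the changed settings)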
As far as I can tell it has improved the domUs' disk access as well.
I do see that there are a lot of warnings to be heeded around disabling disk and metadata flushes...
_______________________________________________________________________
iperf results:
on host 1:
# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  5] local 10.99.99.1 port 5001 connected with 10.99.99.2 port 58183
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  1.16 GBytes    990 Mbits/sec

on host 2:
# iperf -c 10.99.99.1
------------------------------------------------------------
Client connecting to 10.99.99.1, TCP port 5001
TCP window size: 73.8 KByte (default)
------------------------------------------------------------
[  3] local 10.99.99.2 port 58183 connected with 10.99.99.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.16 GBytes    992 Mbits/sec
I am assuming those results are to be expected from a back-to-back gigabit link.
_______________________________________________________________________
The dd test: I think I did this completely wrong; how is it supposed to be done?
This is what I did:
host 1:
# nc -l 8123 | dd of=/mnt/data/1gig.file oflag=direct
(/mnt/data is an ext3 FS on LVM mounted in dom0)
host 2:
# date; dd if=/dev/zero bs=1M count=1000 | nc 10.99.99.2 8123 ; date
I did not wait for it to finish... according to ifstat the average speed I got during this transfer was 1.6 MB/sec.
_______________________________________________________________________
Any tips would be greatly appreciated.
Kind regards, Coert
On Jul 22, 2009, at 5:16 AM, Coert Waagmeester <lgroups@waagmeester.co.za> wrote:
The highest speed I can get through that link with DRBD is 11 MB/sec (megabytes).
But if I copy a 1 gig file over that link I get 110 MB/sec.
Why is DRBD so slow?
Use iperf to measure the bandwidth/latency on those NICs between the two hosts.
If iperf comes back clean, use dd with oflag=direct to test the performance of the drives on both sides (create a test LV). Roughly, you can multiply that by the maximum number of outstanding I/Os your application issues to get a realistic number; use 4 if you don't know what that is.
DRBD protocol C is completely synchronous and won't return a write until it has been committed to disk on both sides.
Having disk controllers with NVRAM cache can make all the difference in the world for this setup.
-Ross
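A concrete version of the dd-on-a-test-LV check Ross describes might look like this on either host (the volume group name is from this thread, the LV name and sizes are just examples):
# lvcreate -L 4G -n drbdtest vg0                                        (scratch LV with no DRBD on top)
# dd if=/dev/zero of=/dev/vg0/drbdtest bs=1M count=1000 oflag=direct    (raw write speed of the backing storage)
# lvremove -f /dev/vg0/drbdtest                                         (clean up afterwards)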
1. You are hit by the Nagle algorithm (slow TCP response). You can build DRBD 8.3; in 8.3, "TCP_NODELAY" and "QUICK_RESPONSE" are implemented in place.
2. You are hit by the DRBD protocol. In most cases, "B" is enough.
3. You are hit by triple barriers. In most cases you need only one of "barrier, flush, drain"; see the documentation, it depends on the type of storage hardware.
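For points 2 and 3, the knobs live in drbd.conf; a sketch of what that could look like (whether protocol B and dropping barriers/flushes is acceptable depends entirely on the storage hardware and on how much data you can afford to lose in a crash):
common {
    protocol B;            # write completes once it has reached the local disk and the peer's buffer cache
    disk {
        no-disk-barrier;   # disable two of the three methods, so only
        no-disk-flushes;   # 'drain' is left as the write-ordering method
    }
}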
I have googled the triple barriers thing but can't find that much information.
Would it help if I used IPv6 instead of IPv4?
Ross, here are the results of those tests you suggested:
_______________________________________________________________________
For completeness here is my current setup:
host1: 10.99.99.2, Xeon Quad-Core, 8GB RAM, CentOS 5.3 64-bit
       2x 1TB Seagate SATA disks in software RAID level 1
       LVM on top of the RAID for the dom0 root FS and for all domU root FSes
host2: 10.99.99.1, Xeon Dual-Core, 8GB RAM, CentOS 5.3 64-bit
       2x 1TB Seagate SATA disks in software RAID level 1
       LVM on top of the RAID for the dom0 root FS and for all domU root FSes
common: the hosts are connected to the local LAN and directly to each other with a CAT6 gigabit crossover.
I have 6 DRBD resources running for 5 domUs over the back-to-back link. DRBD version: drbd82-8.2.6-1.el5.centos
_______________________________________________________________________
Ok, here is what I have done:
_______________________________________________________________________
I have added the following to the DRBD config:
disk { no-disk-flushes;
       no-md-flushes; }
That made the resync go up to 50 MB/sec after I issued:
# drbdsetup /dev/drbdX syncer -r 110M
It used to stick around at 11 MB/sec.
As far as I can tell it has improved the domUs' disk access as well.
I do see that there are a lot of warnings to be heeded around disabling disk and metadata flushes...
_______________________________________________________________________
iperf results:
on host 1:
# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  5] local 10.99.99.1 port 5001 connected with 10.99.99.2 port 58183
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  1.16 GBytes    990 Mbits/sec

on host 2:
# iperf -c 10.99.99.1
------------------------------------------------------------
Client connecting to 10.99.99.1, TCP port 5001
TCP window size: 73.8 KByte (default)
------------------------------------------------------------
[  3] local 10.99.99.2 port 58183 connected with 10.99.99.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.16 GBytes    992 Mbits/sec

I am assuming those results are to be expected from a back-to-back gigabit link.
_______________________________________________________________________
The dd test: I think I did this completely wrong; how is it supposed to be done?
This is what I did:
host 1:
# nc -l 8123 | dd of=/mnt/data/1gig.file oflag=direct
(/mnt/data is an ext3 FS on LVM mounted in dom0, not DRBD; I first wanted to try it locally.)
host 2:
# date; dd if=/dev/zero bs=1M count=1000 | nc 10.99.99.2 8123 ; date
I did not wait for it to finish... according to ifstat the average speed I got during this transfer was 1.6 MB/sec.
_______________________________________________________________________
Any tips would be greatly appreciated.
I have googled the triple barriers thing but can't find that much information.
Please refer to drbdsetup(8) for a detailed description of the parameters: no-disk-barrier, no-disk-flushes, no-disk-drain, no-md-flushes.
Would it help if I used IPv6 instead of IPv4?
No.
And small transactions will be very slow on DRBD prior to 8.3.
On Jul 24, 2009, at 3:28 AM, Coert Waagmeester <lgroups@waagmeester.co.za> wrote:
I have googled the triple barriers thing but can't find that much information.
Would it help if I used IPv6 instead of IPv4?
Triple barriers wouldn't affect you, as this is on top of LVM and LVM doesn't support barriers, so it acts like a filter for them. Not good, but that's the state of things.
I would have run the dd tests locally and not with netcat; the idea is to take the network out of the picture.
Given the tests, though, it looks like the disks have their write caches disabled, which cripples them, but with LVM filtering barriers it's the safest configuration.
The way to get fast and safe is to use partitions instead of logical volumes. If you need more than 4, use a GPT partition table, which allows up to 128 by default. Then you can enable the disk caches, as DRBD will issue barrier writes to ensure consistency (hmmm, maybe the barrier problem is with device-mapper, which means software RAID will be a problem too? Need to check that).
Or
Invest in a HW RAID card with NVRAM cache; that will negate the need for barrier writes from the OS, as the controller will issue them asynchronously from cache, allowing I/O to continue flowing. This really is the safest method.
-Ross
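If anyone wants to try the partition route, a rough sketch with parted (the device name and size are examples only, and mklabel destroys the existing partition table, so only do this on a scratch disk):
# parted /dev/sdb mklabel gpt                 (new GPT label on the spare disk)
# parted /dev/sdb mkpart primary 0 50GB       (one partition per DRBD resource)
# parted /dev/sdb print
The resource's "disk" line in drbd.conf would then point at /dev/sdb1 instead of an LV.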
On Fri, 2009-07-24 at 09:27 -0400, Ross Walker wrote:
Triple barriers wouldn't affect you, as this is on top of LVM and LVM doesn't support barriers, so it acts like a filter for them. Not good, but that's the state of things.
I would have run the dd tests locally and not with netcat; the idea is to take the network out of the picture.
I have run the dd again locally.
It writes to an LVM volume on top of software RAID 1, mounted in dom0:
# dd if=/dev/zero of=/mnt/data/1gig.file oflag=direct bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 24.3603 seconds, 43.0 MB/s
Given the tests, though, it looks like the disks have their write caches disabled, which cripples them, but with LVM filtering barriers it's the safest configuration.
The way to get fast and safe is to use partitions instead of logical volumes. If you need more than 4, use a GPT partition table, which allows up to 128 by default. Then you can enable the disk caches, as DRBD will issue barrier writes to ensure consistency (hmmm, maybe the barrier problem is with device-mapper, which means software RAID will be a problem too? Need to check that).
I am reading up on GPT, and that seems like a viable option. Will keep you posted.
Most Google results point to software RAID 1 supporting barriers; not too sure though.
Or
Invest in a HW RAID card with NVRAM cache; that will negate the need for barrier writes from the OS, as the controller will issue them asynchronously from cache, allowing I/O to continue flowing. This really is the safest method.
This is not going to be easy... The servers we use are 1U rack mounts, and the single available PCI Express slot is used up on both servers by a quad-port gigabit network card.
Thanks for all the valuable tips so far, I will keep you posted.
Invest in a HW RAID card with NVRAM cache; that will negate the need for barrier writes from the OS, as the controller will issue them asynchronously from cache, allowing I/O to continue flowing. This really is the safest method.
It's a better way. But the socket options in DRBD up to 8.2 (Nagle algorithm) can decrease performance with a large amount of small synchronous writes.
Hello Roman,
I am running DRBD 8.2.6 (the standard CentOS version).
How do I disable that Nagle algorithm?
On Google I found the following page: http://www.nabble.com/Huge-latency-issue-with-8.2.6-td18947965.html
I found the sndbuf-size option in the drbdsetup(8) man page, and I will try setting it.
On the nabble page they talk about the TCP_NODELAY and TCP_QUICKACK socket options. Do these have to do with the Nagle algorithm?
Where do I set these socket options? Do I have to compile DRBD with them?
Those are socket flags set on a socket open() call. The page you refer to specifically says they changed this in 8.2.7.
On Mon, 2009-07-27 at 08:30 +0200, Coert Waagmeester wrote:
Hello Roman,
I am running DRBD 8.2.6 (the standard CentOS version).
Hi,
have you considered testing the drbd-8.3 packages?
http://bugs.centos.org/view.php?id=3598
http://dev.centos.org/centos/5/testing/%7Bi386,x86_64%7D/RPMS/
Best regards
Alexander
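For the record, pulling those in looks roughly like this; the file names below are guessed from the package versions reported later in this thread, so check the directory listing first, and the drbd82 packages probably have to be removed beforehand:
# wget http://dev.centos.org/centos/5/testing/x86_64/RPMS/drbd83-8.3.1-5.el5.centos.x86_64.rpm
# wget http://dev.centos.org/centos/5/testing/x86_64/RPMS/kmod-drbd83-xen-8.3.1-4.el5.centos.x86_64.rpm
# rpm -Uvh drbd83-8.3.1-5.el5.centos.x86_64.rpm kmod-drbd83-xen-8.3.1-4.el5.centos.x86_64.rpm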
Thank you very much for this tip! It was one very obvious place where I had not looked yet.
Would it still be necessary to recompile it for TCP_NODELAY and such?
I am just making sure, because http://www.nabble.com/Huge-latency-issue-with-8.2.6-td18947965.html makes it seem unnecessary.
Why do the repositories provide both DRBD 8.0.x and 8.2.6?
Thank you all again, Coert
Hello all,
Here is a status update...
_______________________________________________________________________________
On both hosts I now run from the testing repository:
# rpm -qa | grep drbd
drbd83-8.3.1-5.el5.centos
kmod-drbd83-xen-8.3.1-4.el5.centos
_______________________________________________________________________________
Here is my config (slightly condensed):
-----------------------------------------------------
global { usage-count yes; }
common {
    protocol C;
    syncer { rate 50M; }
    net {
        # allow-two-primaries;
    }
    sndbuf-size 0;
}
# disk { no-disk-flushes;
#        no-md-flushes; }
startup { wfc-timeout 0 ; }
}
resource xenfilesrv {
    device    /dev/drbd1;
    disk      /dev/vg0/xenfilesrv;
    meta-disk internal;
    on baldur.mydomain.local { address 10.99.99.1:7788; }
    on thor.mydomain.local   { address 10.99.99.2:7788; }
}
resource xenfilesrvdata {
    device    /dev/drbd2;
    disk      /dev/vg0/xenfilesrvdata;
    meta-disk internal;
    on baldur.mydomain.local { address 10.99.99.1:7789; }
    on thor.mydomain.local   { address 10.99.99.2:7789; }
}
_______________________________________________________________________________
xenfilesrv is a Xen domU; in this domU I ran dd with oflag=direct:
---------------------------------------------------------
# dd if=/dev/zero of=1gig.file bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 147.997 seconds, 7.1 MB/s
Just before I ran the dd, this popped up in the secondary host's syslog:
----------------------------------------------------------------------
Jul 27 21:51:42 thor kernel: drbd2: Method to ensure write ordering: flush
Jul 27 21:51:42 thor kernel: drbd1: Method to ensure write ordering: flush
_______________________________________________________________________________
What more can I try?
To be quite honest, I have no idea what to do with, or where to find, the TCP_NODELAY socket options...
Kind regards, Coert
On Jul 27, 2009, at 4:09 PM, Coert Waagmeester <lgroups@waagmeester.co.za> wrote:
What more can I try?
To be quite honest, I have no idea what to do with, or where to find, the TCP_NODELAY socket options...
Use the DRBD options to disable flush/sync, but understand that after a power failure or system crash the data will not be consistent on disk, and you will need to resync the storage from the other server.
-Ross
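For completeness, forcing the node with the suspect copy to resync from its peer is a short procedure (resource name from the config above; run on the node whose data you no longer trust):
# drbdadm secondary xenfilesrv      (make sure the suspect copy is not in use)
# drbdadm invalidate xenfilesrv     (discard the local data, full resync from the peer)
# cat /proc/drbd                    (watch the resync progress)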
That also does not really make a difference. According to DRBD, everything goes into barrier mode.
I still get speeds of around 7.5 MB/sec.
In the config I now have this:
disk { no-disk-barrier; no-disk-flushes; no-md-flushes; }
According to /proc/drbd it then goes into 'drain' mode.
I still get only 8 MB/sec throughput.
Would it be unwise to consider using Protocol A?
I have just tried Protocol A, and I also only get 8 MB/sec. But if I disconnect the secondary node and do the dd again, I get 32 MB/sec!
PS: I sent another mail with an attachment. I have a feeling that one is being moderated, though...
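For reference, that disconnect test can be done per resource from dom0, and DRBD tracks the blocks changed while disconnected and resyncs them on reconnect (resource name from the config above; the dd runs inside the domU as before):
# drbdadm disconnect xenfilesrv     (resource goes StandAlone, writes stay local and are tracked)
# drbdadm connect xenfilesrv        (reconnect; the tracked blocks are resynced to the peer)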
No way in 8.2. It's a socket option, managed well in 8.3 and later releases. If you don't have a large amount of very small synchronous writes, you don't need it.
----- Original Message -----
From: "Coert Waagmeester" <lgroups@waagmeester.co.za>
To: "CentOS mailing list" <centos@centos.org>
Sent: Monday, July 27, 2009 10:30 AM
Subject: Re: [CentOS] DRBD very slow....
How do I disable that Nagle algorithm?
Hi all,
Just want to thank you for all your help on this so far. We are now using that server for something else, so at the moment my DRBD plans are on hold. After playing around with the send, receive, and max buffer settings, I did manage to crank the speed up to 10 MB/sec.
Thanks again for all your help, Coert
On Wed, 2009-07-22 at 11:16 +0200, Coert Waagmeester wrote:
The highest speed I can get through that link with DRBD is 11 MB/sec (megabytes).
Not good...
But if I copy a 1 gig file over that link I get 110 MB/sec.
That tells me that the network connection is fine. The issue is at a higher layer...
Why is DRBD so slow?
Let's see...
common { protocol C; syncer { rate 80M; } net { allow-two-primaries; } }
You want allow-two-primaries? That implies that you're using something like OCFS2, but that's probably immaterial to the discussion... Here's a question: do you have another syncer statement in the resource definition that's set to a lower number? That would definitely throttle the sync rate...
-I
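One quick way to answer that is to dump the configuration DRBD actually parsed and look for every place a rate is set (resource names as used earlier in this thread):
# drbdadm dump all | grep -B2 rate          (every syncer rate that is in effect)
# drbdadm dump xenotrs                      (full parsed definition of one resource)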
I occasionally do migrations from one dom0 to the other.
I do not have clustered file systems, so I make sure that both are only primary during a migration.
I have no automation yet; I do it all manually to be sure.
I only have one syncer definition, and according to the DRBD manual that is the rate for full resyncs?