[CentOS] DRBD very slow....

Tue Jul 28 09:14:32 UTC 2009
Coert Waagmeester <lgroups at waagmeester.co.za>

On Mon, 2009-07-27 at 18:18 -0400, Ross Walker wrote:
> On Jul 27, 2009, at 4:09 PM, Coert Waagmeester
> <lgroups at waagmeester.co.za> wrote:
> 
> > 
> > On Mon, 2009-07-27 at 12:37 +0200, Coert Waagmeester wrote:
> > > On Mon, 2009-07-27 at 12:02 +0200, Alexander Dalloz wrote:
> > > > > 
> > > > > On Mon, 2009-07-27 at 08:30 +0200, Coert Waagmeester wrote:
> > > > 
> > > > > > Hello Roman,
> > > > > > 
> > > > > > I am running drbd 8.2.6 (the standard centos version)
> > > > 
> > > > have you considered testing the drbd-8.3 packages?
> > > > 
> > > > http://bugs.centos.org/view.php?id=3598
> > > > 
> > > > http://dev.centos.org/centos/5/testing/{i386,x86_64}/RPMS/
> > > > 
> > > 
> > > Thank you very much for this tip! It was one very obvious place
> > > where I had not yet looked.
> > > 
> > > 
> > > Would it still be necessary to recompile it for TCP_NODELAY and
> > > similar options?
> > > 
> > > I am just making sure, because
> > > http://www.nabble.com/Huge-latency-issue-with-8.2.6-td18947965.html
> > > makes it seem unnecessary.
> > > 
> > > Why do the repositories provide both DRBD 8.0.x and 8.2.6?
> > > 
> > 
> > Here is a status update....
> > _______________________________________________________________________________
> > on both hosts I now run from the testing repository:
> > # rpm -qa | grep drbd
> > drbd83-8.3.1-5.el5.centos
> > kmod-drbd83-xen-8.3.1-4.el5.centos
> > _______________________________________________________________________________
> > Here is my config (slightly condensed):
> > -----------------------------------------------------
> > global {
> >  usage-count yes;
> > }
> > common {
> >  protocol C;
> >  syncer { rate 50M; }
> >  net {
> > #        allow-two-primaries;
> >         sndbuf-size 0; }
> > #  disk {no-disk-flushes;
> > #        no-md-flushes; }
> >  startup { wfc-timeout 0 ; }
> > }
> > resource xenfilesrv {
> >  device    /dev/drbd1;
> >  disk      /dev/vg0/xenfilesrv;
> >  meta-disk internal;
> > 
> >  on baldur.mydomain.local {
> >    address   10.99.99.1:7788;
> >  }
> >  on thor.mydomain.local {
> >    address   10.99.99.2:7788;
> >  }
> > }
> > resource xenfilesrvdata {
> >  device    /dev/drbd2;
> >  disk      /dev/vg0/xenfilesrvdata;
> >  meta-disk internal;
> > 
> >  on baldur.mydomain.local {
> >    address   10.99.99.1:7789;
> >  }
> >  on thor.mydomain.local {
> >    address   10.99.99.2:7789;
> >  }
> > }
> > _______________________________________________________________________________
> > 
> > xenfilesrv is a Xen domU.
> > In this domU I ran a dd with oflag=direct:
> > ---------------------------------------------------------
> > # dd if=/dev/zero of=1gig.file bs=1M count=1000 oflag=direct
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes (1.0 GB) copied, 147.997 seconds, 7.1 MB/s
> > 
> > Just before I ran the dd, this popped up in the secondary host's
> > syslog:
> > ----------------------------------------------------------------------
> > Jul 27 21:51:42 thor kernel: drbd2: Method to ensure write ordering:
> > flush
> > Jul 27 21:51:42 thor kernel: drbd1: Method to ensure write ordering:
> > flush
> > 
> > 
> > _______________________________________________________________________________
> > 
> > What more can I try?
> > 
> > To be quite honest, I have no idea where to find the TCP_NODELAY
> > socket option or what to do with it...
> > 
> 
> 
> Use the DRBD options to disable flush/sync, but understand that after
> a power failure or system crash the data on disk will not be
> consistent, and you will need to resync the storage from the other
> server.
> 
> 
> -Ross
> 
> 
> 
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos

That also does not really make a difference.
According to DRBD, everything then goes into barrier mode.

I still get speeds of around 7.5 MB/s.

In the config I now have this:
disk { no-disk-barrier;
       no-disk-flushes;
       no-md-flushes; }

According to /proc/drbd, it then goes into 'drain' mode.

I still get only 8 MB/s throughput.
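
On the TCP_NODELAY question above: as far as I understand it, that
option simply disables Nagle's algorithm on a TCP socket, so small
writes are sent immediately instead of being batched while waiting for
ACKs. A minimal Python sketch of the option itself (nothing
DRBD-specific; I believe recent DRBD versions set it on their
replication sockets themselves):

```python
import socket

# Create a TCP socket and disable Nagle's algorithm on it.
# This is all TCP_NODELAY does: small writes go out immediately
# instead of being coalesced while waiting for outstanding ACKs.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it took effect.
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY set:", bool(nodelay))
s.close()
```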

Would it be unwise to consider using Protocol A?

I have just tried Protocol A, and I still only get 8 MB/s.
But if I disconnect the secondary node and run the dd again, I get
32 MB/s!
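
For what it's worth: since each oflag=direct write of bs=1M has to
complete fully (local disk plus replication) before the next one
starts, the throughput numbers translate directly into a per-write
latency. A rough back-of-the-envelope sketch using the figures above
(my own arithmetic, not a separate measurement):

```python
# With oflag=direct, each 1 MB write must fully complete before the
# next one starts, so throughput = block_size / time_per_write.

def per_write_ms(throughput_mb_s, block_mb=1.0):
    """Implied time per synchronous write of block_mb MB, in ms."""
    return block_mb / throughput_mb_s * 1000.0

connected = per_write_ms(7.1)    # both nodes connected
standalone = per_write_ms(32.0)  # secondary disconnected
print("connected: %.0f ms/write" % connected)      # ~141 ms
print("standalone: %.0f ms/write" % standalone)    # ~31 ms
print("replication overhead: %.0f ms/write" % (connected - standalone))
```

So roughly 110 ms of every 1 MB write is spent on replication; moving
1 MB in ~110 ms is only about 9 MB/s (around 75 Mbit/s), which makes
me suspect the replication link or its buffers rather than the local
disks.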


PS: I sent another mail with an attachment. I have a feeling it is
being held for moderation, though.