[CentOS] DRBD very slow....

Fri Jul 24 13:27:11 UTC 2009
Ross Walker <rswwalker at gmail.com>

On Jul 24, 2009, at 3:28 AM, Coert Waagmeester <lgroups at waagmeester.co.za> wrote:

>
> On Fri, 2009-07-24 at 10:21 +0400, Roman Savelyev wrote:
>> 1. You are hit by the Nagle algorithm (slow TCP response). You can build
>> DRBD 8.3. In 8.3, "TCP_NODELAY" and "QUICK_RESPONSE" are implemented.
>> 2. You are hit by the DRBD protocol. In most cases, "B" is enough.
>> 3. You are hit by triple barriers. In most cases you need only one of
>> "barrier, flush, drain" - see the documentation, it depends on the type
>> of storage hardware.
>>
>
> I have googled the triple barriers thing but can't find much
> information.
>
> Would it help if I used IPv6 instead of IPv4?

Triple barriers wouldn't affect you as this is on top of LVM and LVM  
doesn't support barriers, so it acts like a filter for them. Not good,  
but that's the state of things.

I would have run the dd tests locally and not with netcat; the idea is
to take the network out of the picture.
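
A local test along those lines can be sketched in Python, as a rough
stand-in for something like dd with conv=fsync. This is only a minimal
sketch: the function name, the default sizes, and the scratch file path
are assumptions for illustration, not anything from the original tests.

```python
import os
import time


def local_write_throughput(path, total_bytes=64 * 1024 * 1024,
                           block_size=1024 * 1024):
    """Write total_bytes to path in block_size chunks and return MB/s.

    The fsync is inside the timed region, so the figure reflects data
    pushed to the device and not just data sitting in the page cache.
    """
    block = b"\0" * block_size
    blocks = total_bytes // block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data out before stopping the clock
    elapsed = time.perf_counter() - start
    return (blocks * block_size) / (1024 * 1024) / elapsed


# Hypothetical usage, with a scratch file on the disk under test:
#   print(local_write_throughput("/mnt/test/scratch.bin"))
```

Running it once against the backing disk and once against the DRBD
device, both on the same node, should show how much of the slowdown is
local storage and how much comes from replication.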

Given the tests, though, it looks like the disks have their write caches
disabled, which cripples them, but with LVM filtering barriers, it's
the safest configuration.
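
To check what the kernel reports for the write cache, something like the
following could be used. It is a sketch under assumptions: it reads the
queue/write_cache sysfs attribute, which exists on recent Linux kernels
but probably not on a 2009-era CentOS kernel (there, hdparm -W /dev/sdX
is the usual way to query an ATA disk). The function name and the
sysfs_root parameter are made up for illustration.

```python
from pathlib import Path


def write_cache_mode(device, sysfs_root="/sys/block"):
    """Return the cache mode the kernel reports for a block device.

    Typical values are "write back" (cache enabled) and "write through".
    Returns None if the attribute is missing or unreadable.
    """
    attr = Path(sysfs_root) / device / "queue" / "write_cache"
    try:
        return attr.read_text().strip()
    except OSError:
        return None


# Hypothetical usage:
#   print(write_cache_mode("sda"))
```

This only shows the kernel's view of the device; the drive or controller
firmware may still be configured differently.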

The way to get fast and safe is to use partitions instead of logical
volumes. If you need more than 4, use a GPT partition table, which
allows up to 128 by default. Then you can enable the disk caches, as
DRBD will issue barrier writes to ensure consistency (hmmm, maybe the
barrier problem is with device-mapper, which means software RAID will be
a problem too? Need to check that).

Or

Invest in a HW RAID card with an NVRAM cache. That negates the need
for barrier writes from the OS, as the controller will issue them async
from cache, allowing I/O to continue flowing. This really is the safest
method.

-Ross