On Fri, 2009-07-24 at 09:27 -0400, Ross Walker wrote:
On Jul 24, 2009, at 3:28 AM, Coert Waagmeester <lgroups@waagmeester.co.za> wrote:
On Fri, 2009-07-24 at 10:21 +0400, Roman Savelyev wrote:
1. You are hit by the Nagle algorithm (slow TCP response). You can build DRBD 8.3; in 8.3, "TCP_NODELAY" and "QUICK_RESPONSE" are implemented.
2. You are hit by the DRBD protocol. In most cases, "B" is enough.
3. You are hit by triple barriers. In most cases you need only one of "barrier, flush, drain"; see the documentation, it depends on the type of storage hardware.
I have googled the triple barriers thing but can't find much information.
Would it help if I used IPv6 instead of IPv4?
Triple barriers wouldn't affect you as this is on top of LVM and LVM doesn't support barriers, so it acts like a filter for them. Not good, but that's the state of things.
I would have run the dd tests locally and not with netcat; the idea is to take the network out of the picture.
I have run the dd again locally.
It writes to an LVM volume on top of Software RAID 1 mounted in dom0:

# dd if=/dev/zero of=/mnt/data/1gig.file oflag=direct bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 24.3603 seconds, 43.0 MB/s
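
As a quick sanity check on that figure, the throughput can be recomputed from the byte count and elapsed time dd printed. This is only a sketch of the arithmetic, assuming dd is reporting decimal megabytes (1 MB = 1,000,000 bytes); the variable names are mine, the numbers are copied from the output above.

```shell
# Figures copied from the dd output above.
bytes=1048576000
seconds=24.3603

# Assumption: dd reports decimal megabytes (1 MB = 1,000,000 bytes).
mbps=$(awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.1f", b / s / 1000000 }')

echo "$mbps MB/s"
```

The result should agree with the rate on the last line of the dd output, which confirms the number is sustained direct-I/O write throughput for the whole 1000 MiB run rather than a cached burst.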
Given the tests though it looks like the disks have their write caches disabled which cripples them, but with LVM filtering barriers, it's the safest configuration.
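
One way to confirm the write-cache guess is to ask the drive directly. The following is a minimal sketch, assuming ATA/SATA disks and that hdparm is installed; the function name check_write_cache and the device names /dev/sda and /dev/sdb are placeholders of mine, not taken from this setup. It only queries the setting and changes nothing.

```shell
# Report the drive's write-cache setting without modifying it.
# check_write_cache is a hypothetical helper name; pass the member disks
# of the software RAID 1, e.g. /dev/sda and /dev/sdb (assumed names).
check_write_cache() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    if ! command -v hdparm >/dev/null 2>&1; then
        echo "hdparm not found" >&2
        return 1
    fi
    # hdparm -W with no value only reads the write-caching state.
    hdparm -W "$dev"
}

# Example (run as root):
#   check_write_cache /dev/sda
#   check_write_cache /dev/sdb
```

If the cache does turn out to be off, whether to enable it still depends on barriers actually reaching the disks, as discussed in the rest of this message.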
The way to get fast and safe is to use partitions instead of logical volumes. If you need more than 4, then use a GPT partition table, which allows up to 256 I believe. Then you can enable the disk caches, as drbd will issue barrier writes to assure consistency (hmmm, maybe the barrier problem is with devmapper, which means software RAID will be a problem too? Need to check that).
I am reading up on GPT, and that seems like a viable option. Will keep you posted.
Most Google results point to software RAID 1 supporting barriers. Not too sure though.
Or
Invest in a HW RAID card with an NVRAM cache. That negates the need for barrier writes from the OS, as the controller will issue them asynchronously from cache, allowing I/O to continue flowing. This really is the safest method.
This is not going to be easy.... The servers we use are 1U rackmount, and the single available PCI-express port is used up on both servers by a quad gigabit network card.
-Ross
Thanks for all the valuable tips so far, I will keep you posted.