I have a Dell server with four bonded gigabit interfaces. Bonding mode is 802.3ad, xmit_hash_policy=layer3+4. When testing this setup with iperf, I never get more than a total of about 3 Gbps throughput. Is there anything to tweak to get better throughput? Or am I running into other limits (e.g. I was reading about TCP retransmit limits for mode 0)?
The iperf test was run with iperf -s on the server, and iperf -c server on four clients connected to the same switch.
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.10.15.193 port 5001 connected with 10.10.15.184 port 48588
[ 5] local 10.10.15.193 port 5001 connected with 10.10.15.187 port 49231
[ 6] local 10.10.15.193 port 5001 connected with 10.10.15.188 port 53197
[ 7] local 10.10.15.193 port 5001 connected with 10.10.15.189 port 55309
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec
[ 5]  0.0-10.0 sec   872 MBytes   728 Mbits/sec
[ 6]  0.0-10.0 sec   318 MBytes   267 Mbits/sec
[ 7]  0.0-10.0 sec  1.10 GBytes   939 Mbits/sec
[ 8] local 10.10.15.193 port 5001 connected with 10.10.15.184 port 48589
[ 4] local 10.10.15.193 port 5001 connected with 10.10.15.187 port 49234
[ 5] local 10.10.15.193 port 5001 connected with 10.10.15.188 port 38864
[ 6] local 10.10.15.193 port 5001 connected with 10.10.15.189 port 55311
[ 8]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec
[ 4]  0.0-10.0 sec   862 MBytes   721 Mbits/sec
[ 5]  0.0-10.0 sec   322 MBytes   270 Mbits/sec
[ 6]  0.0-10.0 sec  1.10 GBytes   939 Mbits/sec
lhecking@users.sourceforge.net wrote:
> I have a Dell server with four bonded gigabit interfaces. Bonding mode
> is 802.3ad, xmit_hash_policy=layer3+4. When testing this setup with
> iperf, I never get more than a total of about 3 Gbps throughput. Is
> there anything to tweak to get better throughput?
>
> [iperf output snipped - see above]
Could it be that with the combinations of IP addresses and port numbers used by iperf above, it ends up with only 3 of the links being used?
According to the Linux bonding docs, xmit_hash_policy=layer3+4 uses:
((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff)) modulo slave count

So I guess you could plug in the above IP addresses and port numbers and see if you get 3 unique results?
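Something like this quick C sketch should do the arithmetic (untested; the slave numbering is arbitrary, so only the grouping matters, not which physical NIC each flow lands on):

/* Apply the documented layer3+4 formula to the four flows from the
 * first iperf run above. XOR is symmetric, so direction doesn't matter. */
#include <stdio.h>

static int l34_hash(unsigned sport, unsigned dport,
                    unsigned saddr, unsigned daddr, int count)
{
        return ((sport ^ dport) ^ ((saddr ^ daddr) & 0xffff)) % count;
}

int main(void)
{
        unsigned server = 0x0a0a0fc1;                 /* 10.10.15.193 */
        unsigned client[] = { 0x0a0a0fb8, 0x0a0a0fbb, /* .184, .187 */
                              0x0a0a0fbc, 0x0a0a0fbd  /* .188, .189 */ };
        unsigned cport[]  = { 48588, 49231, 53197, 55309 };
        int i;

        for (i = 0; i < 4; i++)                       /* 4 slaves in the bond */
                printf("10.10.15.%u:%u -> slave %d\n",
                       client[i] & 0xff, cport[i],
                       l34_hash(5001, cport[i], server, client[i], 4));
        return 0;
}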
James Pearson
> According to the Linux bonding docs, xmit_hash_policy=layer3+4 uses:
>
> ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff))
> modulo slave count
That's an interesting point - how exactly do you XOR these numbers? Bit by bit? And does 0xffff really mean 0xffffffff in this context?
lhecking@users.sourceforge.net wrote:
> That's an interesting point - how exactly do you XOR these numbers?
> Bit by bit? And does 0xffff really mean 0xffffffff in this context?
I guess you need to look at the bonding src code - looks like it is in drivers/net/bonding/bond_main.c - for CentOS 5 it is:
/*
 * Hash for the output device based upon layer 3 and layer 4 data. If
 * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is
 * altogether not IP, mimic bond_xmit_hash_policy_l2()
 */
static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
                                     struct net_device *bond_dev,
                                     int count)
{
        struct ethhdr *data = (struct ethhdr *)skb->data;
        struct iphdr *iph = ip_hdr(skb);
        __be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
        int layer4_xor = 0;

        if (skb->protocol == htons(ETH_P_IP)) {
                if (!(iph->frag_off & htons(IP_MF|IP_OFFSET)) &&
                    (iph->protocol == IPPROTO_TCP ||
                     iph->protocol == IPPROTO_UDP)) {
                        layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
                }
                return (layer4_xor ^
                        ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
        }

        return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
}
James Pearson
> I guess you need to look at the bonding src code - looks like it is in
> drivers/net/bonding/bond_main.c
C xor (the ^ operator) is bitwise. And judging by that source, the mask really is 0xffff - only the low 16 bits of the XORed IP addresses go into the hash.

I did a bit of scripting and found that the algorithm seems much more sensitive to port numbers than IP addresses. Not that iperf gives much control over those; it looks like the client port numbers are picked at random. As a result, I would expect to repeat this test on the same set of clients, say, hundreds of times, and maybe find a small number of cases where all interfaces are utilised.
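For what it's worth, if the hash spreads flows uniformly and independently, only 4!/4^4 = 24/256, about 9.4%, of trials would land four flows on four distinct slaves. A rough C sketch of that check, assuming client ports drawn from the usual Linux ephemeral range (an assumption, not verified on these clients):

/* Estimate how often 4 flows with random client ports use all 4 slaves.
 * IPs as in the tests above; ports drawn from 32768-61000. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        unsigned server = 0x0a0a0fc1;   /* 10.10.15.193 */
        unsigned client[] = { 0x0a0a0fb8, 0x0a0a0fbb,
                              0x0a0a0fbc, 0x0a0a0fbd };
        int trials = 100000, all_four = 0, t, i;

        srand(1);
        for (t = 0; t < trials; t++) {
                int used[4] = { 0, 0, 0, 0 };
                for (i = 0; i < 4; i++) {
                        unsigned port = 32768 + rand() % (61000 - 32768 + 1);
                        used[((5001u ^ port) ^
                              ((server ^ client[i]) & 0xffff)) % 4] = 1;
                }
                all_four += used[0] && used[1] && used[2] && used[3];
        }
        printf("all four slaves used in %.1f%% of trials\n",
               100.0 * all_four / trials);
        return 0;
}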
On 1/11/2011 10:05 AM, lhecking@users.sourceforge.net wrote:
> I did a bit of scripting and found that the algorithm seems much more
> sensitive to port numbers than IP addresses.
Hashing 4 values to 4 targets seems like collisions would be likely no matter how you do it. The TX packet/byte values from ifconfig on the NICs should show how much went out each interface.
> Hashing 4 values to 4 targets seems like collisions would be likely no
> matter how you do it. The TX packet/byte values from ifconfig on the
> NICs should show how much went out each interface.
Yes, we checked that in addition to iperf's output. One interface was essentially idle.
On 1/11/2011 10:18 AM, lhecking@users.sourceforge.net wrote:
>> The TX packet/byte values from ifconfig on the NICs should show how
>> much went out each interface.
>
> Yes, we checked that in addition to iperf's output. One interface was
> essentially idle.
Where will the real-world load go? If you have a large number of connections, the statistics should work out better.
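Extending the earlier sketch: with, say, 100 simultaneous flows instead of 4, the per-slave split should come out close to even (again assuming uniform hashing over random ephemeral ports):

/* Distribute 100 flows with random client ports over 4 slaves and print
 * the per-slave share. Same layer3+4 formula and port-range assumption
 * as above. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        int flows = 100, per_slave[4] = { 0, 0, 0, 0 }, i;
        unsigned server = 0x0a0a0fc1;   /* 10.10.15.193 */
        unsigned client = 0x0a0a0fb8;   /* one client; the IPs barely matter */

        srand(1);
        for (i = 0; i < flows; i++) {
                unsigned port = 32768 + rand() % (61000 - 32768 + 1);
                per_slave[((5001u ^ port) ^
                           ((server ^ client) & 0xffff)) % 4]++;
        }
        for (i = 0; i < 4; i++)
                printf("slave %d: %d flows (%.0f%%)\n",
                       i, per_slave[i], 100.0 * per_slave[i] / flows);
        return 0;
}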
lhecking@users.sourceforge.net wrote:
> I did a bit of scripting and found that the algorithm seems much more
> sensitive to port numbers than IP addresses. Not that iperf gives much
> control over those; it looks like the client port numbers are picked
> at random.
You could use xmit_hash_policy=layer2+3 - which just uses MAC and IP addresses (which you do have more control over) - and see if you can pick a mix of IP/MAC addresses that would result in all four interfaces being used (theoretically) - and see if it matches reality?
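For a rough idea of where a given IP/MAC mix would land, the same kind of sketch works. This assumes layer2+3 XORs the last octet of each MAC with the low 16 bits of the XORed IPs (worth confirming against your bond_main.c); the MAC octets below are made-up placeholders:

/* Sketch of the layer2+3 hash under the assumption stated above.
 * The MAC octets are hypothetical - substitute your real ones. */
#include <stdio.h>

static int l23_hash(unsigned char smac5, unsigned char dmac5,
                    unsigned saddr, unsigned daddr, int count)
{
        return (((saddr ^ daddr) & 0xffff) ^ (smac5 ^ dmac5)) % count;
}

int main(void)
{
        unsigned server = 0x0a0a0fc1;   /* 10.10.15.193 */
        unsigned client[] = { 0x0a0a0fb8, 0x0a0a0fbb,
                              0x0a0a0fbc, 0x0a0a0fbd };
        /* hypothetical final MAC octets for the server and four clients */
        unsigned char smac5 = 0x10;
        unsigned char cmac5[] = { 0x21, 0x22, 0x23, 0x24 };
        int i;

        for (i = 0; i < 4; i++)
                printf("client .%u -> slave %d\n", client[i] & 0xff,
                       l23_hash(smac5, cmac5[i], server, client[i], 4));
        return 0;
}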
James Pearson
On Tue, 11 Jan 2011, James Pearson wrote:
> You could use xmit_hash_policy=layer2+3 - which just uses MAC and IP
> addresses (which you do have more control over) - and see if you can
> pick a mix of IP/MAC addresses that would result in all four
> interfaces being used (theoretically) - and see if it matches reality?
I semi-apologize for this question, but I don't have a suitably equipped machine/switch combination free, so I can't test this empirically.
I don't see the layer2+3 option listed in the CentOS 5.5 bonding module:
# modinfo bonding | egrep 'version|xmit'
version:        3.4.0
srcversion:     0B48FBAC9285804638A6BE7
parm:           xmit_hash_policy:XOR hashing method: 0 for layer 2 (default), 1 for layer 3+4 (charp)
Is the layer2+3 option simply undocumented via modinfo or is it not present in the CentOS bonding module?
> Is the layer2+3 option simply undocumented via modinfo or is it not
> present in the CentOS bonding module?
It's documented in the kernel docs rpm, /usr/share/doc/kernel-doc-2.6.18/Documentation/networking/bonding.txt.
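If the module does accept it, enabling it on CentOS 5 would look something like this in /etc/modprobe.conf (miimon=100 is just a typical choice):

alias bond0 bonding
options bonding mode=802.3ad xmit_hash_policy=layer2+3 miimon=100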