Hi guys.
Hoping some net experts might stumble upon this message. I have an IPoIB direct host-to-host connection, and:
-> $ ethtool ib1
Settings for ib1:
        Supported ports: [ ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 40000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: Other
        PHYAD: 255
        Transceiver: internal
        Link detected: yes
and that's both ends, both hosts, yet:
$ iperf3 -c 10.5.5.97
Connecting to host 10.5.5.97, port 5201
[  5] local 10.5.5.49 port 56874 connected to 10.5.5.97 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.36 GBytes  11.6 Gbits/sec    0   2.50 MBytes
[  5]   1.00-2.00   sec  1.87 GBytes  16.0 Gbits/sec    0   2.50 MBytes
[  5]   2.00-3.00   sec  1.84 GBytes  15.8 Gbits/sec    0   2.50 MBytes
[  5]   3.00-4.00   sec  1.83 GBytes  15.7 Gbits/sec    0   2.50 MBytes
[  5]   4.00-5.00   sec  1.61 GBytes  13.9 Gbits/sec    0   2.50 MBytes
[  5]   5.00-6.00   sec  1.60 GBytes  13.8 Gbits/sec    0   2.50 MBytes
[  5]   6.00-7.00   sec  1.56 GBytes  13.4 Gbits/sec    0   2.50 MBytes
[  5]   7.00-8.00   sec  1.52 GBytes  13.1 Gbits/sec    0   2.50 MBytes
[  5]   8.00-9.00   sec  1.52 GBytes  13.1 Gbits/sec    0   2.50 MBytes
[  5]   9.00-10.00  sec  1.52 GBytes  13.1 Gbits/sec    0   2.50 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec                  receiver
It's a rather oldish platform hosting the link; PCIe is only 2.0, but with an x8 link it should be able to carry more than ~13 Gbit/s. The InfiniBand card is a Mellanox ConnectX-3.
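For reference, my back-of-the-envelope ceiling for that slot (roughly, before PCIe protocol overhead):

PCIe 2.0 x8 = 8 lanes x 5 GT/s x 8b/10b encoding = 32 Gbit/s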
Any thoughts on how to track down the bottleneck, or any thoughts at all, would be much appreciated. Thanks, L
On Thu, Jan 21, 2021 at 6:34 PM lejeczek via CentOS centos@centos.org wrote:
...
Care to capture a few seconds of the *sender*-side .pcap? Often the TCP receive window is too small, or packet loss or round-trip time is to blame. All of these would be evident in the packet capture.
If you do multiple streams with the `-P 8` flag, does that increase the throughput?
Google says these endpoints are ~1.5 ms apart (the 2.5 MByte congestion window from your iperf output divided by the observed ~13 Gbit/s throughput):
(2.5 megabytes) / (13 Gbps) = 1.53846154 milliseconds
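Something like this on the sender should be enough (a sketch; the interface name and capture file name are assumptions):

# capture only headers for a few seconds while the iperf3 test runs
tcpdump -i ib1 -s 96 -w iperf-sender.pcap host 10.5.5.97

# retry with 8 parallel streams
iperf3 -c 10.5.5.97 -P 8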
On 22/01/2021 00:33, Steven Tardy wrote:
On Thu, Jan 21, 2021 at 6:34 PM lejeczek via CentOS <centos@centos.org> wrote:
...
Seems that the platform overall might just not be enough. The bitrate drops even further when the CPUs are fully loaded. (I'll keep investigating.)
What I'm trying next is to have both ports (it's a dual-port card) "teamed" by NM, with the runner set to broadcast. I'm leaving out "p-key", which NM sets to "default" (and which works with a "regular" IPoIB connection). RHEL's "networking guide" docs say "...create a team from two or more Wired or InfiniBand connections..." When I try to stand up such a team, the master starts but the slaves both fail with:

"...
<info>  [1611588576.8887] device (ib1): Activation: starting connection 'team1055-slave-ib1' (900d5073-366c-4a40-8c32-ac42c76f9c2e)
<info>  [1611588576.8889] device (ib1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
<info>  [1611588576.8973] device (ib1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
<info>  [1611588576.9199] device (ib1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
<warn>  [1611588576.9262] device (ib1): Activation: connection 'team1055-slave-ib1' could not be enslaved
<info>  [1611588576.9272] device (ib1): state change: ip-config -> failed (reason 'unknown', sys-iface-state: 'managed')
<info>  [1611588576.9280] device (ib1): released from master device nm-team
<info>  [1611589045.6268] device (ib1): carrier: link connected
..."
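For reference, roughly how I'm creating it (a sketch from memory; the second port name ib0 and the exact property syntax are guesses, and may differ by NM version):

# team master with the broadcast runner
nmcli connection add type team con-name team1055 ifname nm-team \
    team.config '{"runner": {"name": "broadcast"}}'

# the two IPoIB ports as team slaves
nmcli connection add type infiniband con-name team1055-slave-ib0 \
    ifname ib0 master nm-team slave-type team
nmcli connection add type infiniband con-name team1055-slave-ib1 \
    ifname ib1 master nm-team slave-type team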
Any suggestions would also be appreciated. Thanks, L
I have never played with InfiniBand, but I think those cards most probably have some checksum offloading capabilities. Have you explored in that direction and tested with checksums in offloaded mode?
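Something along these lines should show it (a sketch; on IPoIB some of these features may be fixed and refuse to change):

# show the current offload settings on the IPoIB interface
ethtool -k ib1

# try enabling rx/tx checksum offload
ethtool -K ib1 rx on tx on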
Best Regards, Strahil Nikolov
On Mon, 25.01.2021 at 15:49 +0000, lejeczek via CentOS wrote:
...
On Thu, 21 Jan 2021 23:33:56 +0000 lejeczek via CentOS centos@centos.org wrote:
Hi guys.
Hoping some net experts might stumble upon this message. I have an IPoIB direct host-to-host connection, and:
...
$ iperf3 -c 10.5.5.97
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec                  receiver
It's a rather oldish platform hosting the link; PCIe is only 2.0, but with an x8 link it should be able to carry more than ~13 Gbit/s. The InfiniBand card is a Mellanox ConnectX-3.
If you want to test the InfiniBand performance, use ib_write_bw for example, not iperf.
IPoIB will always be quite a bit slower than native IB.
That said, if you want to optimize IPoIB for performance, make sure you're running connected mode, not datagram mode.
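For example (a rough sketch; ib_write_bw comes from the perftest package, and the sysfs path assumes the interface isn't being forced back to datagram mode by NetworkManager):

# native IB bandwidth test
ib_write_bw                # on one host (server)
ib_write_bw 10.5.5.97      # on the other host (client)

# check / switch the IPoIB transport mode, then raise the MTU
cat /sys/class/net/ib1/mode
echo connected > /sys/class/net/ib1/mode
ip link set ib1 mtu 65520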
/Peter