Multipath TCP setup in RHEL 8
Many years ago, Hercules was traveling back to Athens when he lost his way in a wood. Arriving at a crossroad, he found two women. One of them was Aletheia, the goddess of truth, and the other was Apate, the goddess of lies. But he didn’t know who was who. Hercules needed all his intelligence to find the only question capable of revealing the way to Athens.
If you are lost in a computer network and don’t know which path to take, don’t worry—you don’t need Hercules to find your way: You can use Multipath TCP.
Multipath TCP in Red Hat Enterprise Linux 8
Multipath TCP (MPTCP) is an extension to the Transmission Control Protocol (TCP) that lets peers use multiple paths between them simultaneously. The Linux MPTCP implementation is fairly recent: it was merged in kernel version 5.6. Red Hat Enterprise Linux includes MPTCP starting with version 8.3.
In the first part of this series, Davide Caratti showed how to enable MPTCP, open an MPTCP socket in an application, and verify that MPTCP is working as expected. You can follow his tutorial to start experimenting with MPTCP.
In this article, you will learn how to:
Use iproute2 to add multiple subflows to an MPTCP connection.
Verify that MPTCP uses multiple subflows.

Opening an MPTCP socket
To see MPTCP in action, you need to open an MPTCP socket from a user-space application. Let’s follow the instructions from the first article of this series to set up a multiple subflow testbed.
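Before changing anything, it can help to check whether the running kernel exposes MPTCP at all and whether it is already enabled. The following is a minimal sketch, assuming the net.mptcp.enabled sysctl is readable at /proc/sys/net/mptcp/enabled; mptcp_status is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Report the MPTCP state of the current network namespace.
# mptcp_status is a hypothetical helper; the optional argument lets you
# point it at a different file than the default sysctl path.
mptcp_status() {
    file="${1:-/proc/sys/net/mptcp/enabled}"
    if [ ! -r "$file" ]; then
        # No such file: the kernel has no MPTCP support, or it is unreadable
        echo "unsupported"
    elif [ "$(cat "$file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

mptcp_status
```

Because the setting is per network namespace, running the script through ip netns exec reports the state of that namespace rather than the host.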
First of all, because MPTCP is disabled in the default Red Hat Enterprise Linux configuration, you need to enable it using sysctl so you can create MPTCP sockets:
net.mptcp.enabled = 1
MPTCP sockets behave like regular TCP sockets and use the same semantics. An application can natively add support for MPTCP by creating a socket with IPPROTO_MPTCP, like this:
fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
Implementing a full-fledged MPTCP application is not difficult, but what if you want to use a normal user-space application without any knowledge of IPPROTO_MPTCP?
Fear not: There are multiple paths to avoid patching and rebuilding all networking applications. The easiest path is probably to use systemtap to replace IPPROTO_TCP with IPPROTO_MPTCP in all the calls to __sys_socket() in the kernel. To make the replacement, you need to install a few packages:
$ dnf -y install
systemtap-devel
Now you can download the systemtap script from this guide and use the following command to start it:
Verify that the systemtap script is working by looking at the output from dmesg:
…
[1114254.601040] Kprobes globally unoptimized
[1114254.611265] stap_d60b4bc4e0332aa247ebd9b2ffa597_213373: module_layout: kernel tainted.
[1114254.620296] Disabling lock debugging due to kernel taint
[1114254.626423] stap_d60b4bc4e0332aa247ebd9b2ffa597_213373: loading out-of-tree module taints kernel.
[1114254.636597] stap_d60b4bc4e0332aa247ebd9b2ffa597_213373: module verification failed: signature and/or required key missing - tainting kernel
[1114254.680368] stap_d60b4bc4e0332aa247ebd9b2ffa597_213373 (mptcp.stap): systemtap: 4.4/0.182, base: ffffffffc1082000, memory: 224data/32text/15ctx/24678net/202alloc kb, probes: 1

Setting up the testbed
As a simple test setup, we will use two network namespaces, mptcp-client and mptcp-server, to simulate the client and the server in an MPTCP connection. The two namespaces will be connected by two different virtual Ethernet (veth) paths: From 10.0.0.1 to 10.0.0.2 and from 192.168.0.1 to 192.168.0.2 (Figure 1).
Figure 1: Network topology for two-path MPTCP test. In this setup, 10.0.0.1 on the server communicates with 10.0.0.2 on the client, and 192.168.0.1 on the server communicates with 192.168.0.2 on the client.
You can use the following script to set up the testbed:
ip netns add mptcp-client
ip netns add mptcp-server

sysctl -w net.ipv4.conf.all.rp_filter=0
ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1

ip link add red-client netns mptcp-client type veth peer red-server netns mptcp-server
ip link add blue-client netns mptcp-client type veth peer blue-server netns mptcp-server

ip -n mptcp-server address add 10.0.0.1/24 dev red-server
ip -n mptcp-server address add 192.168.0.1/24 dev blue-server
ip -n mptcp-client address add 10.0.0.2/24 dev red-client
ip -n mptcp-client address add 192.168.0.2/24 dev blue-client

ip -n mptcp-server link set red-server up
ip -n mptcp-server link set blue-server up
ip -n mptcp-client link set red-client up
ip -n mptcp-client link set blue-client up

You can verify that your setup is working as expected by following the instructions in the first part of this series.
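A quick reachability check over both veth paths is another way to confirm that the testbed is wired up before moving on. The sketch below assumes the namespace names and addresses from the script above; run and check_testbed are hypothetical helper names. By default it only prints the commands, and it executes them (as root) when APPLY=1 is set.

```shell
#!/bin/sh
# run prints the command by default; set APPLY=1 to execute it instead.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

check_testbed() {
    # MPTCP must be enabled in both namespaces (expected output: 1)
    for ns in mptcp-client mptcp-server; do
        run ip netns exec "$ns" sysctl -n net.mptcp.enabled
    done
    # One ping per path, from the client to each server address
    run ip netns exec mptcp-client ping -c 1 10.0.0.1
    run ip netns exec mptcp-client ping -c 1 192.168.0.1
}

check_testbed
```

Printing by default keeps the script safe to try on a machine where the namespaces do not exist yet.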
Working with multiple paths
Now that you have MPTCP working on a single path, it’s time to take multiple paths.
First, instruct the kernel to set up multiple MPTCP subflows. iproute2 provides a handy mptcp command that can help you:
These commands configure the MPTCP server to accept up to two different subflows and then add a second subflow to the client. To verify that everything is configured as expected, use iproute2 again:
add_addr_accepted 2 subflows 2
add_addr_accepted 2 subflows 2
192.168.0.2 id 1 subflow dev blue-client
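The limits and the endpoint listed above can be produced with iproute2 commands along these lines. This is a sketch under assumptions, not necessarily the exact commands used for this article: it relies on the ip mptcp limits and ip mptcp endpoint subcommands and on the namespace and interface names from the testbed. run, configure_subflows, and show_config are hypothetical helper names. The script prints the commands by default and executes them (as root) only when APPLY=1 is set.

```shell
#!/bin/sh
# run prints the command by default; set APPLY=1 to execute it instead.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

configure_subflows() {
    # Allow up to two extra subflows and two accepted ADD_ADDR
    # announcements per connection, in each namespace.
    run ip -n mptcp-server mptcp limits set subflows 2 add_addr_accepted 2
    run ip -n mptcp-client mptcp limits set subflows 2 add_addr_accepted 2
    # Register the client's second address as a subflow endpoint.
    run ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev blue-client id 1 subflow
}

show_config() {
    run ip -n mptcp-server mptcp limits show
    run ip -n mptcp-client mptcp limits show
    run ip -n mptcp-client mptcp endpoint show
}

configure_subflows
show_config
```

With APPLY=1, the three show commands should report the same limits and endpoint as the output above if the configuration took effect.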
Now you are ready to use subflows. To test them, you can use ncat. The following command starts an ncat server instance in the mptcp-server namespace:
Next, the following command sends a “hello world!” message to the server from the mptcp-client namespace:
$ ip netns exec mptcp-client ncat -c 'echo hello world!' 10.0.0.1 4321
Using tcpdump, you can verify that there are two different three-way handshakes from different interfaces:
reading from file /tmp/mptcp.pcap, link-type LINUX_SLL (Linux cooked v1)
dropped privs to tcpdump
1 IP 10.0.0.2.43474 > 10.0.0.1.4321: Flags [S], seq 908898843, win 29200, options [mss 1460,sackOK,TS val 3701631927 ecr 0,nop,wscale 7,mptcp capable[bad opt]>
2 IP 10.0.0.1.4321 > 10.0.0.2.43474: Flags [S.], seq 3314791626, ack 908898844, win 28960, options [mss 1460,sackOK,TS val 3198006599 ecr 3701631927,nop,wscale 7,mptcp capable Unknown Version (1)], length 0
3 IP 10.0.0.2.43474 > 10.0.0.1.4321: Flags [.], ack 1, win 229, options [nop,nop,TS val 3701631927 ecr 3198006599,mptcp capable Unknown Version (1)], length 0
4 IP 10.0.0.2.43474 > 10.0.0.1.4321: Flags [P.], seq 1:14, ack 1, win 229, options [nop,nop,TS val 3701631928 ecr 3198006599,mptcp capable[bad opt]>
5 IP 10.0.0.1.4321 > 10.0.0.2.43474: Flags [.], ack 14, win 227, options [nop,nop,TS val 3198006600 ecr 3701631928,mptcp dss ack 3158259848540329265], length 0
6 IP 192.168.0.2.36423 > 10.0.0.1.4321: Flags [S], seq 2791202022, win 29200, options [mss 1460,sackOK,TS val 1604001975 ecr 0,nop,wscale 7,mptcp join id 1 token 0xc0715389 nonce 0xcae83dcb], length 0
7 IP 10.0.0.1.4321 > 192.168.0.2.36423: Flags [S.], seq 637604674, ack 2791202023, win 28960, options [mss 1460,sackOK,TS val 511057212 ecr 1604001975,nop,wscale 7,mptcp join id 0 hmac 0x465e4bf08492bb0c nonce 0x47d18eca], length 0
8 IP 10.0.0.2.43474 > 10.0.0.1.4321: Flags [.], ack 1, win 229, options [nop,nop,TS val 3701631928 ecr 3198006600,mptcp dss fin ack 641236127 seq 3158259848540329265 subseq 0 len 1,nop,nop], length 0
9 IP 10.0.0.1.4321 > 10.0.0.2.43474: Flags [.], ack 14, win 227, options [nop,nop,TS val 3198006600 ecr 3701631928,mptcp dss ack 3158259848540329266], length 0
10 IP 10.0.0.2.43474 > 10.0.0.1.4321: Flags [F.], seq 14, ack 1, win 229, options [nop,nop,TS val 3701631928 ecr 3198006600,mptcp dss ack 641236127], length 0
11 IP 10.0.0.1.4321 > 10.0.0.2.43474: Flags [.], ack 15, win 227, options [nop,nop,TS val 3198006641 ecr 3701631928,mptcp dss ack 3158259848540329266], length 0

Indeed, a first three-way handshake takes place with packets 1-3 on the 10.0.0.* path, while a second handshake starts with packets 6 and 7 from the client's 192.168.0.2 address (note the S flag for SYN and the S. for SYN/ACK). In this excerpt, packets 8-11 all belong to the 10.0.0.* path, so the final ACK of the second handshake is not shown.
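The same check can be scripted by counting the initial SYN packets in tcpdump's text output: the first subflow announces itself with "mptcp capable", and every additional subflow with "mptcp join". The following is a minimal sketch, assuming output in the format shown above; count_mptcp_syns is a hypothetical helper name, and the two sample lines are packets 1 and 6 from the capture.

```shell
#!/bin/sh
# Count connection-opening SYNs (Flags [S], not the [S.] of a SYN/ACK)
# that carry an MPTCP option. Reads tcpdump text output on stdin.
count_mptcp_syns() {
    awk '
        /Flags \[S\],/ && /mptcp capable/ { capable++ }
        /Flags \[S\],/ && /mptcp join/    { join++ }
        END { printf "capable=%d join=%d\n", capable, join }
    '
}

# Demo on two packets taken from the capture above
count_mptcp_syns <<'EOF'
1 IP 10.0.0.2.43474 > 10.0.0.1.4321: Flags [S], seq 908898843, win 29200, options [mss 1460,sackOK,TS val 3701631927 ecr 0,nop,wscale 7,mptcp capable[bad opt]>
6 IP 192.168.0.2.36423 > 10.0.0.1.4321: Flags [S], seq 2791202022, win 29200, options [mss 1460,sackOK,TS val 1604001975 ecr 0,nop,wscale 7,mptcp join id 1 token 0xc0715389 nonce 0xcae83dcb], length 0
EOF
# prints: capable=1 join=1
```

On a real capture, you could pipe the output of tcpdump -n -r /tmp/mptcp.pcap into count_mptcp_syns; a join count of at least 1 indicates that a second subflow was attempted.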
Tired of manual subflows? mptcpd to the rescue
By now you are probably tired of having to manually specify endpoints and subflows for each new test you want to do. Luckily, this is not the only way to use MPTCP endpoints.
Linux kernel version 5.11 is able to send netlink notifications to user space when the kernel receives an add address netlink command. A small user-space application, mptcpd, can tap into these notifications to provide a path management mechanism to user space and control MPTCP behavior. mptcpd will be available with RHEL 9.
In the end, Hercules managed to find his way to Athens by simply asking one of the goddesses which path the other goddess would tell him to use. He chose the other path and quickly returned to Athens.
With MPTCP, you don’t need to solve path riddles anymore: You can ask for multiple roads and take them all together to head where you need to go.
Related Content
Multipath TCP on Red Hat Enterprise Linux 8.3: From 0 to 1 subflows
Automating the testing process for SystemTap, Part 1: Test automation with libvirt and Buildbot