Throughput issue of WAN->LAN TCP routing (with NAT) on Turris MOX
I am experiencing a strange throughput (performance) issue on my Turris MOX when downloading via TCP (e.g. HTTP, HTTPS) from the Internet. The issue affects many but not all hosts. Here is the network setup where I can deterministically reproduce the issue:
[Host A] <---> (Internet) <---> WAN [Turris MOX] LAN <---> [Host B]
All traffic is IPv4, the MTU is 1500 bytes everywhere, and I don't observe any IP packet fragmentation. The connectivity of Host A is 200 Mbps symmetric (WEDOS). The WAN connectivity of the Turris MOX is 300 Mbps symmetric (viaGIA). The gigabit LAN port of the Turris MOX is connected directly to Host B via a short patch cable.
The throughput of each link has been verified in isolation, both locally and by connecting to a third Host C on the Internet (1 Gbps connectivity, Casablanca), and the nominal download/upload speed is easily achieved. The issue is not observed when Host C replaces Host A in the setup (but again, the problem is not unique to Host A).
Host A runs the following command:
nc -4 -k -l -p 4000 < /dev/zero
Host B runs the following command:
nc -4 89.221.222.131 4000 > /dev/null
In other words, Host B connects via TCP (through NAT routing on the Turris MOX) to Host A, and Host A sends a stream of zero bytes to Host B over this TCP connection. The observed download speed fluctuates between 4 Mbps and 8 Mbps instead of being close to the ideal 200 Mbps.
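For anyone who wants to reproduce the numbers, here is a minimal sketch of how the download speed could be sampled on Host B, assuming Python 3 is available there. The function names (to_mbps, measure) are made up for this illustration, and Mbps is taken to mean 10^6 bits per second. It is not a replacement for the nc commands above, only a way to get one throughput figure per interval.

```python
import socket
import time


def to_mbps(num_bytes, seconds):
    """Convert a byte count over an interval to megabits per second (10^6 bits)."""
    return num_bytes * 8 / seconds / 1e6


def measure(host, port, duration=10.0, interval=1.0):
    """Read a TCP stream for about `duration` seconds.

    Returns a list with one Mbps sample per completed `interval`.
    Note: recv() blocks, so a completely stalled stream delays the exit.
    """
    samples = []
    with socket.create_connection((host, port)) as sock:
        start = window_start = time.monotonic()
        window_bytes = 0
        while time.monotonic() - start < duration:
            chunk = sock.recv(65536)
            if not chunk:  # the sender closed the connection
                break
            window_bytes += len(chunk)
            now = time.monotonic()
            if now - window_start >= interval:
                samples.append(to_mbps(window_bytes, now - window_start))
                window_start, window_bytes = now, 0
    return samples
```

On Host B, calling measure("89.221.222.131", 4000) while the nc listener runs on Host A should return one sample per second; the same call with 192.168.168.1 would sample a stream served from the Turris MOX.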
Further observations:
- Running the receiving command, i.e.
  nc 89.221.222.131 4000 > /dev/null
  directly on the Turris MOX yields the ideal download speed of 200 Mbps.
- Running a "relay" on the Turris MOX, i.e.
  nc 89.221.222.131 4000 | nc -l -p 4000
  and connecting to this relay directly from Host B, i.e.
  nc -4 192.168.168.1 4000 > /dev/null
  also yields a solid download speed of 120 Mbps. This demonstrates that the individual links of the Turris MOX are not saturated.
- Even more surprisingly, when running the original benchmark between Host A and Host B, it is possible to improve the speed of the A -> B transfer to close to 200 Mbps just by running another TCP transfer between the Turris MOX and Host B at the same time, i.e.
  nc -l -p 4000 < /dev/zero
  on the Turris MOX and
  nc -4 192.168.168.1 4000 > /dev/null
  on Host B. This "disturbing" TCP connection not only improves the performance of the original TCP connection, but the download speeds of the two connections also appear to be strangely correlated, again close to 200 Mbps. This suggests that the issue might be related to interrupt processing on the Turris MOX.
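The interrupt hypothesis is only a guess. One way it could be checked, assuming Python 3 is available on the Turris MOX, is to compare two snapshots of /proc/interrupts taken a few seconds apart, once during the slow transfer and once with the "disturbing" connection running. The sketch below is untested on the MOX itself, and the helper names (parse_interrupts, interrupt_deltas, snapshot) are made up for illustration.

```python
def parse_interrupts(text):
    """Parse the content of /proc/interrupts into {label: [count_per_cpu, ...]}."""
    lines = text.strip().splitlines()
    num_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        per_cpu = []
        for field in fields[:num_cpus]:
            if not field.isdigit():
                break  # rows such as "Err:" carry fewer counters than CPUs
            per_cpu.append(int(field))
        # Keep the trailing description so the interrupt source stays recognizable.
        description = " ".join(fields[len(per_cpu):])
        counts[f"{label.strip()} {description}".strip()] = per_cpu
    return counts


def interrupt_deltas(before, after):
    """Return the per-CPU increase of every counter that changed between snapshots."""
    deltas = {}
    for key, new in after.items():
        old = before.get(key, [0] * len(new))
        diff = [n - o for n, o in zip(new, old)]
        if any(diff):
            deltas[key] = diff
    return deltas


def snapshot(path="/proc/interrupts"):
    """Read and parse the current interrupt counters."""
    with open(path) as f:
        return parse_interrupts(f.read())
```

Taking snapshot() before and after a fixed sleep and passing both results to interrupt_deltas gives the number of interrupts per source and per CPU in that window. A large difference in the network-related rows between the slow and the fast case would support the hypothesis; no difference would point elsewhere.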
Turris MOX: MOX Start (1024 MB, WAN port) + MOX C (LAN ports)
Turris OS: 5.1.10 with kernel 4.14.222 (the same behavior is observed on 5.2.0 with kernel 4.14.229), no non-standard configuration
If you need more information of any kind, I would be happy to provide it.