Thanks for the feedback. I was hoping that the other words on the slide (LACP and bonding) would give enough context.
I'm afraid that my presentation didn't really have room to dive much into LACP. I briefly said something like the following when giving that slide:
Basically, the LACP link partner (router) hashes traffic consistently across the multiple links in the LACP bundle, using a hash of its choosing (typically an N-tuple, involving IP address and TCP port). Once it has selected a link for that connection, that connection will always land on that link on ingress (unless the LACP bundle changes in terms of links coming and going). We're free to choose whatever egress NIC we want (it does not need to be the same NIC the connection entered on). The issue is that there is no way for us to tell the router to move the TCP connection from one NIC to another (well, there is in theory, but our routers can't do it).
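To make the consistent-hashing part concrete, here's a toy sketch in C. The hash function is made up for illustration (real routers use their own vendor-specific hashes); the point is just that the same tuple always selects the same member link, until the bundle itself changes:

```c
#include <stdint.h>
#include <stdio.h>

/* One TCP connection's addressing info (the "N-tuple"). */
struct flow {
    uint32_t src_ip, dst_ip;     /* IPv4 addresses */
    uint16_t src_port, dst_port;
};

/*
 * Toy stand-in for the router's hash; real hardware uses its own,
 * vendor-specific function.  What matters is determinism: the same
 * tuple always maps to the same member link.
 */
static unsigned
pick_member(const struct flow *f, unsigned nmembers)
{
    uint32_t h = f->src_ip ^ f->dst_ip ^
        (((uint32_t)f->src_port << 16) | f->dst_port);
    h *= 0x9e3779b1u;            /* mix so nearby tuples spread out */
    h ^= h >> 16;
    return h % nmembers;
}

int
main(void)
{
    struct flow f = { 0x0a000001, 0xc0a80001, 49152, 443 };

    /* Same connection, same link, every time... */
    printf("4 links: flow lands on link %u\n", pick_member(&f, 4));
    printf("4 links: flow lands on link %u\n", pick_member(&f, 4));
    /* ...until the bundle changes (a member comes or goes). */
    printf("3 links: flow lands on link %u\n", pick_member(&f, 3));
    return 0;
}
```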
> We're free to choose whatever egress NIC we want
Wait, I got lost again... You say you can "output on any egress NIC". So all four egress NICs have access to the TLS encryption keys and are cooperating through the FreeBSD kernel to get this information?
Is there some kind of load-balancing you're doing on the machine? Trying to see which NIC has the least amount of traffic and routing to the least utilized NIC?
LACP (which is a control protocol) is a standard feature that switches have supported forever. To put it simply, both the server and the switch see a single (fake) port that has physical Ethernet members, and a hashing algorithm puts traffic onto a particular physical Ethernet link based on the selected hash. The underlying software/hardware picks which physical link to put each packet on. The inputs to the hash for picking the link can be src/dst MAC, port, or IP address. LACP handles negotiating the link bundle between the two ends and also signals a link failure ("hey man, something broke, we have one less link now"). Any given single flow will always hash to the same link. So, for example, in a 4x10G LAG (also called a port-channel in networking speak), the max bandwidth for a single flow would be 10G, the max of a single member. In an ideal world the hashing would be perfectly balanced; however, it is possible to have a set of flows all hash to the same link. Hope that helps.
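For what it's worth, here's a toy C sketch of those last two points: each flow hashes to exactly one member (so one flow tops out at a single member's bandwidth), and an unlucky set of flows can pile onto the same member. The hash is a stand-in I made up, not what any real switch uses.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in hash; real switches use their own functions. */
static unsigned
hash_flow(uint32_t src_ip, uint16_t src_port, unsigned nlinks)
{
    uint32_t h = src_ip * 0x9e3779b1u + src_port;
    h ^= h >> 16;
    return h % nlinks;
}

int
main(void)
{
    unsigned per_link[4] = { 0 };

    /* 16 client flows, all talking to the same server IP:port. */
    for (int i = 0; i < 16; i++) {
        uint32_t src_ip = 0x0a000000u + (uint32_t)(rand() & 0xff);
        uint16_t src_port = (uint16_t)(49152 + rand() % 16384);
        per_link[hash_flow(src_ip, src_port, 4)]++;
    }

    /* The counts are rarely 4/4/4/4: some links get more flows,
     * and any one flow is capped at a single member's bandwidth. */
    for (int i = 0; i < 4; i++)
        printf("link %d: %u flows\n", i, per_link[i]);
    return 0;
}
```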
That's an excellent overview. I think I got the gist now.
There are all sorts of subtle details, though those are probably just "implementation details" of this system. How and where do worker threads get spawned? Clearly sendfile / kTLS have great synergies, etc. etc.
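Since sendfile/kTLS came up: here is a rough illustration of what that synergy looks like from an application's point of view on FreeBSD, assuming the TLS handshake has already happened and produced session keys (in practice OpenSSL's kTLS support does this plumbing for you). Struct and constant names are from ktls(4)/sys/ktls.h as I remember them on recent FreeBSD; double-check against your version.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/ktls.h>          /* struct tls_enable, TLS_*_VER_* */
#include <crypto/cryptodev.h>  /* CRYPTO_AES_NIST_GCM_16 */
#include <netinet/in.h>
#include <netinet/tcp.h>       /* TCP_TXTLS_ENABLE */
#include <stdint.h>
#include <string.h>

/* Session secrets from the TLS handshake; placeholders here. */
static uint8_t session_key[16];  /* AES-128-GCM key */
static uint8_t session_iv[12];   /* TLS 1.3 IV */

/* Enable transmit kTLS on a connected socket, then hand the whole
 * file to the kernel.  The payload never enters userspace: the
 * kernel (or the NIC, with hardware offload) reads the file pages,
 * wraps them in TLS records, and transmits them. */
int
send_file_over_ktls(int sock, int filefd, off_t len)
{
    struct tls_enable en;

    memset(&en, 0, sizeof(en));
    en.cipher_algorithm = CRYPTO_AES_NIST_GCM_16;
    en.cipher_key = session_key;
    en.cipher_key_len = sizeof(session_key);
    en.iv = session_iv;
    en.iv_len = sizeof(session_iv);
    en.tls_vmajor = TLS_MAJOR_VER_ONE;
    en.tls_vminor = TLS_MINOR_VER_THREE;  /* TLS 1.3 */

    if (setsockopt(sock, IPPROTO_TCP, TCP_TXTLS_ENABLE,
        &en, sizeof(en)) == -1)
        return (-1);

    /* Zero-copy send straight from the file's pages. */
    return (sendfile(filefd, sock, 0, (size_t)len, NULL, NULL, 0));
}
```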
It's a lot of detail, and an impressive result for sure. I probably don't have the time to study this on my own, so of course I've got lots and lots of questions. This discussion has been very helpful.
Adding the NUMA things "on top" of sendfile/kTLS is clearly another issue. The hashing of TCP/port information onto particular links is absolutely important, because the physical location of the ports matters.
I think I have the gist at this point. But that's a lot of moving parts here. And the whole NUMA fabric being the bottleneck just ups the complexity of this "simple" TLS stream...
EDIT: I guess some other bottleneck exists for Intel/Ampere's chips? There's no NUMA in those. Very curious.
----
Rereading the "Disk centric siloing" slides later on actually answers a lot of my questions. I think my mental model was disk-centric siloing and I just didn't realize it. Those slides work exactly how I thought this "should" have worked, but it seems like that strategy was shown to be inferior to the strategy talked about in the bulk of this presentation.
Hmmm, so my last "criticism" of this excellent presentation: maybe an early slide that lays out the strategies you tried (disk siloing, network siloing, software kTLS, and hardware kTLS offload)?
Just one slide at the beginning saying "I tried many architectures" would remind the audience that many seemingly good solutions exist. Something I personally forgot in this discussion thread.
The whole presentation is discussing bottlenecks, path optimisation between resources, and the impact of those on overall throughput. It's not a simple load-balancing question being answered.
If we are using LACP between the router and the server, it means we create a single logical link between them. We can use as many physical links as are supported by both the server and the router.
The server and router will treat them as a single link. Thus which link a packet enters or leaves on doesn't really matter.
Although we're free to choose the egress port in LACP, it's still wise to maintain some kind of client or flow affinity, to avoid inadvertent packet reordering.
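A minimal sketch of what that affinity might look like on the server side (the names here are illustrative, not a real kernel interface): pick an egress NIC once per connection and stick with it, so one flow's packets can't race down different links and arrive reordered.

```c
#include <stdint.h>

#define NO_NIC (-1)

/* Per-connection state; names are illustrative only. */
struct conn {
    uint32_t flow_hash;   /* hash of the connection's tuple */
    int egress_nic;       /* NO_NIC until first transmit */
};

/*
 * Choose an egress NIC on the first packet, then reuse it for the
 * connection's lifetime.  If successive packets of one flow were
 * sprayed across NICs, they could arrive reordered and hurt TCP.
 */
static int
egress_nic_for(struct conn *c, int nnics)
{
    if (c->egress_nic == NO_NIC)
        c->egress_nic = (int)(c->flow_hash % (uint32_t)nnics);
    return c->egress_nic;
}
```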