Does FreeBSD `sendfile` avoid a context switch from userspace to kernelspace as well, or is it only zero-copy? I've worked with 100Gbps NICs and ended up having to use both a userspace network stack and a userspace storage driver on Linux to avoid the context switch and ensure zero-copy.
Also, have you looked into offloading more of the processing to an FPGA card instead?
There is no context switch. Like most system calls, sendfile runs in the context of the thread making the syscall.
FreeBSD has "async sendfile", which means that it does not block waiting for the data to be read from disk. Rather, the pages that have been allocated to hold the data are staged in the socket buffer and attached to mbufs marked "not ready". When the data arrives, the disk interrupt thread makes a callback which marks the mbufs "ready", and pokes the TCP stack to tell it that they are ready to send.
This avoids the need to have many threads parked, waiting on disk I/O to complete.
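To make the "not ready" idea concrete, here is a toy userspace model in Python. It is only a sketch of the flow described above, not the kernel code: `Mbuf`, `SocketBuffer`, `sendfile_async`, `io_done` and `try_send` are made-up names for illustration, and the rule that only the leading run of ready buffers is sent is my assumption about how byte order is preserved.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Mbuf:
    """Toy stand-in for an mbuf whose backing page may still await disk I/O."""
    data: bytes = b""
    ready: bool = False


@dataclass
class SocketBuffer:
    """Hypothetical socket buffer: a FIFO of mbufs plus a record of what was sent."""
    queue: deque = field(default_factory=deque)
    wire: list = field(default_factory=list)  # chunks "transmitted", in order

    def sendfile_async(self, nchunks):
        """Stage nchunks 'not ready' mbufs and return at once, without blocking."""
        staged = [Mbuf() for _ in range(nchunks)]
        self.queue.extend(staged)
        return staged

    def io_done(self, mbuf, data):
        """Disk-completion callback: fill the mbuf, mark it ready, poke the sender."""
        mbuf.data = data
        mbuf.ready = True
        self.try_send()

    def try_send(self):
        """Send only the leading run of ready mbufs so bytes go out in order."""
        while self.queue and self.queue[0].ready:
            self.wire.append(self.queue.popleft().data)


def demo():
    """Complete the reads out of order and return what reached the wire."""
    sb = SocketBuffer()
    first, second, third = sb.sendfile_async(3)  # the caller is free immediately
    sb.io_done(second, b"B")  # head is still not ready, so nothing is sent
    sb.io_done(first, b"A")   # head is ready now, so A and B both go out
    sb.io_done(third, b"C")
    return sb.wire
```

The point of the model is that `sendfile_async` returns before any data exists, and all sending is driven from the completion callback, so no thread has to sit parked per outstanding read.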