I think you misunderstand how mesh VPNs work. Their primary purpose is as a control plane: introducing peers to each other so they can communicate, either directly or via a relay (e.g. DERP), with per-node encryption. They should have no overhead compared to a single point-to-point encrypted tunnel like WireGuard, because the "mesh" features are not in the data path.
The only real difference here is how the VPN product implements WireGuard (userspace or kernel space) and how well tuned that implementation is. It might make sense to compare WireGuard implementations, but (afaik) they all use one of several open-source ones. Tailscale did some work to improve performance that they blogged about here: https://tailscale.com/blog/more-throughput
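To see what that data path amounts to, here's a minimal sketch of a bare point-to-point WireGuard tunnel, written as NixOS config; the interface name, addresses, keys, and endpoint are all placeholders, not anyone's real setup:

```nix
{
  # A single point-to-point WireGuard tunnel: one interface, one peer.
  # All names, addresses, keys, and endpoints below are placeholders.
  networking.wireguard.interfaces.wg0 = {
    ips = [ "10.100.0.1/24" ];
    listenPort = 51820;
    privateKeyFile = "/run/keys/wg0-private";
    peers = [
      {
        publicKey = "<peer-public-key>";
        allowedIPs = [ "10.100.0.2/32" ];
        endpoint = "peer.example.com:51820";
        persistentKeepalive = 25;
      }
    ];
  };
}
```

A mesh product's job is essentially to generate and distribute this kind of peer configuration for you; the packets themselves still flow through a tunnel like the one above.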
Neither Nebula nor ZeroTier is based on WireGuard.
What the article compares are systems that provide some form of ACL, which is why bare WireGuard is not included. That means there are features in the data path that could have significant performance implications versus a simple tunnel. The impact of using the ACL features isn't really a focus of the presented benchmarks, but they do mention a separate test of using iptables to bolt on access controls.
It looks like the host kernel is not in full control – there is an EL2-level hypervisor, pKVM [1], that is actually the highest-privilege domain. This is pretty similar to the Xen architecture [1], where the dom0 Linux OS in charge of managing the machine runs as a guest of the hypervisor.
No, KVM is also a type 1 hypervisor but it doesn't attempt (with the exception of pKVM and of hardware protection features like SEV, neither of which is routinely used by cloud workloads) to protect the guest from a malicious host.
KVM is a type 2 hypervisor, as the "Dom 0" kernel has full HW access. Other guests are obviously isolated as configured, and to userspace they look like special processes.
It gets a bit blurry on AArch64 with and without VHE (Virtualization Host Extensions): without VHE (< ARMv8.1) the kernel runs in EL1 ("kernel mode") most of the time and escalates to EL2 ("hypervisor mode") only when needed, but with VHE it runs at EL2 all the time. (ref. https://lwn.net/Articles/650524/)
No, "type 2" is defined by Goldberg's thesis as "The VMM runs on an extended host [53,75], under the host operating system", where:
* VMM is treated as synonymous with hypervisor
* "Extended host" is defined as "A pseudo-machine [99], also called an extended machine [53] or a user machine [75], is a composite machine produced through a combination of hardware and software, in which the machine's apparent architecture has been changed slightly to make the machine more convenient to use. Typically these architectural changes have taken the form of removing I/O channels and devices, and adding system calls to perform I/O and and other operations"
In other words, type 1 ("bare machine hypervisor") runs in supervisor mode and type 2 runs in user mode. QEMU running in dynamic binary translation mode is a type 2 VMM.
KVM runs on a bare machine, but it delegates some services to a less privileged component such as QEMU, crosvm, or Firecracker. This is not a type 2 hypervisor; it is a type 1 hypervisor that follows security principles such as privilege separation.
KVM is pretty much the only hypervisor that cloud vendors use these days.
So it's true that "most cloud workloads run on type 1 hypervisors" (KVM is one) but not that most cloud vendors/workloads run on microkernel-like hypervisors, with the exception of Azure.
Then can you explain how cloud workloads are the revenge of the microkernel, given that exactly one major cloud provider uses a microkernel-like hypervisor?
By not running monolithic kernels on bare metal, but virtualized, or even better under nested virtualization, thus throwing out the window all the supposed performance advantages (around context switching) from the usual monolithic vs. microkernel flamewar discussions.
Additionally, to take it to the next level, they run an endless amount of container workloads on top of that.
This is linked in the footer of this page, but Graham Christensen has an excellent blog post, "Erase your darlings" [1], that explains why you might want to do this. There is a script floating around on GitHub [2] that does the install automatically, but it needs to be modified slightly to work on newer NixOS versions. I have a forked version [3] that I used this morning; my decisions might not make sense for everyone, so it's provided as-is for now :)
There's an even better setup than Erase your Darlings. Instead of putting / on a ZFS pool and erasing and rewriting it every reboot, just put tmpfs on /.
With / entirely in memory, it automatically gets wiped and recreated on every reboot, without needing to actively erase a disk, and thus with much less drive wear (depending on how frequently you reboot). It's also faster for some things, since they load from RAM instead of the disk. And it's overall a cleaner, simpler setup.
You can put tmpfs on /home as well, using Impermanence and Home Manager to persist things like ~/.config and whatever other files or folders need to survive between reboots.
tmpfs on / is enabled by NixOS's unique design: it keeps the entire system in /nix/store and then symlinks all the paths into their appropriate places in /. With tmpfs on /, NixOS automatically recreates those symlinks in tmpfs on reboot. Very little setup effort is required to make this work.
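Roughly, that setup looks like the sketch below; the device label, tmpfs size, and persisted paths are just examples (it is not a complete config, and the persistence part assumes the impermanence module is imported):

```nix
{
  # Root is a tmpfs: wiped and recreated automatically on every reboot.
  fileSystems."/" = {
    device = "none";
    fsType = "tmpfs";
    options = [ "defaults" "size=2G" "mode=755" ];
  };

  # The Nix store lives on disk and survives reboots.
  fileSystems."/nix" = {
    device = "/dev/disk/by-label/nix";
    fsType = "ext4";
  };

  # With the impermanence module imported: whitelist what else survives.
  # Paths here are examples only.
  environment.persistence."/persist" = {
    directories = [ "/var/log" "/var/lib/bluetooth" ];
    files = [ "/etc/machine-id" ];
  };
}
```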
I was aware of the possibility of just using tmpfs, but I went with ZFS for / anyway, mostly for one reason: I can erase root on every boot but still keep snapshots of the last few boots. That means that if I mess up some configuration and fail to persist some important data, I can still go back and recover it if I need to.
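For reference, the ZFS variant (following the "Erase your darlings" post linked above) snapshots an empty root dataset at install time and rolls it back from the initrd on every boot; the pool and dataset names below come from that post and would need to match your own layout:

```nix
{ lib, ... }:
{
  # Roll the root dataset back to its blank snapshot before it is mounted.
  # "rpool/local/root@blank" is the naming from the blog post; a dated
  # snapshot could be taken here first to keep the last few boots around.
  boot.initrd.postDeviceCommands = lib.mkAfter ''
    zfs rollback -r rpool/local/root@blank
  '';
}
```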
I'm not sure that's necessary for / at least, though /home is a different matter.
Everything that needs to persist for / is declared either in configuration.nix (or the equivalent flake) or via Impermanence. Once you've persisted something in one of those, it stays persisted across all future rebuilds.
You can also create separate ZFS pools for things like LXC/LXD that need to persist between boots, but don't naturally go in / or /home.
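For the LXC/LXD case, that can be as simple as a dedicated dataset mounted outside the wiped root; the pool, dataset, and mount point below are made-up examples:

```nix
{
  # A ZFS dataset (with a legacy mountpoint) that survives the root wipe.
  # "tank/lxd" and the mount point are examples only.
  fileSystems."/var/lib/lxd" = {
    device = "tank/lxd";
    fsType = "zfs";
  };
}
```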
So far I haven't run into a case where I forgot to add something to my impermanence config, so you may be right: it might not be necessary. However, just knowing that I could fix it if I forget something somewhere down the line (for example when setting up a new service) gives me peace of mind.
I don't think the point about disk wear is true. ZFS is a block-based CoW filesystem already, so the "erase" is actually something more like "make a new metadata entry that points to an earlier snapshot". Plus, I'd expect normal disk usage (e.g. Chrome's disk cache) to far outweigh any disk wear from the (relatively) small amount of data written to /.
Unless you're working on a server with a ton of RAM, I also think tmpfs is more likely to shoot you in the foot via excess memory pressure. I don't know of a way for the kernel to free that memory if you write a huge file to the tmpfs partition by mistake, unless you use swap, and then you have the problems that come with that.
Sounds interesting in theory, but modern development setups are often full of tools, dependencies, IDE indices, intermediate build results, etc. that all may take a very long time to download and build from scratch. Sure, you could try to keep track of all such things and persist them, but it's going to be a lot of effort trying to figure out where all your tools are dumping their state.
I don't think of impermanence as a tool for development setups, but rather as a tool to improve production security. When a server gets compromised, it's common for an attacker to leverage their initial access to set up backdoor access for themselves, e.g. an additional privileged user or a privileged service which phones home, so that they're no longer reliant on the original vulnerability to gain access again. This is important to ensure that they can launch a more damaging attack at a more opportune time (e.g. at the beginning of a long weekend).

Now consider a stateful server which you need to host (e.g. Kubernetes control plane / etcd), where you ordinarily cannot practice immutable infrastructure due to the stateful nature of the server. Modules like impermanence allow you to guard against this kind of compromise by simply wiping out everything but the actual state as a result of rebooting. Any privileged users or malicious processes (which, of course, are not part of the system configuration used at boot) get wiped out at every reboot.

It's not a silver bullet - an attacker could simply re-leverage the original vulnerability and set up access again - but doing the reboots frequently would force the vulnerability to be re-exploited each time, making it a pattern of access more likely to be detected in a SIEM.
> often full of tools, dependencies, IDE indices, intermediate build results
Tools and dependencies go in Nix. Indices and temporary build artifacts are just cache, which you can still have (I'd lean towards regenerating them per reboot, but YMMV).
> but it's going to be a lot of effort trying to figure out where all your tools are dumping their state.
Fair. Kind of an indictment of the current state of the ecosystem, but yes.
Sure, but then you're throwing away the purported benefits (setup is always reproducible) for your entire home folder.
It seems like there are two contradictory goals here, each with their own benefits. Impermanence gives you reproducibility, but permanence gives you performance.
Maybe the right tradeoff is to just persist the entire home folder and nothing else (or a few other things... I'm not sure I'd want to always download some programs that are quite huge), but the tradeoff is essentially still there.
you're talking about three different things: NixOS, Home Manager, and Impermanence. or put another way: reproducible root filesystem, reproducible home folder, automatically reproducing things on boot.
if you want a reproducible root filesystem, use NixOS. if you want to reproduce your root filesystem on boot, use NixOS with impermanence.
if you want a reproducible home folder, use home manager. if you want to reproduce your home folder on boot, use home manager with impermanence.
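for the home folder case, a rough sketch with impermanence's Home Manager module (the persistence root and the directories are just examples):

```nix
{
  # Whitelist what survives in $HOME; everything else lives in tmpfs.
  # "/persist/home/alice" and the paths below are examples only.
  home.persistence."/persist/home/alice" = {
    directories = [ ".config" ".ssh" "projects" ];
    files = [ ".bash_history" ];
  };
}
```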
I only use impermanence on servers. I don't think there's any point using it on a desktop, or with Home Manager, other than street cred. it just complicates things. I don't really even recommend Home Manager for first time NixOS users for the same reason.
edit: and I want to address this comment:
> I'm not sure I'd want to always download some programs that are quite huge
this isn't how it works; the binaries, as well as the entire initial state of the filesystem, live in the Nix store (/nix/store), a read-only filesystem that is kept on persistent storage. you're not downloading everything on every boot.
With NixOS, you typically use a flake.nix or shell.nix to "shell into" a development environment, so this is kind of a non-issue. You can also use `nix run` or `nix-env` to mix in traditional package management.
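For example, a minimal shell.nix; the package choices are just placeholders for whatever a project actually needs:

```nix
# shell.nix: a throwaway development environment entered with `nix-shell`.
# The packages listed here are placeholders.
{ pkgs ? import <nixpkgs> {} }:

pkgs.mkShell {
  packages = [
    pkgs.git
    pkgs.nodejs
    pkgs.python3
  ];
}
```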
> but it's going to be a lot of effort trying to figure out where all your tools are dumping their state.
This is true, though. If you don't want to manage all of their configs from Nix, I would make their dump locations, or $XDG_DIRS/$HOME (if you are lazy), persistent in that case.
Unfortunately this syntax is not generally supported yet - it's only supported with the BuildKit backend and only landed in the 1.3 "labs" release. It was moved to stable in early 2022 (see https://github.com/moby/buildkit/issues/2574), so that seems better, but I think it may still require a syntax directive to enable.
this has been a serious pain in my side for a while, both for my own debugging and for telling people I try to help "you're gonna have to start over and do X or this will take hours longer".
Yes, as far as I understand, it breaks it, but combined with a package manager like Nix / Guix / Spack, this tool enables atomically upgrading dynamic library dependencies on a per-package basis rather than globally. If you want all applications to use a new version of a shared library, you'd need to re-"dynamic link" all of the reverse dependencies with this tool, but those package managers can handle that automatically, and without needing access to the object files (unlike true static linking).
As I see it, that pattern is more of an alternative to container images, since it still gives you atomic dependency updates for an application without resorting to shipping N copies of each dynamic library.
There's an open PR with the details; I set it to deploy on every push to that PR for now so we can make quick fixes. It just runs asciidoctor, nothing fancy.