I think you may be overlooking the aspect of cloning the codebase, handling writes transparently, and then being able to quickly clone/snapshot that VM.
I'm very eager to see more developments in the fresh start times!
The main reason snapshotting became interesting for us is that we're running development servers defined by our users. A development server can take a long time to start, sometimes minutes.
So even if we can start the VM fast, the most important speedup for us is in the user code that we cannot control.
Say the user code initiates a download: what happens if we clone in the middle of that operation? Will the clone be able to finish the download?
And the opposite case: say the user code binds to an IP:port to run a service. Will the clone clash with the parent by trying to bind to a port that is already taken?
The TCP connection gets "paused": it doesn't get broken, but packets stop arriving. The packets that don't arrive are treated as packet loss, so they get resent. If the connection stays frozen for too long it will lead to a disconnection (at least of the websocket connection to the VM).
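How long a frozen connection survives is mostly governed by TCP keepalive and TCP_USER_TIMEOUT on the endpoints. A minimal sketch of widening those windows (Python, Linux-only socket options; the values are arbitrary examples and not necessarily what we tune in production):

    import socket

    # Linux-only: widen how long a silent/frozen TCP connection is tolerated
    # before the kernel declares it dead. Values here are arbitrary examples.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # start probing after 60s idle
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 15)  # probe every 15s
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 8)     # give up after 8 failed probes
    # Only drop the connection if unacknowledged data has been pending for 5 minutes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 5 * 60 * 1000)  # ms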
For IP uniqueness, we give every VM the same IP, but we put each VM in its own network namespace. Then we have iptables rules that rewrite the src/dst IP on every packet that enters the network namespace.
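Roughly along these lines, heavily simplified (the namespace name and addresses are invented, and the veth plumbing between host and namespace is omitted):

    import subprocess

    VM_NS = "vm42"                # per-VM network namespace (invented name)
    SHARED_IP = "192.168.127.2"   # the IP every VM sees inside its namespace (example)
    UNIQUE_IP = "10.0.42.2"       # the per-VM IP visible outside the namespace (example)

    def sh(cmd):
        subprocess.run(cmd, shell=True, check=True)

    # Each VM gets its own namespace, so the shared IP never collides.
    sh(f"ip netns add {VM_NS}")

    # Packets entering the namespace: rewrite the per-VM destination to the shared IP.
    sh(f"ip netns exec {VM_NS} iptables -t nat -A PREROUTING "
       f"-d {UNIQUE_IP} -j DNAT --to-destination {SHARED_IP}")

    # Packets leaving the namespace: rewrite the shared source back to the per-VM IP.
    sh(f"ip netns exec {VM_NS} iptables -t nat -A POSTROUTING "
       f"-s {SHARED_IP} -j SNAT --to-source {UNIQUE_IP}")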
You know, I'm not sure... TCP is stream-oriented and supposed to handle lost packets, so I'd think the TCP layer itself would handle the pause. If the sender doesn't get an ACK for a packet, it'll resend that packet later (TCP has sequence numbers, so the stream can be reconstructed from out-of-order delivery and resends).
I revisited my proof-of-concept test scripts when I wrote the previous comment. In the next week I'll try to add some additional tests there to check stream reliability and packet delay/loss.
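Something along these lines, most likely: stream a known sequence over TCP, pause/clone the VM mid-transfer, then check on the receiving end that the bytes still arrive complete and in order (sketch only; the pause/clone step itself isn't shown and the port is arbitrary):

    import socket
    import struct

    CHUNKS = 100_000      # number of 8-byte sequence numbers to stream
    PORT = 9000           # arbitrary test port

    def sender(host):
        """Stream an incrementing 64-bit counter over TCP."""
        with socket.create_connection((host, PORT)) as s:
            for i in range(CHUNKS):
                s.sendall(struct.pack("!Q", i))

    def receiver():
        """Accept one connection, verify the counter arrives complete and in order."""
        with socket.create_server(("0.0.0.0", PORT)) as srv:
            conn, _ = srv.accept()
            with conn:
                expected, buf = 0, b""
                while expected < CHUNKS:
                    data = conn.recv(65536)
                    if not data:
                        raise AssertionError(f"stream ended early at chunk {expected}")
                    buf += data
                    while len(buf) >= 8:
                        (value,) = struct.unpack("!Q", buf[:8])
                        buf = buf[8:]
                        assert value == expected, f"gap/reorder: got {value}, expected {expected}"
                        expected += 1
        print(f"stream intact: {CHUNKS} chunks received in order")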
UDP of course doesn't have the same benefits.
I'm using ECMP + Anycast in a project I've been developing for the last couple of years (K18S, or Keep It Simples Stupids) to effectively replace Kubernetes functionality with standard protocols and tooling that ships with almost all distros.
We started out with the challenge of replacing the major parts of CNIs, and that is where the ECMP + Anycast work came from.
Native IPv6 with only VLANs and direct routing (no messing about with IPv4, NAT, or overlay networks); ECMP + Anycast gives load-balanced routing to pods with automatic detection of lost hosts. Pods exposed to the public get a public IPv6 address in addition to a ULA (Unique Local Address, the replacement for the deprecated site-local addresses). ULAs are used for private routing.
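The ECMP part is just standard multipath routing, so on the routing host it boils down to a route with multiple nexthops. A sketch with invented ULA addresses and interface name:

    import subprocess

    # Anycast service address that every pod replica binds, plus the hosts
    # currently running a replica. All addresses/interfaces are invented examples.
    SERVICE = "fd00:dead:beef::53/128"
    NEXTHOPS = ["fd00:10::1", "fd00:10::2", "fd00:10::3"]

    # Equivalent to:
    #   ip -6 route replace <service> nexthop via <h1> dev vlan100 weight 1 nexthop via <h2> ...
    cmd = ["ip", "-6", "route", "replace", SERVICE]
    for hop in NEXTHOPS:
        cmd += ["nexthop", "via", hop, "dev", "vlan100", "weight", "1"]
    subprocess.run(cmd, check=True)

The kernel then spreads flows across the nexthops; in practice the nexthop set would be maintained by whatever routing daemon advertises the anycast addresses rather than by a static command like this.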
Systemd-networkd is configured automatically by systemd-nspawn, so there's no need for a massive, foreign orchestration control system.
Systemd-nspawn/systemd-machined manage container lifecycles with OCI-compliant images, or we leverage nspawn's support for overlayfs to build machine images from several different file-system images (rather like Docker's layers, but kept separate rather than combined), which can be used in a pick-and-mix fashion to assemble a container from several related but separately packaged components.
Configs for each container's /etc/ are mapped in from external storage using the same overlayfs method. In most cases everything is read-only, but some hosts/pods can be allowed to write into the /etc/ overlay, and those changes can optionally be committed back to external storage.
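As a rough illustration of the overlay assembly (all paths and names are invented, and the exact flags are worth checking against systemd-nspawn(1)):

    import subprocess

    # Sketch: boot a container whose /opt is assembled from separately packaged
    # component trees, and whose /etc/ is overlaid read-only from external storage.
    cmd = [
        "systemd-nspawn",
        "--machine=web-frontend",
        "--directory=/var/lib/machines/base-os",
        # Left-most path is the lowest layer; the final path is the mount point
        # inside the container.
        "--overlay=/srv/images/runtime:/srv/images/app:/opt",
        # Read-only /etc assembled from external storage; a writable upper layer
        # could be used for the hosts/pods allowed to commit config changes.
        "--overlay-ro=/srv/etc/common:/srv/etc/web-frontend:/etc",
        "--boot",
    ]
    subprocess.run(cmd, check=True)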
Adopting IPv6 and dropping IPv4 was the best thing we ever did in terms of keeping things simple and straightforward, relying on the existing network protocols and layers instead of re-inventing it all (badly).
At the time we started, Kubernetes didn't even have IPv6 support, and even once it did, many CNIs couldn't handle it properly.