Literally everyone who uses bazel uses it client-server as a persistent daemon.

sunshowers · 2025-04-10T17:17:38 1744305458

Daemonization introduces a range of potential cache invalidation issues. The issues are solvable, but whose KPIs depend on getting to the bottom of them?

joshuamorton · 2025-04-10T21:04:20 1744319060

Do you have specific examples in the context of blaze/bazel here? I think "the set of cache invalidation issues" you're describing are basically "the set of cache invalidation issues blaze/bazel intends to solve", so the answer to "whose KPIs" is "the blaze team".

sunshowers · 2025-04-11T13:38:42 1744378722

I haven't used Bazel but I have used buck1 extensively, and the daemonized mode was quite buggy and often required the process to be killed. Quite frequently the process was just wedged, and even when it wasn't, memory leaks were quite common. Standard black box debuggers like strace and DTrace also become harder to use (e.g. you need to attribute a particular set of syscalls to a client, and a mindless "strace buck build ..." doesn't do what you want).

Daemonization is sometimes necessary, but it introduces lifecycle management problems that get in the way of robustness. Daemonization simply because your choice of programming language has bad startup times doesn't seem like a great idea to me.

I think on-disk cache invalidation and in-memory cache invalidation are distinctly different in practice.

joshuamorton · 2025-04-11T22:36:47 1744411007

I asked this because I've used bazel some, and blaze fairly extensively, and despite having done some deeply cursed things to blaze at times, I've never had issues with it as a daemon, to the point where I'd never consider needing to run it in non-daemonized mode as part of a debug process. It just works.

Second, "startup-time" is probably the least relevant complaint here in terms of why to daemonize. Running `help` without a daemon set up does take a few seconds (and yeah that's gross), but constructing the action graph cold for $arbitrary_expensive_thing takes a minute or two the first time I do it, and then around 1 second the next time.
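To make the cold-versus-warm gap concrete, here is a minimal sketch of a long-lived process that keeps an analyzed graph in memory between requests. It is not how Bazel or Skyframe is implemented: `AnalysisDaemon`, `build`, and the toy "graph" format are all hypothetical names made up for illustration, and the whole-input digest is a deliberate simplification (a real system would invalidate individual nodes rather than the entire graph).

```python
import hashlib


class AnalysisDaemon:
    """Hypothetical stand-in for a long-lived build server process."""

    def __init__(self):
        self._graph_cache = {}  # input digest -> analyzed graph
        self.cold_analyses = 0  # how often the expensive path ran

    @staticmethod
    def _digest(build_files):
        # Hash every path and its contents so any edit changes the key.
        h = hashlib.sha256()
        for path in sorted(build_files):
            h.update(path.encode())
            h.update(b"\0")
            h.update(build_files[path].encode())
            h.update(b"\0")
        return h.hexdigest()

    def _analyze(self, build_files):
        # Placeholder for the expensive step: each target's content is
        # treated as a whitespace-separated list of its dependencies.
        self.cold_analyses += 1
        return {path: tuple(deps.split()) for path, deps in build_files.items()}

    def build(self, build_files):
        key = self._digest(build_files)
        if key not in self._graph_cache:
            self._graph_cache[key] = self._analyze(build_files)
        return self._graph_cache[key]


daemon = AnalysisDaemon()
files = {"//app": "//lib //util", "//lib": "//util", "//util": ""}

daemon.build(files)  # cold: runs the analysis
daemon.build(files)  # warm: served from memory, no re-analysis
files["//lib"] = ""  # an input changes
daemon.build(files)  # digest differs, so the analysis runs again
```

If the process exits, `_graph_cache` goes with it and the next request pays the cold cost again, which is the trade-off being argued about here.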

Caching the build graph across builds is valuable, and in this case persisting it in memory makes a lot more sense than persisting it on disk. The article we're discussing argues for taking this further and moving action graph calculation entirely into a service on another machine, because that prevents the situation where the daemon dies and you lose your (extremely valuable!) cached analysis graph:

> Bonanza performs analysis remotely. When traditional Bazel is configured to execute all actions remotely, the Bazel server process is essentially a driver that constructs and walks a graph of nodes. This in-memory graph is known as Skyframe and is used to represent and execute a Bazel build. Bonanza lifts the same graph theory from the Bazel server process, puts it into a remote cluster, and relies on a distributed persistent cache to store the graph’s nodes. The consequence of storing the graph in a distributed storage system is that, all of a sudden, all builds become incremental. There is no more “cold build” effect like the one you see with Bazel when you lose the analysis cache.

If you're worried about cache invalidation and correctness issues due to daemonization, I think you'd want to be even more concerned about moving the cache entirely off-machine.

(I'm also not sure how their proposal intends to handle, e.g., me and Phil on another machine trying to build conflicting changes that both significantly affect how the analysis graph is calculated and thrash each other. Either you namespace so they don't thrash, in which case you've just moved the daemon to the cloud, or you do some very, very fancy partial graph invalidation. That isn't discussed, and it feels like it would be white-paper-worthy.)

sunshowers · 2025-04-11T23:13:40 1744413220

Yeah, moving more things into the cloud makes debuggability even worse than today. I think that's a very large downside.

jeffbee · 2025-04-11T15:59:34 1744387174

> I haven't used Bazel

That was the perfect place to end the comment.

sunshowers · 2025-04-11T23:09:01 1744412941

Are you saying that daemonization doesn't make black box debugging harder?

lostdog · 2025-04-10T17:19:00 1744305540

And part of why it's a persistent daemon is that Java is slow to start up.

I'll note that bazelisk is written in Go.

jeffbee · 2025-04-10T19:12:33 1744312353

Exactly. And a Go program that forks a Java program that makes an RPC is still faster than `cargo help`.