I don't support vfork(2) right now, only fork(2). I have some super crazy ideas ...

kmavm · 2025-09-19T22:01:42 1758319302

Hi Fil! Congrats on all the amazing progress on Fil-C.

We needed to port all the user-level fork(2) calls to vfork(2) when working on uCLinux, a port of Linux to MMU-less microcontrollers[1]. It used to be that paging MMUs were kinda expensive (those TLBs! so much associativity!!!), and the CPU on your printer/ethernet card/etc. might not have that much grit. Nowadays not so much.

Still. A hard-and-fast use for vfork(2), as requested perhaps.

[1] http://www.ibiblio.org/lou/old/ViewStation/

pizlonator · 2025-09-19T22:09:49 1758319789

It'll be a while before Fil-C is appropriate for use in MMU-less microcontrollers.

cryptonector · 2025-09-19T20:56:02 1758315362

Remember that the child side of `vfork(2)` can only do async-signal-safe things as `vfork(2)` is documented, and it's really as if it is executing in the same thread as the parent (down to thread-locals, I do believe), so really, `vfork(2)` shouldn't really be all that special for Fil-C! You might want to try it.

As u/foota notes, it's important. The point of `vfork(2)` is that it's _fast_. https://news.ycombinator.com/item?id=30502392

pizlonator · 2025-09-19T20:59:31 1758315571

> Remember that the child side of `vfork(2)` can only do async-signal-safe things as `vfork(2)` is documented

And if it does anything that isn't async-signal-safe, then all bets are off.

So, the Fil-C implementation of vfork(2) would have to have some way of checking (either statically or dynamically) that nothing async-signal-unsafe happens. That's the hard bit. That's why I think that implementing it is crazy.

I'd have to have so many checks in so many weird places!

> The point of `vfork(2)` is that it's _fast_

Yup. That's the reason to have it. But I haven't yet found a situation where a program I want to run is slow because I don't have vfork(2). The closest thing is that bash is visibly slower in Fil-C due to forking, but bash always uses fork(2) anyway - so in that case what I need to do is make my fork(2) impl faster, not implement vfork(2).

cryptonector · 2025-09-19T21:05:15 1758315915

> And if it does anything that isn't async-signal-safe, then all bets are off.

> So, the Fil-C implementation of vfork(2) would have to have some way of checking (either statically or dynamically) that nothing async-signal-unsafe happens. That's the hard bit. That's why I think that implementing it is crazy.

Not really. See, Fil-C already makes those things that would not be safe be safe except returning from the caller of `vfork(2)`. Treat the child as if it's the same thread as in the parent and let it do whatever it would do.

Well, I guess the biggest issue is that you'd have to deal with how to communicate between the child and the GC thread, if you have to. One option is to just not allow the GC to run while a child of `vfork(2)` hasn't exec'ed-or-exit'ed.

> > The point of `vfork(2)` is that it's _fast_

> Yup. That's the reason to have it. But I haven't yet found a situation where a program I want to run is slow because I don't have vfork(2). The closest thing is that bash is visibly slower in Fil-C due to forking, but bash always uses fork(2) anyway - so in that case what I need to do is make my fork(2) impl faster, not implement vfork(2).

Typically it's programs with large RSS that need it, so things like JVMs, which probably wouldn't run under Fil-C.

pizlonator · 2025-09-19T21:16:26 1758316586

Simple example that breaks the world:

    void foo(void)
    {
        vfork();
    }

In this case, the vfork child is returning. Eventually it'll exit. In the meantime, it's clobbered the vfork parent's stack.

Kaboom

cryptonector · 2025-09-19T21:55:21 1758318921

Yes, _that_ is not to be allowed. I think you could have LLVM know about `vfork(2)` and treat that as an error -- heck, clang already knows how to do that, does it not?

pizlonator · 2025-09-19T22:08:48 1758319728

That's just one example and the problem isn't just checking this one case, but any generalization of it. It can't be a fully static check because you could achieve similar things with longjmp or exceptions (both of which Fil-C supports).

And then there are similar issues with heap accesses and safepoints. The vfork child has to particular in safepoints except that it cannot quite (you can't use pthread_mutex/pthread_cond primitives in the vfork child to synchronize with threads in the vfork parent).

Anyway, I'm thought about this a lot, and I could keep coming at you with reasons why it's hard. It's not like I haven't implemented vfork(2) out of some abstract fear.

cryptonector · 2025-09-20T03:02:35 1758337355

Another option is to make `posix_spawn(3)` in the program's C library work with Fil-C to let Fil-C implement it w/o instrumentation in its C library. This would work for me.

cryptonector · 2025-09-20T03:00:41 1758337241

Can we make the C library mutex lock and condvar primitives work on the vfork() child? It's practically like a thread...

pizlonator · 2025-09-20T03:37:35 1758339455

That might be the way to go but my best ideas for how to make vfork work sidestep all of that.

It's just a lot of work. It basically means that the whole Fil-C runtime has to be aware of this alternate mode where we're the vfork child and we have to play by different rules.

Anyway, long story short:

- Yeah I know vfork(2) is important. It's just not the most important thing on my plate. The constant time crypto situation we were talking about in the other thread is definitely more important, for example!

- I know how to do vfork(2) but every version of it that I've come up with is a ton of work and carries the risk of introducing really wacky stability bugs and safety bugs.

So, vfork is just a high risk, high cost, medium reward, medium urgency kind of thing. That probably means it'll be a while before I implement it.

cryptonector · 2025-09-20T04:59:02 1758344342

Understood, and I agree that constant-time assembly-coded crypto is much more important. Moreover, if you make `posix_spawn()` fast that's enough, and you can probably do that by proxying it from the victim program's C library to Fil-C's.

trws · 2025-09-19T22:36:12 1758321372

The other place it comes up is launchers and resource managers. We actually have a series of old issues and implementation work on flux (large scale resource manager for clusters) working around fork becoming a significant bottleneck in parallel launch. IIRC it showed up when we had ~1gb of memory in use and needed to spawn between 64 and 192 processes per node. That said, we actually didn’t pivot to vfork, we pivoted to posix_spawn for all but the case where we have to change working directory (had to support old glibc without the attr for that in spawn). If you’re interested I think we did some benchmarking with public results I could dredge up.

Anyway, much as I have cases where it matters I guess what I’m saying is I think you’re right that vfork is rarely actually necessary, especially since you’d probably have a much easier time getting a faster and still deterministic spawn if it ever actually becomes a bottleneck for something you care about.

cryptonector · 2025-09-20T05:00:23 1758344423

> That said, we actually didn’t pivot to vfork, we pivoted to posix_spawn for all but the case where we have to change working directory (had to support old glibc without the attr for that in spawn).

You can always accomplish that sort of thing by using a helper program that ultimately execs the desired one -- just prefix it and its arguments to the intended argv.

trws · 2025-09-20T19:49:02 1758397742

Quite so. We would have too, but I left out the nasty bit that someone had at one point put a callback argument in an internal launching API that runs between fork and exec. Still working on squashing the last of those.

cryptonector · 2025-09-20T03:01:38 1758337298

> The other place it comes up is launchers and resource managers.

Yes, and typically for the same reasons as in JVMs etc: `vfork()` is just faster.