I didn't forget. That's predictable. It happens when the application code does something, or stops doing something, as opposed to the moving collector just doing it at random times.
The C++ standard has never included a garbage collector. It only provided mechanisms intended to facilitate the implementation of a GC, but they were useless.
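For context, the mechanisms in question were the C++11 "minimal GC support" hooks in `<memory>` (`std::declare_reachable`, `std::get_pointer_safety` and friends). As far as I know no mainstream implementation ever did anything with them, and C++23 removed them entirely. A rough sketch of how they were meant to be used (pre-C++23 only):

```cpp
#include <cstdio>
#include <memory>   // home of the C++11 GC-support hooks (removed in C++23)

int main() {
    int* p = new int(42);

    // Tell a hypothetical collector this object stays reachable even if no
    // traceable pointer to it remains (e.g. the pointer gets hidden in an int).
    std::declare_reachable(p);

    // ... the pointer could legally be masked or stashed in an integer here ...

    p = std::undeclare_reachable(p);   // reachable via an ordinary pointer again

    // In practice implementations reported "relaxed" pointer safety and treated
    // the calls above as no-ops, which is why the hooks were finally dropped.
    std::printf("pointer safety: %d\n",
                static_cast<int>(std::get_pointer_safety()));
    delete p;
}
```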
> I haven't seen a C++ programmer carefully opt into GC for a subset of their allocations even though there are GC libraries written for the language.
Can you give an example of such GC libraries?
> Who ever made that claim? Gamedevs in particular have been writing custom memory allocators for decades, precisely because they know free() is not free and malloc() isn't fast.
Game developers use engines that rely on a GC.
> It's not an illusion, you literally use control over memory management.
The shared_ptr does not provide full control.
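To make that concrete, here's a small sketch (my example) of the control you give up: whichever owner happens to drop the last reference pays for tearing down the entire graph, on whatever thread and at whatever moment that turns out to be.

```cpp
#include <memory>
#include <thread>
#include <vector>

struct Node {
    std::vector<std::shared_ptr<Node>> children;
};

// Build a tree of roughly a million nodes so the destructor cascade is visible.
std::shared_ptr<Node> build_tree(int depth, int width) {
    auto n = std::make_shared<Node>();
    if (depth > 0)
        for (int i = 0; i < width; ++i)
            n->children.push_back(build_tree(depth - 1, width));
    return n;
}

int main() {
    auto root = build_tree(6, 10);
    std::thread worker([p = root] {
        // ... use the graph ...
    });                // when this copy dies, if it is the last owner, the worker
                       // thread runs the whole cascade of frees right there
    root.reset();      // main releases "early", but it doesn't get to decide
    worker.join();     // where or when the actual teardown cost lands
}
```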
> What they don't want is random stalls in odd frames.
You can have fully concurrent GC, without any stalls.
I don’t think anyone actually believes that the language change is what made the difference in that post. As far as anyone can tell there were 2 things that made their rewrite better:
1. They had experience with this type of app, and you tend to do things better the second time around.
2. Reading between the lines they were likely running an ancient version of Java from 10+ years ago.
> I don’t think anyone actually believes that the language change is what made the difference in that post
They already admit in the article that JVM setup costs were a huge overhead during scaling operations. JIT might also incur a similar overhead at the same stage. So, not the language per se, but the runtime difference must have made a difference.
Compiled native code may not be too different between the two. But in terms of memory management, Swift's ability to keep the heap in a smaller area might lead to better cache utilization, which in turn might help performance.
In the end, as you said, writing better code the second time might well have made the greatest contribution, of course. That doesn't necessarily mean the language change made no difference.
> They already admit in the article that JVM setup costs were a huge overhead during scaling operations. JIT might also incur a similar overhead at the same stage.
Yeah the startup costs are an active problem. I do think though that if they weren’t running an old runtime they wouldn’t have had the same GC issues.
No wonder Swift performs poorly in this benchmark. The Swift implementation is written very badly, obviously by developers inexperienced in writing proper Swift :)
Azul C4 is not a pauseless GC. In the documentation it says "C4 uses a 4-stage concurrent execution mechanism that eliminates almost all stop-the-world pauses."
> C4 differentiates itself from other generational garbage collectors by supporting simultaneous-generational concurrency: the different generations are collected using concurrent (non stop-the-world) mechanisms
C++ isn't hostile toward garbage collection — it's more the programmers using C++ who are. C++ is the only language that can have an optional, totally pause-less, concurrent GC engine (SGCL). No other programming language, not even Java, offers such a collector.
Lots of pauseless concurrent GCs have shipped for other languages. SGCL is not special in that regard. Worse, SGCL hasn’t been shown to actually avoid disruptions to program execution while the shipping concurrent GCs for Java and other languages have been shown to really avoid disruptions.
(I say disruptions, not pauses, because avoiding “pauses” where the GC “stops” your threads is only the first step. Once you tackle that you have to fix cases of the GC forcing the program to take bursts of slow paths on pointer access and allocation.)
SGCL is a toy by comparison to other concurrent GCs. For example, it has hilariously complex pointer access costs that serious concurrent GCs avoid.
There isn’t a single truly pause-less GC for Java — and I’ve already proven that to you before. If such a GC exists for any other language, name it.
And no, SGCL doesn’t introduce slow paths, because mutators never have to synchronize with the GC. Pointer access is completely normal — unlike in other languages that rely on mechanisms like read barriers.
> There isn’t a single truly pause-less GC for Java — and I’ve already proven that to you before. If such a GC exists for any other language, name it.
You haven't proven that. If you define "pause" as "the world stops", then no, state of the art concurrent GCs for Java don't have that. If you define "pause" as "some thread might sometimes take a slow path due to memory management" then SGCL has those, as do most memory management implementations (including and especially malloc/free).
> And no, SGCL doesn’t introduce slow paths, because mutators never have to synchronize with the GC. Pointer access is completely normal — unlike in other languages that rely on mechanisms like read barriers.
The best concurrent GCs have no read barriers, only extremely cheap write barriers.
You have allocation slow paths, at the very least.
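For anyone wondering what "extremely cheap" means here: a classic card-marking write barrier is roughly one shift and one byte store per reference write. This is my own sketch; the card size and table layout are illustrative, not any particular VM's.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kHeapSize  = 1 << 20;   // pretend 1 MiB heap
constexpr std::size_t kCardShift = 9;         // 512-byte cards
alignas(64) unsigned char heap[kHeapSize];
unsigned char card_table[kHeapSize >> kCardShift];   // one byte per card

// Emitted (conceptually) at every reference store in the mutator.
inline void write_ref(void** field, void* new_value) {
    *field = new_value;                                   // the actual pointer store
    auto offset = reinterpret_cast<std::uintptr_t>(field)
                - reinterpret_cast<std::uintptr_t>(heap);
    card_table[offset >> kCardShift] = 1;                 // mark the card dirty
}

int main() {
    void** slot = reinterpret_cast<void**>(heap + 1024);
    int dummy;
    write_ref(slot, &dummy);   // the concurrent marker later rescans dirty cards only
}
```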
First, there are no Java GCs that completely eliminate stop-the-world pauses. ZGC and Shenandoah reduce them to very short, sub-millisecond windows — but they still exist. Even the most concurrent collectors require STW phases for things like root scanning, final marking, or safepoint synchronization. This is documented in OpenJDK sources, benchmarks, and even in Oracle’s own whitepapers. Claiming Java has truly pause-less GC is simply false.
Second, you’re suggesting there are moving GCs that don’t use read barriers and don’t stop mutator threads at all. That’s technically implausible. Moving collectors by definition relocate objects, and unless you stop the world or have some read barrier/hazard indirection, you can’t guarantee pointer correctness during concurrent access. You must synchronize with the mutator somehow — either via stop-the-world, read barriers, or epoch/hazard-based coordination. It’s not magic, it’s basic memory consistency.
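As a concrete illustration of that last point, here is roughly the shape of the load (read) barrier a concurrent moving collector ends up with. This is my sketch, loosely inspired by colored-pointer designs; the metadata bit and the slow path are invented for illustration.

```cpp
#include <cstdint>

constexpr std::uintptr_t kBadBit = 1ull << 62;   // made-up "object may have moved" bit

// Stand-in for the real slow path, which would consult a forwarding table and
// possibly copy the object before returning the up-to-date address.
std::uintptr_t remap_slow_path(std::uintptr_t ref) {
    return ref & ~kBadBit;
}

// Executed (conceptually) on every reference load in the mutator.
inline void* load_ref(void** field) {
    auto ref = reinterpret_cast<std::uintptr_t>(*field);
    if (ref & kBadBit) {                          // fast path: one test-and-branch
        ref = remap_slow_path(ref);               // slow path: remap/copy, then
        *field = reinterpret_cast<void*>(ref);    // self-heal so later loads stay fast
    }
    return reinterpret_cast<void*>(ref);
}

int main() {
    int x = 7;
    void* slot = reinterpret_cast<void*>(
        reinterpret_cast<std::uintptr_t>(&x) | kBadBit);   // simulate a stale reference
    int* p = static_cast<int*>(load_ref(&slot));
    return (*p == 7) ? 0 : 1;
}
```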
SGCL works without moving anything. That’s why it doesn’t need synchronization, read barriers, or even slow-path allocation stalls. That’s not a limitation — that’s a design goal. You can dislike the model, but let’s keep the facts straight.
It is hostile in the sense that it allows hiding and masking pointers, so it is hard to have an exact moving GC.
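Concretely (my own illustration), this sort of thing is legal C++ and defeats any exact moving collector, because the collector can neither see the hidden reference as a root nor fix it up if the object is relocated:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    int* p = new int(42);

    // Stash the pointer in an integer, masked so it doesn't even look like one.
    // Round-tripping a pointer through uintptr_t is perfectly legal C++.
    std::uintptr_t hidden = reinterpret_cast<std::uintptr_t>(p) ^ 0xDEADBEEF;
    p = nullptr;   // no traceable pointer to the object exists anymore

    // An exact moving GC running here would see the object as unreachable,
    // and even if it kept it alive, it couldn't update 'hidden' after a move.

    int* q = reinterpret_cast<int*>(hidden ^ 0xDEADBEEF);
    std::printf("%d\n", *q);   // still works, precisely because nothing moved
    delete q;
}
```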
SGCL, as impressive as it is, AFAIK requires pointers to be annotated, which is problematic for memory safety, and I'm not sure that it is a moving GC.
SGCL introduces the `tracked_ptr` smart pointer, which is used similarly to `shared_ptr`. The collector doesn't move data, which makes it highly efficient and — perhaps surprisingly — more cache-friendly than moving GCs.
Folks who make claims about the cache friendliness of copying GCs have millions of lines of credible test code that they’ve used to demonstrate that claim.
Compaction doesn't necessarily guarantee cache friendliness. While it does ensure contiguity, object layout can still be arbitrary. True cache performance often depends on the locality of similar objects — for example, memory pools are known for their cache efficiency. It's worth noting that Go deliberately avoids compaction, which suggests there's a trade-off at play.
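A minimal sketch (mine, not from any benchmark) of the memory-pool point: a homogeneous pool keeps same-type objects packed back to back, so a hot loop walks memory linearly, which compaction of a mixed heap doesn't automatically give you.

```cpp
#include <cstddef>
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

// All Particles live in one contiguous block, regardless of allocation order.
class ParticlePool {
    std::vector<Particle> storage_;
public:
    explicit ParticlePool(std::size_t capacity) { storage_.reserve(capacity); }
    Particle* allocate() { storage_.emplace_back(); return &storage_.back(); }
    void step(float dt) {
        // The hot loop touches a dense, homogeneous stream of objects.
        for (auto& p : storage_) {
            p.x += p.vx * dt; p.y += p.vy * dt; p.z += p.vz * dt;
        }
    }
};

int main() {
    ParticlePool pool(100000);
    for (int i = 0; i < 100000; ++i) pool.allocate();
    pool.step(0.016f);
}
```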
As I mentioned earlier, take a look at Go. It's newer than Java, yet it uses a non-moving GC. Are you assuming its creators are intentionally making the language slower?
Not at all. Most objects die young and thus are never moved. Also, the time before an object is moved is very long compared to CPU operations, so it is only statistically relevant (very good throughput; a rare, longer tail on latency graphs).
Also, write barriers on their own don't have that big of an overhead.
It doesn't matter if objects die young — the other objects on the heap are still moved around periodically, which reduces performance. When you're using a moving GC, you also have additional read barriers that non-moving GCs don't require.
Is that period really that big of a concern when your threads in any language might be context switched away by the OS? It's not a common occurrence on a CPU-timeline at all.
Also, it's no accident that every high-performance GC runtime went the moving, generational way.
That time may seem negligible, since the OS can context switch threads anyway, but it’s still additional time during which your code isn’t doing its actual work.
Generations are used almost exclusively in moving GCs — precisely to reduce the negative performance impact of data relocation. Non-moving GCs are less invasive, which is why they don’t need generations and can be fully concurrent.
I would rather say that generations are a further improvement upon a moving collector, improving space usage and decreasing the length of the "mark" phase.
And which GC is fully concurrent? I don't think that's possible (though I'll preface this by saying I'm no expert and have only read into the topic at a hobby level) - I believe the most concurrent GC out there is ZGC, which does read barriers and some pointer tricks to make the stop-the-world time independent of the heap size.
Java currently has no fully concurrent GC, and due to the volume of garbage it manages and the fact that it moves objects, a truly fully concurrent GC for this language is unlikely to ever exist.
Non-moving GCs, however, can be fully concurrent — as demonstrated by the SGCL project for C++.
In my opinion, the GC for Go is the most likely to become fully concurrent in the future.
In that case, are you doing atomic writes for managed pointers/the read flag on them? I have read a few of your comments on reddit, and your flags seem to be per memory page? Still, the synchronization on them may or may not have a more serious performance impact than alternative methods, and without a good way to compare it to something like Java, which is the state of the art in GC research, we can't really say whether it's a net benefit.
Also, have you perhaps tried modeling your design in something like TLA+?
You can't write concurrent code without atomic operations — you need them to ensure memory consistency, and concurrent GCs for Java also rely on them. However, atomic loads and stores are cheap, especially on x86. What’s expensive are atomic counters and CAS operations — and SGCL uses those only occasionally.
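To illustrate that split (my example): on x86, plain atomic loads and release stores compile to ordinary MOVs, while the read-modify-write operations are the LOCK-prefixed ones that get expensive under contention.

```cpp
#include <atomic>

std::atomic<void*> slot{nullptr};
std::atomic<long>  counter{0};

void cheap_ops(void* p) {
    void* v = slot.load(std::memory_order_acquire);    // plain MOV on x86
    slot.store(p, std::memory_order_release);          // plain MOV on x86
    (void)v;
}

void expensive_ops(void* expected, void* desired) {
    counter.fetch_add(1, std::memory_order_relaxed);   // LOCK XADD
    slot.compare_exchange_strong(expected, desired);   // LOCK CMPXCHG
}

int main() {
    int x = 0;
    cheap_ops(&x);
    void* e = &x;
    expensive_ops(e, nullptr);
}
```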
Java’s GCs do use state-of-the-art technology, but it's technology specifically optimized for moving collectors. SGCL is optimized for non-moving GC, and some operations can be implemented in ways that are simply not applicable to Java’s approach.
I’ve never tried modeling SGCL's algorithms in TLA+.
We call them low-latency GCs, but I was responding to a comment that called the same thing "pauseless". However, even programs without any GC pauses can have pauses of a similar duration due to OS activity, so programs on a regular OS aren't generally pauseless anyway, and the term is mostly meaningless in most contexts. As a marketing term (which we don't use) it merely means the application won't experience GC-related pauses that are longer than OS-related pauses.
This is some weird way of counting. A system pause plus a GC pause is two pauses. Just because one pause can't be avoided doesn't mean you can introduce more pauses.
Just to be clear, the pause in ZGC is done as an optimisation of thread synchronisation (there's no GC work done in the pause). It's just as easy to do things without anything you would consider a pause, only it's less efficient. Momentarily stopping all program threads is just an efficient way to deliver a message to all threads considering that the OS might have some threads descheduled.
At the end of the day, we care about the latency for some particular thread(s), and truly minimising that at this level requires a realtime kernel (that comes with significant performance disadvantages).
Unless you are on a realtime OS, you may see longer pauses from the OS due to scheduling other processes, so I think it's fair to consider the new low-latency collectors (Shenandoah, ZGC) pauseless from a practical point of view.
I think as a technical term, pauseless (as used here) has a specific meaning, just like real-time has a specific technical meaning (not literally zero latency, since that's generally physically impossible).
Reference counting also causes pauses when freeing large object graphs. A fully concurrent GC doesn't cause any pauses and is more efficient. It also has less memory overhead per object.
I know of some who claim to be "pauseless", but it's b.s. marketing. They also tend to be memory hogs that don't play well with other processes in low memory environments with the kind of allocation behavior seen in UI applications.
And of course, the issue is not only stop-the-world pauses (though those are especially bad), but any pause in any thread, period. An RC system gives you consistent, predictable performance, and the counting itself can often be completely eliminated through static analysis.
RAII can also cause pauses when freeing large object graphs. It's not unique to RC. However in practice it's usually much less of an issue than GC pauses. It's only been an issue for me once in my entire career.
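For completeness, a sketch (mine) of the RAII version of the same effect: the teardown is deterministic, but it still happens synchronously, all at once, on whichever thread hits the end of the scope.

```cpp
#include <memory>
#include <vector>

struct Node {
    std::vector<std::unique_ptr<Node>> children;
};

void process_batch() {
    std::vector<std::unique_ptr<Node>> nodes;
    nodes.reserve(1'000'000);
    for (int i = 0; i < 1'000'000; ++i)
        nodes.push_back(std::make_unique<Node>());
    // ... use the nodes ...
}   // a million destructors and frees run right here, before this call returns

int main() {
    process_batch();
}
```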