
RC is a GC method and the least efficient one.


It's the most predictable and has much less overhead than a moving collector.


Only when we forget about the impact of cycle collection, or domino effects stopping the world when a cascade of counters reaches zero.

The optimisations needed to improve such scenarios are akin to a poor man's tracing GC implementation.


I didn't forget. That's predictable. It happens when the application code does something, or stops doing something, as opposed to the moving collector just doing it at random times.


The C++ standard has never included a garbage collector. It only provided mechanisms intended to facilitate the implementation of a GC, but they were useless.


This isn't fully concurrent GC. It pauses mutator threads and delegates some of the GC's work to them.


> I haven't seen a C++ programmer carefully opt into GC for a subset of their allocations even though there are GC libraries written for the language.

Can you give an example of such GC libraries?

> Whoever made that claim? Gamedevs particularly have been writing custom memory allocators for decades precisely because they know free() is not free and malloc() isn't fast.

Game developers use engines based on GC.

> It's not an illusion, you literally use control over memory management.

shared_ptr does not provide full control.

> What they don't want is random stalls in odd frames.

You can have fully concurrent GC, without any stalls.


But that's why Swift generates slower code. Memory is cheap, even for Apple, although Apple would like to hide that fact.


Have you seen the benchmarks? Swift code is 40% faster than Java in that instance.


I don’t think anyone actually believes that the language change is what made the difference in that post. As far as anyone can tell, there were two things that made their rewrite better:

1. They had experience on the type of app and you tend to do things better when you do it the second time.

2. Reading between the lines, they were likely running an ancient version of Java from 10+ years ago.


> I don’t think anyone actually believes that the language change is what made the difference in that post

They already admit in the article that JVM setup costs were a huge overhead during scaling operations. JIT might also incur a similar overhead at the same stage. So, not the language per se, but the runtime difference must have made a difference.

Compiled native code may not be too different between the two. But, in terms of memory management, Swift's ability to keep the heap in a smaller area might contribute to better cache utilization, which in turn could implicitly help performance.

In the end, as you said, writing better code the second time might have made the greatest contribution of course. That doesn't necessarily mean that language change has made no difference.


> They already admit in the article that JVM setup costs were a huge overhead during scaling operations. JIT might also incur a similar overhead at the same stage.

Yeah the startup costs are an active problem. I do think though that if they weren’t running an old runtime they wouldn’t have had the same GC issues.



No wonder Swift performs poorly in this benchmark. The Swift implementation is written very badly, by developers obviously inexperienced in writing proper Swift :)


There are none, at least not production grade.


There are none, or you're not aware of their existence?


There are no production implementations of GC algorithms that don't stop the world at all. I know this because I have some expertise in GC algorithms.


I'm curious why you don't consider C4 to be production grade


Azul C4 is not a pauseless GC. In the documentation it says "C4 uses a 4-stage concurrent execution mechanism that eliminates almost all stop-the-world pauses."


Except the documentation says

> C4 differentiates itself from other generational garbage collectors by supporting simultaneous-generational concurrency: the different generations are collected using concurrent (non stop-the-world) mechanisms


It doesn't matter at all. C4 uses STW.


I wish I could have learned more from this interaction than "it doesn't matter."


I have heard (and found when I investigated) that Erlang's GC is "don't stop the world".

Maybe my definition is bad though.


C++ isn't hostile toward garbage collection; it's more the programmers using C++ who are. C++ is the only language that can have an optional, totally pauseless, concurrent GC engine (SGCL). No other programming language, not even Java, offers such a collector.


This is false.

Lots of pauseless concurrent GCs have shipped for other languages. SGCL is not special in that regard. Worse, SGCL hasn’t been shown to actually avoid disruptions to program execution while the shipping concurrent GCs for Java and other languages have been shown to really avoid disruptions.

(I say disruptions, not pauses, because avoiding “pauses” where the GC “stops” your threads is only the first step. Once you tackle that you have to fix cases of the GC forcing the program to take bursts of slow paths on pointer access and allocation.)

SGCL is a toy by comparison to other concurrent GCs. For example, it has hilariously complex pointer access costs that serious concurrent GCs avoid.


There isn’t a single truly pause-less GC for Java — and I’ve already proven that to you before. If such a GC exists for any other language, name it.

And no, SGCL doesn’t introduce slow paths, because mutators never have to synchronize with the GC. Pointer access is completely normal — unlike in other languages that rely on mechanisms like read barriers.

Loosen up a little, seriously.


> There isn’t a single truly pause-less GC for Java — and I’ve already proven that to you before. If such a GC exists for any other language, name it.

You haven't proven that. If you define "pause" as "the world stops", then no, state of the art concurrent GCs for Java don't have that. If you define "pause" as "some thread might sometimes take a slow path due to memory management" then SGCL has those, as do most memory management implementations (including and especially malloc/free).

> And no, SGCL doesn’t introduce slow paths, because mutators never have to synchronize with the GC. Pointer access is completely normal — unlike in other languages that rely on mechanisms like read barriers.

The best concurrent GCs have no read barriers, only extremely cheap write barriers.
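For a sense of scale, one common cheap form is a card-marking barrier; an illustrative sketch, not any particular VM's code:

    #include <cstddef>
    #include <cstdint>

    // One byte per 512-byte "card" of heap; marking a card is a single
    // extra byte store next to the pointer write itself.
    static unsigned char card_table[1 << 20];

    inline void write_ref(void** field, void* new_value) {
        *field = new_value;                                    // the actual store
        std::size_t card =
            reinterpret_cast<std::uintptr_t>(field) >> 9;      // which card?
        card_table[card & ((1 << 20) - 1)] = 1;                // dirty it
    }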

You have allocation slow paths, at the very least.


First, there are no Java GCs that completely eliminate stop-the-world pauses. ZGC and Shenandoah reduce them to very short, sub-millisecond windows — but they still exist. Even the most concurrent collectors require STW phases for things like root scanning, final marking, or safepoint synchronization. This is documented in OpenJDK sources, benchmarks, and even in Oracle’s own whitepapers. Claiming Java has truly pause-less GC is simply false.

Second, you’re suggesting there are moving GCs that don’t use read barriers and don’t stop mutator threads at all. That’s technically implausible. Moving collectors by definition relocate objects, and unless you stop the world or have some read barrier/hazard indirection, you can’t guarantee pointer correctness during concurrent access. You must synchronize with the mutator somehow — either via stop-the-world, read barriers, or epoch/hazard-based coordination. It’s not magic, it’s basic memory consistency.
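To make that concrete, here is roughly what a ZGC-style load barrier costs on every pointer read; the color bit and the fixup below are illustrative, not real ZGC code:

    #include <cstdint>

    constexpr std::uintptr_t BAD_COLOR = 0x1;  // illustrative metadata bit

    // Stand-in for the GC slow path: remap the reference and "heal" the
    // field so subsequent loads take the fast path.
    void* remap_and_heal(void** field) {
        auto raw = reinterpret_cast<std::uintptr_t>(*field) & ~BAD_COLOR;
        *field = reinterpret_cast<void*>(raw);
        return *field;
    }

    inline void* load_ref(void** field) {
        void* p = *field;
        if (reinterpret_cast<std::uintptr_t>(p) & BAD_COLOR)
            p = remap_and_heal(field);         // rare: object was relocated
        return p;                              // common: plain load + test
    }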

SGCL works without moving anything. That’s why it doesn’t need synchronization, read barriers, or even slow-path allocation stalls. That’s not a limitation — that’s a design goal. You can dislike the model, but let’s keep the facts straight.


It is hostile in the sense that it allows hiding and masking pointers, so it is hard to have an exact moving GC.

SGCL, as impressive as it is, AFAIK requires pointers to be annotated, which is problematic for memory safety, and I'm not sure that it is a moving GC.


SGCL introduces the `tracked_ptr` smart pointer, which is used similarly to `shared_ptr`. The collector doesn't move data, which makes it highly efficient and — perhaps surprisingly — more cache-friendly than moving GCs.
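A minimal usage sketch, based on the project README (the exact header path and API details may differ):

    #include "sgcl/sgcl.h"

    struct Node {
        sgcl::tracked_ptr<Node> next;  // traced by the collector
        int value = 0;
    };

    int main() {
        auto a = sgcl::make_tracked<Node>();   // allocate on the GC heap
        a->next = sgcl::make_tracked<Node>();
        a->next->next = a;                     // a cycle: no leak, unlike shared_ptr
        // no delete, no reset: unreachable objects are reclaimed
        // concurrently by the collector thread
    }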


Based on what data?

Folks who make claims about the cache friendliness of copying GCs have millions of lines of credible test code that they’ve used to demonstrate that claim.


Compaction doesn't necessarily guarantee cache friendliness. While it does ensure contiguity, object layout can still be arbitrary. True cache performance often depends on the locality of similar objects — for example, memory pools are known for their cache efficiency. It's worth noting that Go deliberately avoids compaction, which suggests there's a trade-off at play.


I'm not saying that compaction guarantees cache friendliness.

I'm saying you have no evidence to suggest that not compacting is better for cache friendliness. You haven't presented such evidence.


As I mentioned earlier, take a look at Go. It's newer than Java, yet it uses a non-moving GC. Are you assuming its creators are intentionally making the language slower?


If you have a moving, generational GC, then all the benefits of fast allocation are lost due to data moving and costly memory barriers.


Not at all. Most objects die young and thus are never moved. Also, the time before a surviving object is moved is very long compared to CPU operations, so moving is only statistically relevant (very good throughput; a rare, longer tail on latency graphs).

Also, write barriers on their own don't have that big of an overhead.
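For reference, the nursery fast path whose speed is in question is just a pointer bump; a rough sketch, not any particular VM's allocator:

    #include <cstddef>

    struct Nursery {
        char* top;
        char* limit;

        // Fast path: a compare and an add. Only when the nursery is full
        // does the slow path (a minor collection) run.
        void* allocate(std::size_t n) {
            if (top + n > limit)
                return nullptr;   // slow path: collect, evacuate survivors
            void* p = top;
            top += n;
            return p;
        }
    };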


It doesn't matter if objects die young — the other objects on the heap are still moved around periodically, which reduces performance. When you're using a moving GC, you also have additional read barriers that non-moving GCs don't require.


Is that period really that big of a concern when your threads in any language might be context switched away by the OS? It's not a common occurrence on a CPU-timeline at all.

Also, it's no accident that every high-performance GC runtime went the moving, generational way.


That time may seem negligible, since the OS can context switch threads anyway, but it’s still additional time during which your code isn’t doing its actual work.

Generations are used almost exclusively in moving GCs — precisely to reduce the negative performance impact of data relocation. Non-moving GCs are less invasive, which is why they don’t need generations and can be fully concurrent.


I would rather say that generations are a further improvement upon a moving collector, improving space usage and decreasing the length of the "mark" phase.

And which GC is fully concurrent? I don't think that's possible (though I will preface that I am no expert, only read into the topic on a hobby level) - I believe the most concurrent GC out there is ZGC, which does read barriers and some pointer tricks to make the stop-the-world time independent of the heap size.


Java currently has no fully concurrent GC, and due to the volume of garbage it manages and the fact that it moves objects, a truly fully concurrent GC for this language is unlikely to ever exist.

Non-moving GCs, however, can be fully concurrent — as demonstrated by the SGCL project for C++.

In my opinion, the GC for Go is the most likely to become fully concurrent in the future.


Is SGCL your project?

In that case, are you doing atomic writes for managed pointers/the read flag on them? I have read a few of your comments on reddit and your flags seem to be per memory page? Still, the synchronization on them may or may not have a more serious performance impact than alternative methods and without a good way to compare it to something like Java which is the state of the art in GC research we can't really comment much on whether it's a net benefit.

Also, have you perhaps tried modeling your design in something like TLA+?


Yes, SGCL is my project.

You can't write concurrent code without atomic operations — you need them to ensure memory consistency, and concurrent GCs for Java also rely on them. However, atomic loads and stores are cheap, especially on x86. What’s expensive are atomic counters and CAS operations — and SGCL uses those only occasionally.
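The gap is easy to see in a small illustration (exact costs are architecture-dependent):

    #include <atomic>

    std::atomic<void*> slot{nullptr};
    std::atomic<long>  counter{0};

    void cheap(void* p) {
        // On x86 these compile to plain MOV instructions; acquire/release
        // ordering costs nothing extra there.
        slot.store(p, std::memory_order_release);
        void* q = slot.load(std::memory_order_acquire);
        (void)q;
    }

    void expensive() {
        // Read-modify-write ops (LOCK XADD / LOCK CMPXCHG) serialize the
        // cache line and contend across cores: the cost RC pays on every
        // pointer copy, and what SGCL tries to use only occasionally.
        counter.fetch_add(1, std::memory_order_acq_rel);
        long expected = 1;
        counter.compare_exchange_strong(expected, 2);
    }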

Java’s GCs do use state-of-the-art technology, but it's technology specifically optimized for moving collectors. SGCL is optimized for non-moving GC, and some operations can be implemented in ways that are simply not applicable to Java’s approach.

I’ve never tried modeling SGCL's algorithms in TLA+.


It’s uncharitable to say the benefits are lost; I’d reframe it as creating tradeoffs.


Please don't write pauseless if there are short pauses. Pauseless in the Java GC context is a marketing scam.


We call them low-latency GCs, but I was responding to a comment that called the same thing "pauseless". However, even programs without any GC pauses can have pauses of a similar duration due to OS activity, so programs on a regular OS aren't generally pauseless anyway, and the term is mostly meaningless in most contexts. As a marketing term (which we don't use) it merely means the application won't experience GC-related pauses that are longer than OS-related pauses.


This is some weird way of counting. A system pause plus a GC pause is two pauses. Just because one pause can't be avoided doesn't mean you can introduce more pauses.


Just to be clear, the pause in ZGC is done as an optimisation of thread synchronisation (there's no GC work done in the pause). It's just as easy to do things without anything you would consider a pause, only it's less efficient. Momentarily stopping all program threads is just an efficient way to deliver a message to all threads considering that the OS might have some threads descheduled.

At the end of the day, we care about the latency for some particular thread(s), and truly minimising that at this level requires a realtime kernel (that comes with significant performance disadvantages).


> I was responding to a comment that called the same thing "pauseless"

In my defense, I called it "~pauseless" (as in, roughly pauseless).


Pebal is really pedantic about the term in every thread, because apparently only he has ever achieved pause-free GC, with his C++ GC.


Unless you are on a realtime OS, you may see longer pauses from the OS due to scheduling other processes, so I think it's fair to consider the new low-latency collectors (Shenandoah, ZGC) pauseless from a practical point of view.


You can't call GC pauseless if it introduces pauses. We don't say something is free if you have to pay little for it. We say it's cheap.


I think as a technical term, pauseless (as used here) has a specific meaning, just like real-time has a specific technical meaning (not literally 0 latency since that's generally physically impossible).


Reference counting also causes pauses when freeing large object graphs. A fully concurrent GC does not cause any pauses and is more efficient. It also has less memory overhead per object.
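This is easy to reproduce with shared_ptr; the entire teardown runs synchronously on whichever thread drops the last reference:

    #include <memory>
    #include <vector>

    int main() {
        // A large object graph kept alive purely by reference counts.
        std::vector<std::shared_ptr<std::vector<int>>> graph;
        for (int i = 0; i < 10'000; ++i)
            graph.push_back(std::make_shared<std::vector<int>>(1'000));

        graph.clear();  // every count hits zero here: tens of MB of
                        // destructors run right now, on this thread;
                        // that's the RC pause
    }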


What GC doesn't cause any pauses?

I know of some who claim to be "pauseless", but it's b.s. marketing. They also tend to be memory hogs that don't play well with other processes in low memory environments with the kind of allocation behavior seen in UI applications.

And of course, the issue is not only stop-the-world pauses (though those are especially bad), but any pause in any thread, period. An RC system gives you consistent, predictable performance, and its overhead can often be completely eliminated through static analysis.


Here you have a GC engine that is completely pauseless: https://github.com/pebal/sgcl

It is available as a C++ library, so you can easily compare its performance to reference counting.


That's why I said random pauses.

RAII can also cause pauses when freeing large object graphs. It's not unique to RC. However, in practice it's usually much less of an issue than GC pauses. It's only been an issue for me once in my entire career.


In C++ one can combine RAII with custom allocators to avoid calling alloc/free in latency-critical paths.
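For example, with std::pmr a per-frame arena keeps allocation off the hot path and frees everything in one shot via RAII; a minimal sketch:

    #include <memory_resource>
    #include <vector>

    void process_frame() {
        // Every allocation in this frame comes from one upfront buffer:
        // bump-pointer cheap, no malloc/free on the critical path.
        std::pmr::monotonic_buffer_resource arena(64 * 1024);
        std::pmr::vector<int> scratch{&arena};

        for (int i = 0; i < 1000; ++i)
            scratch.push_back(i);

    }   // RAII: the arena releases everything at once here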

