
Can you explain why you're excited about virtual threads? I get that they improve throughput in extremely high-pressure apps, but the JVM's current threading isn't exactly a slouch. Java's used in HFT shops, and more generally in fintech, where performance matters.


The main problem is that it's not a matter of "speed" but of congestion.

If you write a program using blocking IO and platform (OS) threads, you're essentially limited to a couple hundred concurrent tasks, or however many threads your particular Linux kernel + hardware setup can context switch between before latency starts suffering. So it's slow not because Java is slow, but because kernel threads are heavyweight, and you can't just make a trillion of them only to have them block waiting on IO.

If you use async approaches, your programming model suffers, but now you're multiplexing millions of tasks over a small number of platform threads without even straining the kernel's scheduler. You've essentially moved from kernel scheduling to user-mode scheduling by writing async code.
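
To make that concrete, here's a minimal sketch of the async style in Java using CompletableFuture (the supplyAsync stages are made-up stand-ins for real IO calls):

    import java.util.concurrent.CompletableFuture;

    public class AsyncStyle {
        public static void main(String[] args) {
            // Each IO step becomes a callback stage instead of a plain blocking
            // call, so the control flow is scattered across lambdas.
            CompletableFuture<String> result =
                CompletableFuture.supplyAsync(() -> "user-42")    // stand-in for an async DB call
                    .thenCompose(user -> CompletableFuture
                        .supplyAsync(() -> user + ":orders"))     // stand-in for a second IO call
                    .thenApply(orders -> "rendered " + orders);   // pure transformation at the end
            System.out.println(result.join());
        }
    }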

Virtual threads are a response to this, asking "what if you could eat your cake and have it, too?" by "simply" providing a user-mode-scheduled thread implementation. They took the general strategy that async programming was employing and "hoisted" it up a couple of levels of abstraction, to sit "behind" the threading model. Now you have all the benefits of just blocking the thread without all the problems that come from trying to run a ton of platform threads that would choke the Linux kernel.
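
For comparison, here's a minimal sketch of the same flow written as plain blocking code on a virtual thread (Thread.startVirtualThread is the real JDK 21 API; the sleep stands in for blocking IO):

    public class VirtualStyle {
        public static void main(String[] args) throws InterruptedException {
            // Plain straight-line blocking code; while it blocks, the JVM parks
            // the virtual thread and reuses the carrier thread for other work.
            Thread t = Thread.startVirtualThread(() -> {
                String user = "user-42";          // stand-in for a blocking DB call
                try {
                    Thread.sleep(100);            // stand-in for blocking IO
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("rendered " + user + ":orders");
            });
            t.join();
        }
    }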


> a couple hundred concurrent tasks, or however many threads your particular Linux kernel + hardware setup can context switch between before latency starts suffering.

Are there any benchmarks showing it's a couple hundred threads, and not 100k threads?

A couple hundred is about one thread per core on modern CPUs.

Also, my belief is that the JVM itself adds lots of overhead.


>Also, my belief is that the JVM itself adds lots of overhead.

The JVM has ~nothing to do with the scheduling of platform threads

>Are there any benchmarks showing it's a couple hundred threads, and not 100k threads?

It depends greatly on your hardware

>A couple hundred is about one thread per core on modern CPUs.

It depends on the hardware, of course

Overall, yes, you could probably run a lot of concurrent processes on a c7i.48xlarge in 2023. But why would you want to do that when, if you used virtual threads (or async), you could do the same work on a c7i.large? That's the whole point of this. There's no reason to waste CPU and memory on kernel context-switching overhead.


> The JVM has ~nothing to do with the scheduling of platform threads

I didn't say it schedules platform threads, but the JVM has many other performance issues with its GC/object identity/lock monitoring model, which make it a harder choice for ultra-high-performance apps. So virtual thread scheduling may improve your app's perf by 5% while the other 95% is stuck in other bottlenecks.

> But why would you want to do that when, if you used virtual threads

Because virtual threads are not well supported in the ecosystem, and it will take another 10 years for it to catch up. Before that, most APIs/libs will use the old concepts, and you take a large risk of fragmentation by introducing virtual threads into your app.

Also, you can avoid all that switching with the 20-year-old ExecutorService, where you have exactly the same m:n mapping between tasks and machine threads.


>I didn't say it schedules platform threads, but the JVM has many other performance issues with its GC/object identity/lock monitoring model, which make it a harder choice for ultra-high-performance apps. So virtual thread scheduling may improve your app's perf by 5% while the other 95% is stuck in other bottlenecks.

Switching from async to virtual threads is typically not a performance improvement in well-factored async code. The primary benefit is a clearer programming model that's significantly easier to get right and takes less code to implement, while keeping the same performance.

>Also, you can avoid all that switching with the 20-year-old ExecutorService, where you have exactly the same m:n mapping between tasks and machine threads.

You misunderstand the point of virtual threads. There is no need to pool them, as (most) ExecutorService implementations do. And the whole point is that you no longer need to avoid blocking them, which pretty much every currently existing solution goes out of its way to do. There's zero benefit to switching without also updating your code to take advantage of the new paradigm.

>Because virtual threads are not well supported in the ecosystem, and it will take another 10 years for it to catch up. Before that, most APIs/libs will use the old concepts, and you take a large risk of fragmentation by introducing virtual threads into your app.

What risks? The whole point of virtual threads vs async/await is that they don't color functions in the same way that async does.

Also, pretty much every notable framework already has support for virtual threads (Spring, Jetty, Helidon, and many others). The API has been the same for a couple of years at this point.


Sorry, I disagree with most of your points, but I'm not interested in continuing the discussion.


You can then just swap a single line in your ExecutorService for all the benefits.
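
Roughly, assuming your work already flows through an ExecutorService, the swap looks like this (both factory methods are standard JDK 21 APIs; the pool size is illustrative):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class Swap {
        public static void main(String[] args) {
            // Before: a fixed pool of heavyweight platform threads.
            try (ExecutorService pool = Executors.newFixedThreadPool(200)) {
                pool.submit(() -> System.out.println("on a platform thread"));
            }
            // After: one cheap virtual thread per task; no pool, blocking is fine.
            try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
                pool.submit(() -> System.out.println("on a virtual thread"));
            }
        }
    }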


ExecutorService can be configured in many different ways; for example, I often create one with a bounded queue and CallerRunsPolicy. I somehow couldn't find an equivalent virtual-thread executor.
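
For reference, the setup being described is the classic ThreadPoolExecutor configuration (pool and queue sizes here are illustrative); there's no drop-in virtual-thread equivalent, so you'd have to bound concurrency yourself, e.g. with a Semaphore:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedPool {
        public static void main(String[] args) {
            // Bounded queue + CallerRunsPolicy: when the queue fills up, the
            // submitting thread runs the task itself, giving natural backpressure.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    8, 8,                                // core and max pool size
                    0L, TimeUnit.MILLISECONDS,           // keep-alive for excess threads
                    new ArrayBlockingQueue<>(1_000),     // bounded work queue
                    new ThreadPoolExecutor.CallerRunsPolicy());
            pool.submit(() -> System.out.println("task"));
            pool.shutdown();
        }
    }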


I can’t speak for the OP, but this makes it much easier to write code that uses threads that wait on IO, and just let the underlying system (VM + JDBC connectors, for example) handle the dirty work.

A few years ago, I wrote a load generation application using Kotlin’s coroutines - in this case, each coroutine would be a “device”. And I could add interesting modeling on each device; I easily ran 250k simulated devices within a single process, and it took me a couple of days. But coroutines are not totally simple; any method that might call IO needs to be made “coroutine aware”. So the abstraction kinda leaks all over the place.

Now you can do the same thing in Java: simply model each device as its own Runnable and poof, you can spin up a million of them. And there isn't much existing code that has to be rewritten. Pretty slick.
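
A rough sketch of what that looks like, assuming each device just blocks between actions (runDevice and the counts are made up for illustration):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class DeviceSim {
        public static void main(String[] args) {
            // One virtual thread per simulated device; the runtime multiplexes
            // all of them over a small pool of carrier threads.
            try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 250_000; i++) {
                    int id = i;
                    exec.submit(() -> runDevice(id));
                }
            } // close() waits for submitted tasks to finish
        }

        static void runDevice(int id) {
            try {
                for (int beat = 0; beat < 3; beat++) {
                    Thread.sleep(1_000);              // stand-in for blocking IO
                    // a real simulator would send a heartbeat or reading here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }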

So this isn’t really a “high performance computing” feature, but a “blue collar coder” thing.


It's worth mentioning that there are some aspects of the virtual thread implementation that are important to take into consideration before switching out the OS-thread-per-task executor for the virtual-thread-per-task executor:

1. Use of the synchronized keyword can pin the virtual thread to its carrier thread, resulting in the carrier thread being blocked and unable to drive any other virtual threads. The recommendation is to refactor to use ReentrantLock (see the sketch after this list). This may be solved in the future, though.

2. Use of ThreadLocals should be reduced/avoided, since they're local to the virtual thread and not the carrier threads, which could result in ballooning memory usage with extensive use across many virtual threads.
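
Here's a minimal sketch of the refactor from point 1 (the counter is just a placeholder for whatever the critical section protects):

    import java.util.concurrent.locks.ReentrantLock;

    class Counter {
        private final ReentrantLock lock = new ReentrantLock();
        private long count;

        // Before (blocking inside a synchronized block can pin the carrier):
        // synchronized void increment() { count++; }

        // After: with ReentrantLock, the virtual thread can unmount while waiting.
        void increment() {
            lock.lock();
            try {
                count++;
            } finally {
                lock.unlock();
            }
        }
    }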


Regarding the second point: there are now alternatives to ThreadLocals available that are intended to be used by virtual threads: Scoped Values. Unfortunately, they are preview-only in JDK 21.
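
For the curious, a minimal sketch of the preview API (JDK 21 with --enable-preview; REQUEST_ID is a made-up example):

    // Compile and run with --enable-preview on JDK 21 (JEP 446).
    public class ScopedValueExample {
        static final ScopedValue<String> REQUEST_ID = ScopedValue.newInstance();

        public static void main(String[] args) {
            // The binding is immutable and visible only for the duration of run(),
            // including in methods called from inside it.
            ScopedValue.where(REQUEST_ID, "req-123").run(ScopedValueExample::handle);
        }

        static void handle() {
            System.out.println("handling " + REQUEST_ID.get());
        }
    }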


Pre-v21, Java's threads mapped 1:1 to OS threads. HFT apps were normally single-threaded with exotic work-dispatching frameworks, native-memory buffers, and/or built-in network stacks, so while undoubtedly fast, they were not particularly representative.

V21 virtual threads are more like Go's goroutines. They map m:n onto OS threads, and the JVM is responsible for scheduling them, making them much less of a burden on the underlying OS, with fewer context switches, etc. And the best thing is, there has been minimal change in the Java standard library API, making them very accessible to existing devs and their codebases.


Once upon a time, Java had "green threads". They were kind of like virtual threads, but the way I understood it, they all mapped to one OS thread, while these new virtual threads map m:n to OS threads.


An interesting part of HFT is you normally do everything on a single thread. Your unit of parallelism would be a different JVM and you want to avoid context switching at all costs, going as far as to pin specific OS threads, making sure your thread for execution is never used by GC.


To understand the benefit of virtual threads, I think it's helpful to think of it as a "best of both worlds" situation between blocking IO and async IO. In summary, virtual threads give you the scalability (not simply "performance") benefits of async IO code while keeping the simplified developer experience of normal threads and blocking IO.

First, it's best to understand the benefit of virtual threads through the example of a webserver. Usually, a webserver maps 1 request to 1 thread. However, most of the time the webserver doesn't actually run much code itself: it calls out to make DB requests, pulls files from disk, makes remote API requests, etc. With blocking IO, when a thread makes one of these remote calls, it just sits there and waits for the remote call to return. In the meantime, it holds on to a bunch of resources (e.g. memory) while doing nothing. For something like HFT, that's normally not much of a problem, because the goal isn't to serve tons of independent incoming requests (though obviously usage patterns can differ), but for a webserver it can have a huge limiting effect on the number of concurrent requests that can be processed, hurting scalability.

Compare that to how NodeJS processes incoming web requests. With Node (and JS in general), there is just a single thread that processes incoming requests. However, with async IO in Node (which is really just syntactic sugar around promises and generators), when a request calls out to something like a DB, it doesn't block. Instead, the thread is free to handle another incoming web request. When the original DB request returns, the underlying engine in Node essentially resumes that request from where it left off (if you want more info, search for "Node event loop"). Folks found that in real-world scenarios Node can actually scale extremely well with the number of incoming requests, because lots of webserver code is essentially waiting around for remote IO requests to complete.

However, there are a couple of downsides to the async IO approach:

1. In Node, the main event loop is single-threaded. So if you do some work that is heavily CPU intensive, then until you make an IO call, the Node server isn't free to handle another incoming request. You can test this out with a busy-wait loop in a Node request handler. If you have that loop run for, say, 10 seconds, then no other incoming requests can be dispatched for 10 seconds. In other words, Node doesn't allow for preemptive interruption.

2. While I generally like the async IO style of programming and I find it easy to reason about, some folks don't like it. In particular, it creates a "function coloring" problem: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-... . Async functions can basically only be called from other async functions if you want to do something with the return value.

Virtual threads then basically can provide the best features from both of these approaches:

1. From a programming perspective, it "feels" pretty much like you're just writing normal, blocking IO code. However, under the covers, when you make a remote call, the Java scheduler will reuse the underlying OS thread to do other useful work while the remote call is executing. Thus, you get greatly increased scalability for this type of code.

2. You don't have to worry about the function coloring problem. A "synchronous" function can call out to a remote function, and it doesn't need to change anything about its own function signature (see the sketch after this list).

3. Virtual threads can be preemptively interrupted by the underlying scheduler, preventing a misbehaving piece of code from starving resources (I'm actually less sure of the details on this piece for Java).
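
To make point 2 concrete, here's a minimal sketch: fetch is an ordinary blocking method with a plain signature, and running it on virtual threads needs no async markers anywhere (the URL and counts are illustrative):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class NoColoring {
        // An ordinary method: blocking call, plain return type, no async keyword.
        static String fetch(HttpClient client, String url) throws Exception {
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).build();
            return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
        }

        public static void main(String[] args) {
            HttpClient client = HttpClient.newHttpClient();
            // Thousands of concurrent blocking calls on virtual threads;
            // no function signature had to change to get there.
            try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10_000; i++) {
                    exec.submit(() -> fetch(client, "https://example.com/"));
                }
            }
        }
    }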

Hope that helps!


> 1. In Node, the main event loop is single-threaded. So if you do some work that is heavily CPU intensive, then until you make an IO call, the Node server isn't free to handle another incoming request. You can test this out with a busy-wait loop in a Node request handler. If you have that loop run for, say, 10 seconds, then no other incoming requests can be dispatched for 10 seconds. In other words, Node doesn't allow for preemptive interruption.

Nice. I will add that JS runtimes now have worker threads; while they are still terribly inefficient even compared to OS threads, they can alleviate this problem if you don't need to spawn more worker threads than the number of cores. If you're using Node.js and that's not enough, welcome to the microservices world.



