I find it somewhat ironic that you pitch this as "No callbacks. No promises. No async/await keywords. Just Ruby code that scales."
When the example right above literally shows that you need both an `Async do` and an `end.map(&:wait)`.
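For readers without the article open, this is roughly the shape being referenced - a reconstruction, not the article's actual code (`urls` here is stand-in data):

```ruby
require "async"
require "net/http"
require "uri"

urls = ["https://example.com", "https://example.org"]  # stand-in data

# The bookends in question: an `Async do` wrapper at the top,
# and `end.map(&:wait)` to collect the concurrent results.
Async do
  urls.map do |url|
    Async { Net::HTTP.get(URI(url)) }
  end.map(&:wait)
end
```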
I'll add - the one compelling argument you make, about needing a DB connection per worker, is mitigated with something like pgbouncer without much work. The OS overhead per thread (or hell, even per process: https://jacob.gold/posts/serving-200-million-requests-with-c...) isn't an argument I really buy, especially given your use case is long-running LLM chat tasks, as stated above.
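For context, the pgbouncer mitigation is mostly a matter of turning on transaction pooling - a minimal sketch with illustrative values, not anything from the article:

```ini
; Minimal pgbouncer sketch - illustrative values only
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
pool_mode = transaction    ; many client conns share few server conns
default_pool_size = 20     ; actual server-side Postgres connections
max_client_conn = 1000     ; workers connect here instead of Postgres
```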
Personally - if I really want to be fast and efficient I'm not picking Ruby anyways (or python for that matter - but at least python has the huge ecosystem for the LLM/AI space right now).
Fair point on the syntax - I should have been clearer. What I meant is that your existing Ruby code doesn't need modifications. In Python you'd need to switch to a different HTTP library, add `async def` and `await` everywhere, etc. In Ruby the same `Net::HTTP` call works in both sync and async contexts.
The `Async do` wrapper lives only at the orchestration level, not throughout your codebase. That's a huge difference in practice.
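A minimal sketch of what I mean, assuming Ruby 3.x with the async gem (which installs a fiber scheduler; the URLs are just examples):

```ruby
require "async"
require "net/http"
require "uri"

# Plain Ruby - no async-specific HTTP client, no await keywords.
def fetch(url)
  Net::HTTP.get(URI(url))
end

# Works synchronously as-is:
fetch("https://example.com")

# And concurrently, unchanged, because the fiber scheduler makes
# Net::HTTP's blocking I/O yield the fiber instead of the thread:
Async do
  ["https://example.com", "https://example.org"]
    .map { |url| Async { fetch(url) } }
    .map(&:wait)
end
```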
Regarding pgbouncer - yes, it helps with connection pooling, but you still have the fundamental issue of 25 workers = 25 max concurrent LLM streams. Your 26th user waits. With fibers, you can handle thousands on the same hardware because they yield during the 30-60s of waiting for tokens.
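Concretely, the fiber version looks something like this - a sketch where `chats` and `stream_llm_response` are hypothetical placeholders, and `Async::Semaphore` is the async gem's built-in concurrency limiter:

```ruby
require "async"
require "async/semaphore"

Async do
  # Cap in-flight streams at 1000 instead of 25 - each task costs
  # a fiber (kilobytes), not an OS thread plus a dedicated worker.
  semaphore = Async::Semaphore.new(1000)

  chats.each do |chat|
    # Hypothetical call: spends 30-60s mostly blocked waiting on
    # tokens, during which the fiber yields and other streams run.
    semaphore.async { stream_llm_response(chat) }
  end
end
```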
Sure, for pure performance you'd pick another language. But that's not the point - the point is that you can get much better performance for I/O-bound workloads in Ruby today, without switching languages or rewriting everything.
It's about making Ruby better at what it's already being used for, not competing with systems languages.
> Regarding pgbouncer - yes, it helps with connection pooling, but you still have the fundamental issue of 25 workers = 25 max concurrent LLM streams.
I guess my point is why are you picking an arbitrarily low number like 25? If you know that workers are going to be "waiting for tokens" most of the time, why not bump that number way, WAY up?
And I guess I should clarify - I'm coming into this from outside the Python space (I touch Python because it's hard to avoid when doing AI work right now, but it's hardly my favorite language). Basically, having done a lot of Go, which uses goroutines in basically the same way Ruby uses fibers (lightweight, runtime-managed thread replacements), I'll tell you up front: the orchestration level still matters a LOT, and you're going to be dealing with a lot of complexity there to make things work, even if it does mean that some lower-level code can remain unaware (colorless).
Even good old-fashioned C++ has had this concept bouncing around for a long time (https://github.com/boostorg/fiber). It's good at some things, but it's absolutely not the silver bullet I feel like you're trying to pitch it as here.
Why not bump it to 10,000 threads? Because the post measures exactly that: the OS scheduler struggles badly, with 18x slower allocation and 17x slower context switching. That's measured overhead, not theory.
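You can sanity-check the allocation side on your own machine with a toy benchmark - a sketch, not the post's actual benchmark, and exact ratios will vary by OS and hardware:

```ruby
require "benchmark"

N = 10_000

Benchmark.bm(10) do |x|
  # Create and run N OS threads (each needs a kernel thread + stack).
  x.report("threads:") { N.times { Thread.new {}.join } }
  # Create and run N fibers (userspace objects, no kernel involvement).
  x.report("fibers:")  { N.times { Fiber.new {}.resume } }
end
```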
Complexity? We migrated in 30 minutes. It's just `Async` blocks, not goroutine scheduling gymnastics.
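The whole migration is basically this shape (a sketch; `jobs` and `process` are placeholders, not our actual code):

```ruby
require "async"

# Before: one OS thread per job.
threads = jobs.map { |job| Thread.new { process(job) } }
threads.each(&:join)

# After: one fiber task per job - same structure, just swap
# Thread.new/join for Async/wait inside a reactor.
Async do
  jobs.map { |job| Async { process(job) } }.each(&:wait)
end
```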
Not claiming it's a silver bullet - the post explicitly says "use threads for CPU work". But for I/O-bound LLM streaming, the improvement is massive, and it's running in production.
> Personally - if I really want to be fast and efficient I'm not picking Ruby anyways (or python for that matter - but at least python has the huge ecosystem for the LLM/AI space right now).
"Fast and efficient" can mean almost anything. You can be fast and efficient in Ruby at handling thousands of concurrent llm chats (or other IO-bound work), as per the article. You can also be fast and efficient at CPU-bound work (it's possible to enjoy Ruby while keeping in mind how it will translate into C). You probably cannot be fast and efficient at micro-managing memory allocations in Ruby. If you're ok to brush ruby aside over a vague generalization, maybe you just don't see its appeal in the first place, which is fair, but that makes the other reasons you provide kind of moot.