seanheelan's comments

seanheelan · 2025-05-25T13:33:45 1748180025

Hi, author here. Yes, I built a PoC. Yes, it triggered a KASAN report/crash.

stonepresto · 2025-05-25T13:42:35 1748180555

Thank you! I'm really happy to hear you did that. But why not mention that in your blog post? I understand not wanting to include a PoC for responsible disclosure reasons, but including it would have added a lot of credibility to your work for assholes like me lol

seanheelan · 2025-05-25T13:52:03 1748181123

I honestly hadn’t anticipated someone would think I hadn’t bothered to verify the vulnerability is real ;)

Since you’re interested: the bug is real but it is, I think, hard to exploit in real world scenarios. I haven’t tried. The timing you need to achieve is quite precise and tight. There are better bugs in ksmbd from an exploitation point of view. All of that is a bit of a “luxury problem” from the PoV of assessing progress in LLM capabilities at finding vulnerabilities though. We can worry about ranking bugs based on convenience for RCE once we can reliably find them at all.

stonepresto · 2025-05-25T14:11:18 1748182278

I'm too much of a skeptic to not do so lol. Great post though overall, don't let my assholery dissuade you! I was pleasantly surprised that it was actually a researcher behind the news story and there was some real evidence / scientific procedure. I thought you had a lot of good insights into how to use LLMs in the VR space specifically, and I'm glad you did benchmarking. It's interesting to see how they're improving.

Yeah race conditions like that are always tricky to make reliable. And yeah I do realize that the purpose of the writeup was more about the efficacy of using LLMs vs the bug itself, and I did get a lot out of that part, I just hyper-focused on the bug because it's what I tend to care the most about. In the end I agree with your conclusion, I believe LLMs are going to become a key part of the VR workflow as they improve and I'm grateful for folks like yourself documenting a way forward for their integration.

Anyways, solid writeup and really appreciate the follow-up!

seanheelan · 2025-05-25T09:21:48 1748164908

I realised I didn't mention it in the article, so in case you're curious it cost about $116 to run the 100k token version 100 times.

egorfine · 2025-05-26T09:21:55 1748251315

So, half that for batch processing [1], which presumably would be just fine for this task?

[1] https://platform.openai.com/docs/guides/batch

wyldfire · 2025-05-25T15:57:25 1748188645

How many years/generations behind o3 are the freely available / local models?

ramy_d · 2025-05-25T13:36:31 1748180191

thank you, I was going to ask about this. It's not a crazy amount...

Aachen · 2025-05-25T14:58:57 1748185137

Do we know how that relates to actual operating cost? My understanding is that this is below cost price because we're still in the investor hype part of the cycle where they're trying to capture market share by pumping many millions into these companies and projects

Does this really reflect the resource cost of finding this vulnerability?

remram · 2025-05-26T01:33:39 1748223219

It sounds like a crazy amount to me. I can run code analyzers/sanitizers/fuzzers on every commit to my repo at virtually no cost. Would they have caught a problem like this? Maybe not, certainly not without some amount of false positives. Still this LLM approach costs many millions of times more than previous tooling, and might still have brought up nothing (we just don't read the blog posts about those attempts).

seanheelan · 2025-05-25T09:07:31 1748164051

It's certainly not the first vulnerability found with an LLM =) Perhaps I should have been more clear though.

What the post says is "Understanding the vulnerability requires reasoning about concurrent connections to the server, and how they may share various objects in specific circumstances. o3 was able to comprehend this and spot a location where a particular object that is not referenced counted is freed while still being accessible by another thread. As far as I'm aware, this is the first public discussion of a vulnerability of that nature being found by a LLM."

The point I was trying to make is that, as far as I'm aware, this is the first public documentation of an LLM figuring out that sort of bug (non-trivial amount of code, bug results from concurrent access to shared resources). To me at least, this is an interesting marker of LLM progress.