
I work at an ML security R&D startup called Pwno. We've been working specifically on putting LLMs into memory security for the past year; we've spoken at Black Hat, and we worked with GGML (llama.cpp) on providing a continuous memory-security solution driven by multi-agent LLMs.

Something we learned along the way is that in this particular field of security, what we call low-level security (memory security etc.), validation and debugging have become more important than vulnerability discovery itself, because of hallucinations.

From our trial and error (trying validator architectures and security research methodologies, e.g. reverse taint propagation), it seems like the only way out of this problem is to design an LLM-native interactive environment in which LLMs validate their own findings through interaction with the environment or the component under test. The reason web-security-oriented companies like XBOW are doing very well is how easy validation is in their domain. I saw XBOW's LLM traces at Black Hat this year; pretty much the only tool they used, and needed, was curl. For web security, the backend is abstracted to the point where you send a request and it either works or you can easily tell why it didn't (XSS, SQLi, IDOR).

But for low-level security (memory security), the entropy of dealing with UAFs and OOBs is on another level. There are things you simply can't tell by looking at the source; you need to look at a particular program state (heap allocations, which depend on the glibc version; stack structure; register state...), and this ReAct'ing process with debuggers to construct a PoC/exploit is what has been a pain in the ass. (LLMs and tool calling are especially bad at these strategic, stateful tasks; see DeepMind's Tree of Thoughts paper, which discusses this issue.) The way I've seen Google Project Zero & DeepMind's Big Sleep mitigate this is through GDB scripts, but that only works up to a certain complexity of program state.
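To make "look at a particular program state" concrete, here's a minimal sketch in GDB's Python API of the kind of check I mean. The breakpoint location and locals (`tensor`, `idx`) are hypothetical, not from a real finding, and it assumes a non-stripped build so `ggml_nbytes()` can be evaluated from the debugger:

```python
# validate_oob.py — run as: gdb -q -x validate_oob.py --args ./llama-cli -m model.gguf ...
# Hypothetical breakpoint location and locals; illustrative only.
import gdb

gdb.execute("set pagination off")
gdb.execute("break ggml.c:1234")   # the line an agent flagged as an OOB write (made up here)
gdb.execute("run")

# At the breakpoint, read the concrete state the model can only guess at from source:
tensor = gdb.parse_and_eval("tensor")                    # pointer at the flagged site
nbytes = int(gdb.parse_and_eval("ggml_nbytes(tensor)"))  # real allocated extent
offset = int(gdb.parse_and_eval("idx"))                  # index the report claims is attacker-controlled

print(f"[state]  tensor={tensor}  nbytes={nbytes}  claimed write offset={offset}")
print("[verdict]", "OOB confirmed" if offset >= nbytes else "in bounds -- likely hallucinated")

# Dump the bytes just past the object so a human (or another agent) can inspect corruption:
raw = gdb.selected_inferior().read_memory(int(tensor) + nbytes, 64)
print("[past-end bytes]", bytes(raw).hex())
```

The point being: the verdict comes from the program state, not from the model's prose.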

When I was working on our integration with GGML, around two weeks of context and tool engineering already led us to very impressive findings (OOBs); but the hallucination problem scales with the number of "runs" of our agentic framework. Because we monitor llama.cpp's main-branch commits, every commit triggers an internal multi-agent run on our end, and each usually takes around an hour and hundreds of agent recursions. Sometimes at the end of the day we would have 30 really convincing, in-depth reports on OOBs and UAFs. But because of how costly it is just to validate one (from understanding to debugging to PoC writing...) and how much of that is hallucination (and each run itself is really expensive), we had to pause the project for a bit and focus on solving the agentic validation problem first.
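The shape of that loop is roughly the following (a sketch, not our actual infrastructure; the repo path and the two placeholder functions are stand-ins):

```python
# Commit-triggered agent runs with a validation gate — purely illustrative.
import subprocess
import time

REPO = "/src/llama.cpp"   # assumed local clone tracking upstream master

def latest_commit() -> str:
    subprocess.run(["git", "-C", REPO, "fetch", "origin", "master"], check=True)
    out = subprocess.run(["git", "-C", REPO, "rev-parse", "origin/master"],
                         check=True, capture_output=True, text=True)
    return out.stdout.strip()

def run_agents(commit: str) -> list[dict]:
    """Placeholder for the ~1 hour, hundreds-of-recursions multi-agent run."""
    return []

def validate_in_debugger(report: dict) -> bool:
    """Placeholder for the expensive part: reproducing the claimed OOB/UAF under a debugger."""
    return False

seen: set[str] = set()
while True:
    commit = latest_commit()
    if commit not in seen:
        seen.add(commit)
        reports = run_agents(commit)
        # Without this validation gate, a day of runs yields dozens of convincing but
        # unverified reports — exactly the hallucination problem described above.
        findings = [r for r in reports if validate_in_debugger(r)]
        print(f"{commit[:12]}: {len(reports)} reports, {len(findings)} validated")
    time.sleep(300)   # poll every 5 minutes
```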

I think as the environment gets more and more complex, interactions with the environment, and learning from those interactions, will matter more and more.


> I think as the environment gets more and more complex, interactions with the environment, and learning from those interactions, will matter more and more

Thanks for sharing your experience! It correlates with this recent interview with Sutton [1]: that real intelligence is learning from feedback in a complex and ever-changing environment. What an LLM does is train on a snapshot of what has been said about that environment and operate only on that snapshot.

[1] https://www.dwarkesh.com/p/richard-sutton


This high-dimensional vector space does theoretically allow them to embed an interpretable "deeper understanding" (reference: https://www.youtube.com/watch?v=wjZofJX0v4M&t=624s). It's the pre-training process that determines whether or not they demonstrate this "deeper understanding".

This "deeper understanding" refers to the understanding to the word, can be contextual, pragmatic, or semantic.


Really interesting perspectives on AI; it frustrates me that I never really thought about these collateral effects of the rapid innovation in LLMs/AI.

It's refreshing to see these counter-narrative perspectives.


In fact the llama.cpp codebase is well developed and actively maintained. It has undergone iterative security hardening, and intensive low-level security checks have been implemented in both the core inference engine and the RPC components.

This standard of security is what made the exploitation so challenging and rewarding.


It's actively maintained, but I wouldn't classify it as a clean codebase: not the abstractions within ggml, not the structure of llama.cpp, not its use of modern C++, etc. It can't even really make up its mind as to whether it should be C++ or C, and there's a lot of dirt because of that. Heck, instead of using a submodule they're copying ggml between projects, making it very difficult to keep track of what's actually happening where and what the ground truth is. It's sloppy engineering. Parts are better designed, for sure.

None of that is meant to take away from your effort or the success of llama.cpp, but I have spent quite a bit of time reading and working with the internals across layers and have a good eye for quality C++ patterns.


Thanks for the writeup! Was a very interesting read! I've subscribed and I am looking forward to your next exploits! ^_^


It can be both: an unsuccessful leak or a wrong `libggml-base` offset. We're building a fake `ggml_backend_buffer` table from the leaked base + offset (the hard-coded `libggml-base` offset should be adjusted for the compiled release). However, the exploitation isn't actually dependent on the `libggml-base` version; the partial-write space is always one byte, and you can fingerprint the `libggml-base` version after a successful leak if you build every release's `libggml-base` and map the last two bytes to each version.
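Roughly, that mapping could look like this (a minimal sketch; the reference symbol, `builds/` layout, and example addresses are just for illustration):

```python
# Fingerprint the libggml-base release from the low two bytes of a leaked symbol offset.
import subprocess
from pathlib import Path

SYMBOL = "ggml_backend_buffer_init"   # any exported function whose offset shifts between releases

def symbol_offset(lib: Path, symbol: str = SYMBOL) -> int:
    """Offset of `symbol` inside the shared object, read from its dynamic symbol table."""
    out = subprocess.run(["nm", "-D", str(lib)], capture_output=True, text=True, check=True)
    for line in out.stdout.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] == symbol:
            return int(parts[0], 16)
    raise KeyError(f"{symbol} not found in {lib}")

# builds/<release-tag>/libggml-base.so — one compiled library per release (assumed layout).
table = {
    symbol_offset(lib) & 0xFFFF: lib.parent.name
    for lib in Path("builds").glob("*/libggml-base.so")
}

# After a successful leak: subtract the leaked libggml-base base from a leaked pointer to
# SYMBOL; the last two bytes of that offset identify which release the target is running.
leaked_base, leaked_ptr = 0x7f3a12c00000, 0x7f3a12d4b2c0   # example values
print("likely release:", table.get((leaked_ptr - leaked_base) & 0xFFFF, "unknown"))
```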

I am happy you read it and liked it; more glad you tried it yourself :D

