Before using it, make sure you read this entirely and understand it:
https://docs.openclaw.ai/gateway/security
Most important sentence: "Note: sandboxing is opt-in. If sandbox mode is off"
Don't do that; turn the sandbox on immediately.
Otherwise you are just installing an LLM-controlled RCE.
There are still improvements to be made on the security side, but BIG KUDOS for working so hard on it at this stage and documenting it extensively!! I've explored Cursor's security docs (with a big S because they're so scattered) and they were nowhere near as good.
It's really easy to run this in a container. The upside is you get a lot of protection included. The downside is you're rebuilding the container whenever you need to add binaries. That seems like a fair tradeoff.
What I'll say about OpenClaw is that it truly feels vibe coded, and I mean that negatively. It just doesn't feel as well put together as OpenCode does, and it definitely doesn't handle context overruns as well. Ultimately I think the agent implementation in n8n is better done and provides far more safeguards and extensibility.

But I get it: OpenClaw is supposed to run on your machine. For me, though, if I have an assistant/agent I want it to just live in those chat apps, and at that rate it's running in a container on a VPS or an LXC in my home lab. This is where a powerful-enough local machine does make sense, and I can see why folks were buying Mac Minis for this.

But given the quality of the project, again in my opinion, it's nothing spectacular in terms of what it can do at this point. And in some cases it's clunkier, given its UI, than other options that already provide the same functionality.
The thing is, running it on your machine is kinda the point. These agents are meant to operate at the same level as - and perhaps replace - your mail agent and file navigator. So if we sandbox too much we make it useless.
The compromise is having separate folders for the AI, a bit like having a Dropbox folder on your machine with some subfolders being personal, shared, read-only, etc.
Running terminal commands is usually just a bad idea, though. In this case you'd want to disable that and instead fine-tune a very well configured MCP server that runs the commands with a minimal blast radius.
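As a sketch of what that could look like (assuming the official Python MCP SDK, i.e. pip install mcp; the server name and command allowlist here are made up for illustration, not anything OpenClaw ships with):

    # Minimal MCP server exposing only an allowlisted set of commands.
    # Hypothetical allowlist; trim it to your own acceptable blast radius.
    import subprocess
    from mcp.server.fastmcp import FastMCP

    ALLOWED = {
        "git_status": ["git", "status", "--short"],
        "list_files": ["ls", "-la"],
        "disk_usage": ["df", "-h"],
    }

    mcp = FastMCP("safe-commands")

    @mcp.tool()
    def run_command(name: str) -> str:
        """Run one of a fixed set of pre-approved commands, selected by name only."""
        if name not in ALLOWED:
            return f"'{name}' is not in the allowlist: {sorted(ALLOWED)}"
        result = subprocess.run(ALLOWED[name], capture_output=True, text=True, timeout=30)
        return result.stdout or result.stderr

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default

The key point is that the model never passes a raw command string: it can only pick a name from the allowlist, so an injected prompt can at worst trigger a command you already approved.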
> running it on your machine is kinda the point.
That very much depends on what you're using it for. If you're one of the overly advertised cases of someone who needs an AI to manage inbox, calendar, and scheduling tasks, sure, maybe that makes sense on your own machine if you aren't able to set up access on another one.
For anything else there's no need for it to be on your machine. Most things are cloud-based these days, and granting read access to git repos, Google Docs, etc. is trivial.
I really don't get the insane focus on 'your inbox' this whole thing has; that's perhaps the biggest waste of a tool like this and an incredibly poor way of 'selling' it to people.
> someone who needs an AI to manage inbox, calendar, and scheduling tasks
A secretary. The word you're looking for is "secretary". Having a secretary has always been the preferred way for the wealthy and powerful to handle these tasks. The president doesn't schedule his own meetings and manage his own Outlook calendar; a president/CEO/etc. has better things to do.
People just created calendar/email/etc software (like Microsoft Outlook) to let us do it ourselves, because secretaries are $$$$. But let's be real, the ideal situation is having a perfect secretary to handle this crap. That's the point of using AI here: to have an AI secretary.
Managing your own calendar would become extremely 2010 coded, if AI secretaries become a thing. It'd be like how "rewinding your VCR tape" is 1990s coded.
Unless you're swamped with email I don't really get it. If someone calls me to arrange an appointment, I say "Hey Google, add X to calendar" after the call and it's done. Gemini can use Gmail and other Workspace apps. You can also set up commands to do a few different things at once, like turning on the lights when you get home by saying "I'm home". With any cheap set of Bluetooth earphones this is all hands-free.
Lots of these YouTubers are using OpenClaw to replace simple Google/Siri voice queries with something prohibitively complex, expensive, and insecure.
Also, people in the '90s didn't have push notifications. We see emails on our watch/phone and can delete/archive/snooze from there. Email triage takes zero time these days and can be done from anywhere.
I do get it though if you're someone who is extremely busy and really needs a PA.
Much more likely that the average user is either unemployed or in the leisure class.
The sandbox opt-in default is the main gotcha, though. It would be better if it defaulted to sandboxed, with an explicit --no-sandbox flag for those who understand the risk.
Related: https://news.ycombinator.com/item?id=46223311
Switching to the paid version was already problematic, as it uses the GCP account system, which has no spending limit, and API keys don't have an expiration date. So the free offer was great for freelancers and SMEs, and yet the paid version is the worst possible scenario you can imagine for those same freelancers and SMEs. OpenRouter + free models, plus the increased rate limit you get after buying 10 credits (10€), is my current favourite choice for learning/teaching.
Microsoft is using the deep penetration of SharePoint in companies to sell Copilot licenses. At least in France it's alive and well, and I see many more Copilot licenses than actual OpenAI use.
There are about a dozen workarounds for context limits: agents are one, MCP servers another, AGENTS.md a third. But none of them actually solves the issue of a context window being so small that it's useless for anything even remotely complex.
Let's imagine a codebase that can fit onto a revolutionary piece of technology known as a floppy disk. As we all know, a floppy can store <2 megabytes. But 100k tokens is only about 400 kilobytes of text. So, to process the whole codebase that fits on a floppy, you need 5 agents plus a sixth "parent process" that those 5 agents report to.
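Back-of-the-envelope, if you want to check the numbers (the ~4 bytes of source text per token is a rough assumption; real tokenizer ratios vary by language and code style):

    # Rough arithmetic: how many 100k-token contexts cover a floppy's worth of code?
    FLOPPY_BYTES = 2_000_000      # "<2 megabytes", taking the 2 MB upper bound
    BYTES_PER_TOKEN = 4           # assumed average; varies by tokenizer
    CONTEXT_TOKENS = 100_000

    context_bytes = CONTEXT_TOKENS * BYTES_PER_TOKEN      # ~400 KB per agent
    agents = -(-FLOPPY_BYTES // context_bytes)            # ceiling division
    print(context_bytes, agents)                          # 400000 5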
Those five agents can report "no security issues found" in their own little chunk of the codebase to the parent process, and that parent process will still be none the wiser about how those different chunks interact with each other.
You can have an agent that focuses on studying the interactions. What you're saying is that an AI cannot find every security issue, but neither do humans; otherwise we wouldn't have security breaches in the first place. You are describing a relatively basic agentic setup, mostly using your AI-assisted text editor, but a commercial security bot is hopefully a much more complex beast. You replace context with memory and synthesis, for instance, the same way our brain works.
In one instance it could not even describe why a test is a bad unit test (asserting true is equal to true), which doesn't even require context or multi-file reasoning.
It's almost as if it has additional problems beyond the context limits :)
You may want to try using it; anecdotes often differ from theories, especially when the theories are being sold to you for profit. It takes maybe a few days to see a pattern of ignoring simple instructions even when the context is clean. Or one prompt fixes one issue and causes new ones, rinse and repeat. It requires human guidance in practice.
Strongman: LLMs aren't a tool, they're fuzzy automation.
And what keeps security problems from making it into prod in the real world?
Code review, testing, static and dynamic code scanning, and fuzzing.
Why aren't these things done?
Because there isn't enough people-time and expertise.
So in order for LLMs to improve security, they need to be able to improve our ability to do one of: code review, testing, static and dynamic code scanning, and fuzzing.
It seems very unlikely those forms of automation won't be improved in the near future by even the dumbest form of LLMs.
And if you offered CISOs a "pay to scan" service that actually worked cross-language and -platform (in contrast to most "only supported languages" scanners), they'd jump at it.
And that buys you what, exactly? Your point is 100% correct, and it's why LLMs are nowhere near able to manage or build complete simple systems, let alone complex ones.
Why? Context. LLMs, today, go off the rails fairly easily. As I've mentioned in prior comments, I've been working a lot with different models and agentic coding systems. When a codebase starts to approach 5k lines (building the entire codebase with an agent), things start to get very rough. First of all, the agent cannot wrap its context (it has no brain) around the code in a complete way. Even when everything is very well documented as part of the build and outlined so the LLM has indicators of where to pull in code, it almost always cannot keep schemas, requirements, or patterns in line.

I've had instances where the APIs being developed were to follow a specific schema, require specific tests, and abide by specific constraints for integration. Almost always, in that relatively small codebase, the agentic system gets something wrong - but because of sycophancy, it gleefully informs me all the work is done and everything is A-OK! The kicker is that when you show it why and where it's wrong, you're continuously in a loop of burning tokens trying to put the train back on the track. LLMs can't be efficient with new(ish) codebases because they're always having to go look up new documentation, burning through more context beyond what they're targeting to build/update/refactor/etc.
So, sure. You can "call an LLM multiple times". But this is hugely missing the point with how these systems work. Because when you actually start to use them you'll find these issues almost immediately.
To add onto this, it is a characteristic of their design to statistically pick things that would be bad choices, because humans do too. It’s not more reliable than just taking a random person off the street of SF and giving them instructions on what to copy paste without any context. They might also change unrelated things or get sidetracked when they encounter friction. My point is that when you try to compensate by prompting repeatedly, you are just adding more chances for entropy to leak in — so I am agreeing with you.
> To add onto this, it is a characteristic of their design to statistically pick things that would be bad choices, because humans do too.
Spot on. If we look at historical "AI" (pre-LLM), the data sets were much more curated, cleaned, and labeled. Look at CV, for example. Computer Vision is a prime example of how AI can easily go off the rails given 1) garbage input data and 2) biased input data. LLMs have both of these as inputs, in spades and in vast quantities. Has everyone forgotten about Google's classification of African American people in images [0]? Or, more hilariously, the fix [1]? Most people I talk to who are using LLMs think the data being strung into these models has been fine-tuned, hand-picked, etc. For some small models that were explicitly curated, sure. But in the context (no pun intended) of all the popular frontier models: no way in hell.
The one thing I'm really surprised nobody is talking about is the system prompt. Not in the sense of jailbreaking it or even extracting it, but I can't imagine that these system prompts aren't accumulating massive tech debt at this point. I'm sure there's band-aid after band-aid of simple fixes to nudge the model in ever-so-slightly different directions based on things that are, ultimately, out of the control of such a large culmination of random data. I can't wait to see how these long-term issues crop up and get duct-taped over with the quick fixes these tech behemoths are becoming known for.
Talking about the debt of a system prompt feels really weird. A system prompt tied to an LLM is the equivalent of crafting a new model in the pre-LLM era. You measure its success using various quality metrics, and you improve the system prompt progressively to raise those metrics. It feels like a band-aid, but that's actually how it's supposed to work, and it's totally equivalent to "fixing" a machine learning model by improving the dataset.
"You can investigate this yourself by putting a logging proxy between the claude code CLI and the Anthropic API using ANTHROPIC_BASE_URL" I'd be eager to read a tutorial about that I never know which tool to favour for doing that when you're not a system or network expert.
Agree - I've had Claude one-shot this for me at least 10 times at this point because I'm too lazy to lug whatever code around. Literally made a new one this morning.
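For anyone who wants a starting point, a rough sketch of such a logging proxy (Flask and requests are just one arbitrary choice here, as are the port and the header filtering):

    # Logging reverse proxy for the Anthropic API.
    # Start it, then run:  ANTHROPIC_BASE_URL=http://localhost:8080 claude
    import json
    import requests
    from flask import Flask, Response, request

    UPSTREAM = "https://api.anthropic.com"
    app = Flask(__name__)

    @app.route("/<path:path>", methods=["GET", "POST"])
    def proxy(path):
        body = request.get_data()
        try:
            # Log the full outgoing request: system prompt, tools, messages, etc.
            print(json.dumps(json.loads(body), indent=2))
        except ValueError:
            print(body[:2000])
        upstream = requests.request(
            method=request.method,
            url=f"{UPSTREAM}/{path}",
            headers={k: v for k, v in request.headers if k.lower() != "host"},
            data=body,
            stream=True,
        )
        # requests decompresses the body, so drop encoding/length headers.
        skip = {"content-encoding", "content-length", "transfer-encoding", "connection"}
        return Response(
            upstream.iter_content(chunk_size=None),
            status=upstream.status_code,
            headers=[(k, v) for k, v in upstream.raw.headers.items() if k.lower() not in skip],
        )

    if __name__ == "__main__":
        app.run(port=8080)

Run it, point ANTHROPIC_BASE_URL at it, and every request body the CLI sends shows up on the proxy's stdout.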
I call that self-destructive prompting, in the sense that you use the AI to output programs that replace calling the AI in the future. The paper seems to indicate that this also brings much better results. However, it's subject to attacks, as running generated code is usually unsafe. A sandbox has to be used; major agentic AI players are providing solutions, like the Langchain sandbox released earlier this year.
If the generated code uses a suitable programming language, like the safe subset of Haskell, then the risk is significantly lower. Anyway it makes sense to execute this code in the user's browser instead of on the server.
Yeah, I mean you can replace sandboxing with other safe alternatives, but the idea is the same: the generated code has to be considered 100% untrusted. Supply chain attacks are especially nasty.
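To make the "100% untrusted" stance concrete, here's a rough sketch of one layer of defense, running the generated code in a separate process with resource limits and a timeout. This is not a real sandbox (no filesystem or network isolation), and the limits are arbitrary:

    # Run model-generated Python in a child process with CPU/memory/file-size limits.
    # POSIX only (uses resource + preexec_fn); limits and timeout are arbitrary.
    import resource
    import subprocess
    import sys
    import tempfile

    def run_untrusted(code: str, timeout: int = 5) -> subprocess.CompletedProcess:
        def limit_resources():
            resource.setrlimit(resource.RLIMIT_CPU, (timeout, timeout))      # CPU seconds
            resource.setrlimit(resource.RLIMIT_AS, (256 * 1024**2,) * 2)     # 256 MB memory
            resource.setrlimit(resource.RLIMIT_FSIZE, (1024**2,) * 2)        # 1 MB file writes

        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name

        return subprocess.run(
            [sys.executable, "-I", path],   # -I: isolated mode, ignores env and user site
            capture_output=True, text=True,
            timeout=timeout, preexec_fn=limit_resources,
        )

    print(run_untrusted("print(sum(range(10)))").stdout)

A real setup would add filesystem and network isolation (a container, gVisor, a microVM, or running the code in the user's browser as suggested above), but the mindset is the same: assume the generated code is hostile.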
I am pretty sure you can figure out massive loopholes, like how it's legal to train a model on stolen data but not to steal the data, etc.
For instance, advertisers can push model benchmarks that favour some opinions, based on a biased selection of research papers.
I think we've only seen the beginnings of the intricate business models that can be figured out for an AI company; it's much more convoluted than a search engine or even a social network.
"that could redefine the web economy" I don't think that ads in ChatGPT are that disruptive, it's just another channel. I think ChatGPT apps are an order magnitude more game changing, as they are not a new markting channel but a new distribution channel for software. Your next ad will still be an ad, but your next SaaS might be a ChatGPT App.