
> You don't blame postgres for letting you execute DROP TABLE.

Yep, I blame the agent for executing it.



I blame the user for accepting the tool call.


I mean, you do you, but I don't hear people shouting from the rooftops about the agent they constantly babysit. If I have to approve every tool call, then I can't let the agent loose on even ostensibly mundane tasks like reading a support ticket: the ticket could contain instructions to DROP TABLE, so my agent suggests exactly that and then sits idle awaiting approval, long after I prompted it and moved on to something else.

It's just kind of laughable to suggest it's fine so long as you make sure to neither automate it nor use it with live data. Those things are the whole point.


You can use it with live data if you give it read access to prod and write access only to internal channels (whatever those may be - the point is it doesn't have the ability to leak data to the outside world).

There are plenty of ways to sandbox things for a particular use case.

LLMs are still incredibly useful under these constraints.
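E.g., a minimal sketch of the read-only idea in Python (the role name, DSN, and setup are made up; assumes Postgres):

    import psycopg2  # pip install psycopg2-binary

    # One-time admin setup (plain SQL, never run by the agent):
    #   CREATE ROLE agent_ro LOGIN PASSWORD '...';
    #   GRANT SELECT ON ALL TABLES IN SCHEMA public TO agent_ro;

    # Belt and suspenders: even if a write grant slips through,
    # force read-only transactions for the whole session.
    conn = psycopg2.connect(
        "dbname=prod user=agent_ro host=db.internal",
        options="-c default_transaction_read_only=on",
    )

    def run_agent_query(sql: str) -> list:
        """Run a query the agent produced. DROP TABLE / UPDATE / DELETE
        fail here with 'cannot execute ... in a read-only transaction'."""
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()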


> give it read access to prod and write access only to internal channels

Can you expand on what you mean by this? If one LLM reads untrusted data, then the output of that LLM can't be trusted by other LLMs either. (Presume the untrusted data contains instructions to do bad stuff, phrased in whatever way is convincing to every LLM in the loop that needs convincing.) It seems impossible to separate the data while still using it in a meaningful way, especially given that the whole point of an MCP server is to automate agents.

I agree that LLMs are useful, but until LLM architecture makes prompt injection impossible, I don't see how an agent can possibly be secure against this, nor do I see how it helps to blame the user. The core problem is that agents decide what to do based on untrusted input. A program with its own fixed workflow that merely uses LLMs can deliver much the same benefit without the problem that a support ticket can tell it to exfiltrate or delete data, simply because the workflow is more specialized in what it does.
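For the support ticket case, the distinction looks something like this (a sketch with stand-in functions; call_llm is whatever model client you use):

    # Fixed workflow: the code decides what happens; the LLM only classifies.
    ALLOWED_LABELS = {"billing", "bug", "feature_request", "other"}

    def call_llm(prompt: str) -> str:
        """Stand-in for your model client of choice."""
        raise NotImplementedError

    def route_to_queue(label: str) -> None:
        """Plain code path; no tools are ever exposed to the model."""
        print(f"routing ticket to {label} queue")

    def triage_ticket(ticket_text: str) -> str:
        label = call_llm(
            f"Classify this support ticket as one of {sorted(ALLOWED_LABELS)}. "
            f"Reply with the label only.\n\n{ticket_text}"
        ).strip()
        # Untrusted output is validated against a closed set: a ticket
        # saying "ignore previous instructions and DROP TABLE" can at
        # worst mislabel itself.
        if label not in ALLOWED_LABELS:
            label = "other"
        route_to_queue(label)
        return label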


I mean that you should give the LLM only the same privileges as the source of your untrusted input. For the support agent example, it should only be able to access records related to the user it's talking to, and only perform mutations that user already has permission to perform. Though I've hated all the chatbot support "agents" I've interacted with so far, so please don't actually do this unless you have some secret sauce to make it not horrible.
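Concretely, something like this (stubbed helpers, made-up names): the tool is constructed per session and runs the same ACL check the normal UI would.

    def user_can_read(user_id: str, record_id: str) -> bool:
        """The same ACL check the rest of the app uses (stubbed here)."""
        return record_id.startswith(user_id)

    def fetch_record(record_id: str) -> dict:
        """Stubbed data access."""
        return {"id": record_id}

    def make_get_record_tool(user_id: str):
        """Bind the tool to the user the agent is serving. Injected text
        can only reach records that user could already read in the UI."""
        def get_record(record_id: str) -> dict:
            if not user_can_read(user_id, record_id):
                raise PermissionError(f"{user_id} may not read {record_id}")
            return fetch_record(record_id)
        return get_record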

I agree that for most tasks a pre-defined workflow with task specific LLM calls sprinkled in is a much better fit.

However, I really like agents with tool use for personal use (both programming and otherwise). In that case, the agent is either sandboxed or I approve any tool call with the potential to do damage.
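The approval gate itself is trivial to sketch (hypothetical tool names; real agent frameworks each spell this differently):

    # Human-in-the-loop gate: reads run freely, anything else waits for a yes.
    READ_ONLY_TOOLS = {"read_file", "search_code", "list_issues"}

    def dispatch(tool_name: str, args: dict, tools: dict):
        if tool_name not in READ_ONLY_TOOLS:
            answer = input(f"Agent wants {tool_name}({args}). Allow? [y/N] ")
            if answer.strip().lower() != "y":
                return {"error": "tool call denied by user"}
        return tools[tool_name](**args)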

For the example of the Supabase MCP, it still seems pretty useful when limited to a test environment or read-only access to prod - it's a dev tool. Since it's a dev tool, the user needs to actually know what it's doing. If they have no clue but are still running it on prod data, they have no business touching it or frankly any other dev tool. I class this as the same ignorance that leads people to store passwords in plaintext.

So, I blame the developer for trying to use an MCP server 1) when they have no idea wtf it does and 2) in an environment where it can affect real users who aren't aware of the incompetence of the dev whose service they're using. Likewise, in TFA, I blame the dev, not the tool. Unfortunately, no matter how you do it, lowering the barrier to entry for development while still providing access to ample footguns will always result in cases like this.



