
totally agree on the interface.

I agree that there is communication overhead between the agents, although so far it looks like they can communicate very effectively and efficiently. We're also working on efficient ways to transfer more contextual information.


I think our main differentiator is that we've been building browser agents for over a year: our technology today is 5x faster and 7x cheaper than browser-use, while also being significantly more reliable.

definitely only your own traffic! will add this to the docs for peace of mind

we benchmarked on WebVoyager and have a few more benchmarks coming up (see https://docs.smooth.sh/performance); we'll be publishing the full results shortly

we run headful browsers with fingerprinting that is as stealthy as possible, and on top of that we can use your IP

anecdotally, making all requests originate from your own residential address has been a major win compared to cloud-only solutions
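
to give a feel for the general idea, here's a minimal sketch using Playwright's sync API: launch a headful browser and route its traffic through a proxy endpoint on your own residential connection. The proxy address and credentials are hypothetical, and smooth.sh's actual tunneling works differently; this just illustrates the paradigm:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=False,  # headful: a real window, harder to fingerprint
            proxy={
                "server": "http://my-home-ip.example.com:8080",  # hypothetical endpoint
                "username": "user",
                "password": "secret",
            },
        )
        page = browser.new_page()
        page.goto("https://httpbin.org/ip")  # should report the proxy's IP
        print(page.inner_text("body"))
        browser.close()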

it will be interesting to see how this plays out; having to "hide the agent" feels like a temporary workaround until society accepts that agents actually do exist


it's interesting to see how things will play out, but I really believe that Claude Code (maybe with Opus 4.6) + a click tool + a move_mouse tool + a snapshot-page tool + 114 more tools is definitely not the best approach

the main issue with this interface is that the commands are too low-level and that there is no way of controlling the context over time

once a snapshot is added to the context, those tokens take up precious context-window space, leading to context rot, higher cost, and higher latency

that's why agents need to use very large models for these kinds of systems to work and, unfortunately, even then they're slow, expensive, and less reliable than a purpose-made system
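
to make the context-rot point concrete, here's a back-of-the-envelope sketch; the token counts are made-up assumptions, not measurements:

    # Hypothetical numbers, purely illustrative: real snapshot sizes vary widely.
    SNAPSHOT_TOKENS = 8_000   # one full page snapshot kept in context
    STEP_OVERHEAD = 300       # tool call + model reasoning per step

    def context_after(steps: int) -> int:
        """Tokens consumed if every step appends a fresh page snapshot."""
        return steps * (SNAPSHOT_TOKENS + STEP_OVERHEAD)

    for steps in (5, 20, 50):
        print(f"{steps:>2} steps -> ~{context_after(steps):,} tokens")
    #  5 steps -> ~41,500 tokens
    # 20 steps -> ~166,000 tokens
    # 50 steps -> ~415,000 tokens: beyond most models' context windows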

I wonder if a standardized interface will organically emerge over time. At the moment SKILL.md + CLI seem to be the most broadly adopted interface - maybe even more so than MCP


the Claude --chrome command has a few limitations:

1. it exposes low-level tools that make your agent interact directly with the browser, which is extremely slow, VERY expensive, and less effective, as the agent ends up dealing with UI mechanics instead of thinking about the higher-level goal/intent

2. it makes Claude operate the browser via screenshots and coordinate-based interaction, which does not work for tasks like data extraction where it needs to attend to the whole page: the agent has to repeatedly scroll and read one little screenshot at a time, and it often misses critical context outside the viewport (see the sketch below this list). It also makes the task harder, since the model has to figure out both what to do and how to do it, which means you need larger models to make this paradigm work at all

3. because it uses your local browser, it has full access to your authenticated accounts by default, which might not be ideal in a world where prompt injections are only getting started
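
to make point 2 concrete, here's a minimal Playwright sketch (placeholder URL) of the difference between what a screenshot-driven agent sees and what a text/DOM-driven agent sees:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://example.com")

        # Screenshot paradigm: only the current viewport is captured;
        # anything below the fold is invisible until the agent scrolls.
        page.screenshot(path="viewport.png")

        # Text/DOM paradigm: the whole document is available at once,
        # regardless of scroll position or viewport size.
        full_text = page.inner_text("body")
        print(len(full_text), "characters visible without scrolling")

        browser.close()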

if you actively use the --chrome command we'd love to hear your experience!


I am sure they measured the difference, but I am wondering why reading screenshots + coordinates is more efficient than selecting ARIA labels? https://github.com/Mic92/mics-skills/blob/main/skills/browse.... The JavaScript snippets should at least be more reusable if you want to semi-automate websites with memory files
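
for what it's worth, the ARIA-based approach looks roughly like this in Playwright (a sketch, not the linked skill's actual code): role selectors resolve against the accessibility tree, so no coordinates or screenshots are involved

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com")
        # Click by accessible role + name instead of (x, y) coordinates;
        # this survives layout changes that would break pixel positions.
        page.get_by_role("link", name="More information").click()
        print(page.url)
        browser.close()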

thanks! it can actually test apps running on your localhost with our tunneling feature (https://docs.smooth.sh/features/use-my-ip)

you should be able to point it at your localhost address and it will navigate to your local app from the remote browser

let us know if you have any questions!


It still may not be quite ideal. For example, right now I'm building a clone of Counter-Strike. The files are so large that tunneling would be cumbersome.

ah fair point! the tunnel is peer-to-peer so it doesn't add too much overhead, but you can't really beat localhost on latency

I'd be curious to try it and see whether, in practice, the difference is small enough that it doesn't matter


But does the tunnel expose the low-level APIs required for Claude Code to write end-to-end tests, check browser console errors, etc.?
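
For context, this is roughly what I mean, sketched with Playwright's event API as a stand-in for whatever the tunnel would need to expose (the localhost URL is hypothetical):

    from playwright.sync_api import sync_playwright

    errors = []

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Console messages are driver-level events the tunnel would have to surface.
        page.on(
            "console",
            lambda msg: errors.append(msg.text) if msg.type == "error" else None,
        )
        page.goto("http://localhost:3000")  # hypothetical local app
        browser.close()

    print("console errors:", errors)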

indeed! there is no reason why tooling for AI agents shouldn't use AI when tooling for humans is shifting towards AI

yes that's right! I think this might be the way AI agent adoption plays out more broadly

Agents that start using subagents rather than humans using the subagents directly


Thanks for asking! We developed our own browser agent, which uses a mix of custom and frontier models for different parts of the system.
