Mintlify is really good. If you're a serious developer tool, I'm not sure why you wouldn't use it. For example, I went into your docs and I don't see AI chat so I can ask quick, natural-language questions. No MCP I can install so Cursor can query. Prob no llms.txt. No quick Copy to Markdown. This stuff is table stakes; if you don't have it and a competitor does, I'm not even considering you guys
It's just a worse developer experience. Fine if you aren't a serious business, but yeah I wouldn't play down the value of Mintlify or similar products. It's seriously good and it's why huge companies use it
>I don't see AI chat so I can ask quick, natural language questions. No MCP I can install so Cursor can query. Prob no llms.txt. No quick Copy to Markdown.
It's not the site's job to add those features though. If you want that experience there are ways to get it without adding bloat to every page on the web. Scraping a static site and answering questions/summarizing is a solved problem.
It is the site's job to make documentation available to the users, no?
It’s so odd for a tech-focused crowd to be so opposed to newer technology.
Users are getting used to natural language search, not having it will be perceived as friction.
Users are increasingly turning to agentic coding tools, those tools do best when documentation is available via an MCP server. Not having one will make it harder for people to use your product.
I'm not opposed to the idea of natural language search, my point is the tools should be on the user side. Right now, I can ask questions about plain text pages that haven't been updated in 30 years directly in Firefox with no effort from the site operator. If an agent needs to have direct access to documentation, it's trivial for it to download pages autonomously (or even set up its own MCP server). There's literally no reason to demand that millions of sites independently add features that browsers and agents already have better and more uniform versions of.
It's not a business's job to make their documentation accessible to their potential and current customers?
I would ask if you've started a real business but it's clear you haven't. It is 100% on a developer tool startup to provide documentation that is easily accessible. If they don't, customers will struggle to get value. If you think this isn't true, then you are ignoring the gigantic market of companies purchasing documentation products (look at Mintlify's customer base for reference)
There is no way I'm asking my customers to scrape my docs and build their own MCP server and AI assistant just to access it easily.
It's barely a value-add when agents can scrape the docs themselves and browsers have the reading tools built in.
>I would ask if you've started a real business but it's clear you haven't.
I wouldn't speak so authoritatively about this stuff if I didn't know anything about running a business. My lemonade stand was extremely successful in the geographic area it was marketed in (~10 blocks around my house). I was planning on going public but unexpected regulatory issues (end of 4th grade summer break) forced me to reevaluate. Though these new agentic lemon beverage developments seem like they might draw me back into it...
Our agent would have a tool to essentially bring in the human. We haven’t built this yet, but the closest thing we do have is that our agent can declare a task as failed if it determines it can’t proceed (based on your instructions).
More on this soon! How would you imagine this would be useful?
Thanks. In the OpenDental example, if the task is to update a different patient, is it falling back to computer use because the “search results” are in different places? E.g. searching for “John” may return two results: John and Johnson.
Yup, this is a case where you always want an agent to do that step. So in the prompt you just say “do a focused_action to select the search result with John”, and then the pathfinder agent will cache in its memory to delegate that step to a mini computer use agent, just for that particular task.
After the focused action is done, it’ll go right back to deterministic!
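If it helps to picture it, here's a rough sketch of what a cached plan with one permanently agentic step could look like. All the names here are made up for illustration (the real Cyberdesk schema isn't public); the point is just that most steps replay verbatim while the `focused_action` stays agentic on every run:

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical step types, purely illustrative.
@dataclass
class DeterministicStep:
    action: str   # e.g. "click", "type"
    args: tuple   # e.g. coordinates, or the text to type

@dataclass
class FocusedActionStep:
    instruction: str  # natural-language goal for the mini agent

Step = Union[DeterministicStep, FocusedActionStep]

def run_plan(plan: list, replay: Callable, mini_agent: Callable) -> None:
    """Replay cached input events; delegate focused actions to a small agent."""
    for step in plan:
        if isinstance(step, FocusedActionStep):
            mini_agent(step.instruction)  # decided fresh at run time
        else:
            replay(step)                  # exact cached click/type/scroll

# Everything is deterministic except picking the right search result,
# which stays agentic on every run:
plan = [
    DeterministicStep("click", (310, 42)),
    DeterministicStep("type", ("John",)),
    FocusedActionStep("select the search result for John (not Johnson)"),
]
```

So "going right back to deterministic" just means the loop resumes plain replay after the agentic step returns.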
Unfortunately these scripting tools are just untenable when dealing with so many desktop flows that all have changing UIs and random popups. You end up having to repair all of them all the time; in fact, there's a whole consulting industry out there that does just this all day.
The whole idea of Cyberdesk is the prompt is the source of truth, and then once you learn a task once via CUA, the system follows that cache most of the time until you have to fall back to CUA, which follows the prompt. And that anomaly is also cached too.
So over time, the system just learns, and gets cheaper and faster.
I used AutoIT to remove old AV from roughly 6000 PCs across 20 odd countries back in 2002. I still use it from Zenworks on some customer sites, 20+ years later.
Old school Windows apps are not "flowing" they generally use a toolkit and AutoIT is able to use the Windows APIs to note window handles, or the text in various widgets and so on and act on them.
These are not complicated beasts - they are largely deterministic. If you have to go off piste and deal with moguls, you have a mode called "adlib" where you deal with unusual cases.
I find it a bit unpleasant that you describe part of my job as "untenable". I'm sure you didn't mean it as such. I'm still just as cheap as I was 20 years ago and probably a bit quicker now too!
Yup, a few of our clients have a need to verify something in the software, so we support an agentic step where we look at the screen and can verify whether something exists, or whether a step was completed, etc!
Thanks! And yes, so our pathfinder agents utilize Sonnet 4's precise coordinate generation capabilities. You give it a screenshot, give it a task, and it can output exact coordinates of where to click on an input field, for example.
And yes we've found the computer use models are quite reliable.
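To make "precise coordinate generation" concrete: the model gets a screenshot plus a task and answers with a click target, which your harness then has to pull out of the response text. This is a minimal sketch of that parsing step only; the `{"x": ..., "y": ...}` answer schema is an assumption on my part, not a documented Cyberdesk or Anthropic format:

```python
import json
import re

def parse_click_coordinates(model_output: str) -> tuple:
    """Extract an (x, y) click target from a model's text response.

    Assumes the model was prompted to reply with a JSON object like
    {"x": 412, "y": 231} (hypothetical schema for illustration).
    """
    match = re.search(r"\{[^{}]*\}", model_output)
    if not match:
        raise ValueError("no JSON object found in model output")
    coords = json.loads(match.group(0))
    return int(coords["x"]), int(coords["y"])

# A response wrapped in prose still parses:
x, y = parse_click_coordinates('Sure - the input field is at {"x": 412, "y": 231}.')
```

The extracted pair is what ultimately gets fed to the click/replay layer.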
Great questions on scale: the whole way we designed our engine is that in the happy path, we actually make very few LLM calls. The agent runs deterministically, only checking at various critical spots whether anomalies occurred (if one does, we fall back to computer use to take it home). If not, our system can complete an entire task end to end for on the order of less than $0.0001.
So it's a hybrid system at the end of the day. This results in really low costs at scale, as well as speed and reliability improvements (since in the happy path, we run exactly what has worked before).
Thanks for the reply. I lost you in this part,
> Great questions on scale: the whole way we designed our engine is that in the happy path, we actually make very few LLM calls. The agent runs deterministically, only checking at various critical spots whether anomalies occurred (if one does, we fall back to computer use to take it home)
I assume you send a screenshot to Claude for the next action to take; how are you able to reduce this exact step by working deterministically? What is the deterministic part, and how do you figure it out?
So what I meant is this:
When you run our Cyberdesk agent the first time, it runs with the computer use agent. But then once that’s complete, we cache every exact step it took to successfully complete that task (every click, type, scroll) and then simply replay that the next time.
But during that replayed action, we do bring in smaller LLMs to just keep in check to see if anything unexpected happened (like a popup). If so, we fall back to computer use to take it home.
Does that make sense? At the end of the day, our agent compiles down to PyAutoGUI, with smart fallback to the agent if needed.
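The replay-with-fallback loop described above can be sketched in a few lines. Everything here is hypothetical naming (`check_for_anomaly`, `computer_use_agent`, etc.); it's just the shape of the control flow, under the assumption that the anomaly check is a cheap small-model call and the fallback is the full computer use agent following the original prompt:

```python
def execute_task(cached_steps, replay, check_for_anomaly,
                 computer_use_agent, task_prompt):
    """Replay cached input events cheaply; hand off to the agent on anomaly."""
    for i, step in enumerate(cached_steps):
        if check_for_anomaly():  # small LLM glances at the screen
            # Unexpected state (e.g. a popup): the full agent takes it home,
            # guided by the original task prompt.
            return computer_use_agent(task_prompt, resume_from=i)
        replay(step)             # e.g. pyautogui.click(x, y)
    return "completed deterministically"
```

On a clean run no fallback ever fires, which is where the speed and cost numbers come from; the anomaly run is the expensive path, and (per the thread) its resolution gets cached too.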
There isn't a viable computer use model that can be run locally yet, unfortunately. I'm extremely excited for the day that happens though. Essentially, the key capability that makes a model a computer use model is precise coordinate generation.
So if you come across a local model that can do that well, let us know! We're also keeping a close watch.
You are correct in that ByteDance did release UI-TARS, which sounds like a really good open source computer use model according to some articles I read. You could run that locally. We haven't tested it so I wouldn't know how it performs, but it sounds like it's definitely worth exploring!
I don't know too much about training your own computer use model other than it would probably be a very hefty, very expensive task.
However, I believe ByteDance released UI-TARS which is an excellent open source computer use model according to some articles I read. You could run that locally. We haven't tested it so I wouldn't know how it performs, but sounds like it's definitely worth exploring!
Good point. In a way we can verify to a customer that we have that policy set up with them by showing them the certificate. But you are correct that we haven't gone as far as asking Anthropic or OpenAI for proof that they aren't retaining any of our data. What we did do is get their SOC 2 Type II reports, which showed no significant security vulnerabilities that would impact our usage of their services. So we have been operating under the assumption that they are honoring our signed agreement, within the context of those reports, and our customers have been okay with that. But we are definitely open to pursuing that kind of proof at some point.
All of which has nothing to do with OpenAI or Anthropic deciding to use your data??? SOC 2 Type II is completely irrelevant.
You've got two companies that basically built their entire business upon stealing people's content, and they've given you a piece of paper saying "trust me bro".
I appreciate your skepticism. At the end of the day we're focused on delivering real value while taking every security precaution we reasonably can as we build new technology. Eventually, as we grow, we'll be able to do full self-hosting for our customers and perhaps even spin up our own LLMs on our own servers. But until then, we can only do so much.
Welcome to the invalidated EU-US Safe Harbour, the invalidated EU-US Privacy Shield, and the soon-to-be invalidated EU-US Data Privacy Framework (DPF) and Transatlantic Data Privacy Framework (TADPF).
Digital sovereignty and respect for privacy and local laws are the exception in this domain, not the expectation.
As Max Schrems puts it "Instead of stable legal limitations, the EU agreed to executive promises that can be overturned in seconds. Now that the first Trump waves hit this deal, it quickly throws many EU businesses into a legal limbo."
After recently terrifying the EU with the truth in an ill-advised blogpost, Microsoft are now attempting the concept of a 'Sovereign Public Cloud' with a supposedly transparent and indelible access-log service called Data Guardian.
If Nation States can't manage to keep their grubby hands off your data, private US Companies obliged to co-operate with Intelligence Apparatus certainly won't be.
You make valid points. At the end of the day we're focused on delivering real value while taking every security precaution we reasonably can as we build new technology. Eventually, as we grow, we'll be able to do full self-hosting for our customers and perhaps even spin up our own LLMs on our own servers. But until then, we can only do so much.
Typically with this sort of thing the way it really works is that you, the startup, use a service provider (like OpenAI) who publish their own external audit reports (like a SOC 2 Type 2) and then the SOC 2 auditors will see that the service provider company has a policy related to how it handles customer data for customers covered by Agreement XYZ, and require evidence to prove that the service provider company is following its policies related to not using that data for undeclared purposes or whatever else.
Audit rights are all about who has the most power in a given situation. Just like very few customers are big enough to go to AWS and say "let us audit you", you're not going to get that right with a vendor like Anthropic or OpenAI unless you're certifiably huge, and even then it will come with lots of caveats. Instead, you trust the audit results they publish and implicitly are trusting the auditors they hire.
Whether that is sufficient level of trust is really up to the customer buying the service. There's a reason many companies sell on-prem hosted solutions or even support airgapped deployments, because no level of external trust is quite enough. But for many other companies and industries, some level of trust in a reputable auditor is acceptable.
Thanks for the breakdown Seth! We did indeed get their SOC 2 Type II reports and made sure they showed no significant security vulnerabilities that will impact our usage of their service.
Right now we are taking the policies we signed with our LLM vendors as verification of a zero data retention policy. We also got their SOC 2 Type II reports, which showed no significant security vulnerabilities that would impact our usage of their services. We're doing our best to deliver value while taking as many security precautions as possible: our own data retention policy, encrypting data at rest and in transit, row-level security, SOC 2 Type I and HIPAA compliance (in observation for Type II), secret managers. We have other measures we plan to take, like de-identifying screenshots before sending them up. Would love to get your thoughts on any other security measures you would recommend!
I hard agree. Seth, Ayo, I think you guys should be honest with yourself and ask whether you want to go toe to toe with Cursor, Windsurf, Microsoft, etc.
If not, take the other route. Go deep into the vertical of React Native. Help people with no experience run an ENTIRE BUSINESS just with your chatbot as the AI cofounder - you build the entire app and backend, handle marketing, publish it, all with AI agents. How sick would that be.
Yeah, we definitely don't want to position ourselves as an IDE or pure code-gen product long term. What you mentioned with the React Native vertical really speaks to us, we've been indie app developers for a while and think there's lots of opportunities to bring AI to that field that go beyond the code.