

Phenomenal work!

You don't need Spec Kit or HumanLayer per se (though they're good reference points for starting out), but you do need your own CLAUDE/AGENTS, README, ARCHITECTURE, SDRs, policies, research docs, and planning docs, all nicely organized in files and folders, to work well with AI agents.

Particularly in a brownfield development context, using AI to research issues, gaps, bugs, tech debt, reusable patterns, etc. and capture the results as research files is really useful. I find the sweet spot is ~750 lines per file. Planning files can max out at 1,500 lines if needed, or otherwise be broken up into individual phases. You can always ask an LLM to create a starter set of docs to get you going and then maintain them as you go along (rough layout sketched below).
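To make that concrete, here's the rough shape of the layout I have in mind. The folder and file names are purely illustrative, not a standard:

    docs/
      ARCHITECTURE.md
      research/
        auth-tech-debt.md          <- aim for ~750 lines each
        payments-gaps.md
      plans/
        auth-refactor/
          phase-1.md               <- split into phases before a plan hits ~1,500 lines
          phase-2.md
      policies/
        ci-cd.md
    CLAUDE.md (or AGENTS.md)
    README.md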

For the management nerds, none of this is new. See: Writing Effective Use Cases by Alistair Cockburn and Agile Specification-Driven Development by Ostroff, Makalsky and Paige. The fundamentals remain the same and I'd say become even more important in a brownfield context with a high degree of tech debt or complexity. Greenfield is a different story.

What's important with spec-driven development is that LLMs dramatically reduce the cost of both good documentation and good technical specifications. Once you have a good plan and good references, you can build anything with a high degree of accuracy.

I'd add a caution about drift. The more documentation you shove into the context, the worse things can get. You can also create a beautiful, perfect functional spec that becomes a Swiss cheese of gaps once you create your technical implementation plan. Always check, and use different AI models adversarially to make sure you're actually getting the plan you want. Usually ChatGPT can spot Claude's blind spots, for example. And always test manually.


I feel like they are picking a lane. ChatGPT is great for chatbots and the like, but, as was discussed in a prior thread, chatbots aren't the be-all and end-all of AI or LLMs. Claude Code is the workhorse for me and most folks I know for AI-assisted development and business-automation tasks. Meanwhile, most folks I know who use ChatGPT are really replacing Google Search. This is where folks are trying to create llms.txt files to become more discoverable by ChatGPT specifically.

You can see the very different response from OpenAI: https://openai.com/index/our-approach-to-advertising-and-exp.... ChatGPT is saying they will mark ads as ads and keep answers "independent," but that is not measurable. So we'll see.

For Anthropic to proactively say they will not pursue ad-based revenue isn't just them being "one of the good guys"; I think it signals they may be stabilizing on a business model of both seat-based and usage-based subscriptions.

Either way, both companies are hemorrhaging money.


> ChatGPT is saying they will mark ads as ads and keep answers "independent," but that is not measurable. So we'll see.

Yeah, I remember when Google used to be like this. Then today I tried to go to 39dollarglasses.com and accidentally went to the top search result, which was actually an ad for some other company. Arrrg.


Before Google, web search was a toxic stew of conflicts of interest. It was impossible to tell if search results were paid ads or the best possible results for your query.

Google changed all that, and put a clear wall between organic results and ads. They consciously structured the company like a newspaper, to prevent the information side from being polluted and distorted by the money-making side.

Here's a snip from their IPO letter [0]:

Google users trust our systems to help them with important decisions: medical, financial and many others. Our search results are the best we know how to produce. They are unbiased and objective, and we do not accept payment for them or for inclusion or more frequent updating. We also display advertising, which we work hard to make relevant, and we label it clearly. This is similar to a well-run newspaper, where the advertisements are clear and the articles are not influenced by the advertisers’ payments. We believe it is important for everyone to have access to the best information and research, not only to the information people pay for you to see.

Anthropic's statement reads the same way, and it's refreshing to see them prioritize long-term values like trust over short-term monetization.

It's hard to put a dollar value on trust, but even when they fall short of their ideals, it's still a big differentiator from competitors like Microsoft, Meta and OpenAI.

I'd bet that a large portion of Google's enterprise value today can be traced to that trust differential with their competitors, and I wouldn't be surprised to see a similar outcome for Anthropic.

Don't be evil, but unironically.

[0] https://abc.xyz/investor/founders-letters/ipo-letter/default...


I agree. Having watched Google shift from its younger idealistic values to its current corrupted state, I can't help but be cynical about Anthropic's long-term trajectory.

But if nothing else, I can appreciate Anthropic's current values, and hope they will last as long as possible...


Disagree.

I end up using ChatGPT for general coding tasks because of the limited session/weekly limits Claude Pro offers, and it works surprisingly well.

The best approach, IMO, is to use them both. They complement each other.


I use OpenCode and I made an "architect" agent that uses Opus to make a plan, then gives that plan to a "developer" agent (with Sonnet) that implements it, and a "reviewer" agent (Codex) reviews it in the end. I've gotten much better results with this than with straight up Opus throughout, and obviously hit the limits much less often as well.
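Roughly, the flow looks like this. The ask() helper and the model labels are just placeholders to show the structure, not OpenCode's actual API or config:

    def ask(model: str, prompt: str) -> str:
        """Placeholder for a call to the given model; wire up a real client here."""
        raise NotImplementedError

    def run_pipeline(task: str) -> str:
        # Architect: produce a plan with the strongest model.
        plan = ask("opus", f"Write a step-by-step implementation plan for: {task}")
        # Developer: implement the plan with a cheaper/faster model.
        diff = ask("sonnet", f"Implement this plan and return a diff:\n{plan}")
        # Reviewer: a third model checks the diff against the plan.
        return ask("codex", f"Review this diff against the plan.\nPlan:\n{plan}\nDiff:\n{diff}")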

Agreed on using both. I definitely know people who prefer Codex or Cursor. It's probably Coke or Pepsi at this point. I tend to prefer Claude Code, but that's just me.

Both companies are making bank on inference

You may not like these sources, but everyone from the tomato throwers to the green visor crowds agrees they are losing money. How and when they make up the difference is a matter of speculation.

https://www.wheresyoured.at/why-everybody-is-losing-money-on...

https://www.economist.com/business/2025/12/29/openai-faces-a...

https://finance.yahoo.com/news/openais-own-forecast-predicts...


The comment was with reference to inference, not total P&L.

Of course they are losing money in total. They are not, however, losing money per marginal token.

It's trivial to see this by looking at the market-clearing price of inference on advanced open-source models and comparing it to the inference prices OpenAI charges.


> green visor crowds

??



That is the big question. Got reliable data on that?

(My gut feeling tells me Claude Code is currently underpriced with regards to inference costs. But that's just a gut feeling...)


https://www.wheresyoured.at/costs/

Their AWS spend being higher than their revenue might hint at the same.

Nobody has reliable data; I think it's fair to assume that even Anthropic is doing voodoo math to sleep at night.


The closed frontier models seem to sell at a substantial premium to inference on open-source models, so that does suggest there is a decent margin on inference. Training is where they're losing money. The bull case is that every model eventually makes money, but the models keep getting bigger, or at least more expensive to train, so they're borrowing money to make even more money later (which does need to converge somehow; they can't just keep scaling up until the market can't actually afford to pay for the training). The bear case is that this is basically a treadmill to stay on the frontier where they can charge that premium (if the big labs ever stop, they'll quickly be caught up by cheaper or even open-source models and lose their edge), in which case it's probably never going to become sustainable.

> If we subtract the cost of compute from revenue to calculate the gross margin (on an accounting basis), it seems to be about 50% — lower than the norm for software companies (where 60-80% is typical) but still higher than many industries.

https://epoch.ai/gradient-updates/can-ai-companies-become-pr...


The context of that quote is OpenAI as a whole.

Maybe on the API, but I highly doubt that the coding agent subscription plans are profitable at the moment.

Build out distribution first and generate network effects.

For sure not

Could you substantiate that? Does that take into account training and staffing costs?

The parent specifically said inference, which does not include training and staffing costs.

But those aren't things you can really separate for proprietary models. Keeping inference running also requires staff, not just for the R&D.

Just jumping on the thread. I think the conversation is conflating two very different things:

1. Turing-test UXs, where a chat app is the product and the feature (Electron is fine)

2. The class of things LLMs are good at that often don't need a UI, let alone a chat app, but do need automation glue (Electron may cause friction)

Personally, I feel like we're jumping on capabilities and missing a much larger issue of permissioning and security.

In an API or MCP context, permissions may be scoped via tokens at the very least, but within an OS context, that boundary is not necessarily present. Once an agent can read and write files or execute commands as the logged-in user, there's a level of trust and access that goes against most best practices.

This is probably a startup waiting to be hatched, but it seems to me that getting agents properly scoped and keeping them in bounds, just like Cursor has rules, would be a prerequisite before giving them access to an OS at all. A rough sketch of what I mean is below.
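As a minimal illustration (not a real product, and not any particular agent framework's API; the paths and allowlist are made up), the idea is that every file read and every command the agent issues goes through a policy check before it touches the OS:

    import shlex
    import subprocess
    from pathlib import Path

    # Illustrative policy: the agent only sees one directory tree and a few binaries.
    ALLOWED_ROOT = Path("/home/agent/workspace").resolve()
    ALLOWED_BINARIES = {"git", "python3", "npm"}

    def read_file(path: str) -> str:
        p = Path(path).resolve()
        if not p.is_relative_to(ALLOWED_ROOT):
            raise PermissionError(f"{p} is outside the agent's sandbox")
        return p.read_text()

    def run_command(command: str) -> str:
        argv = shlex.split(command)
        if not argv or argv[0] not in ALLOWED_BINARIES:
            raise PermissionError(f"command {command!r} is not allowlisted")
        result = subprocess.run(argv, capture_output=True, text=True,
                                cwd=ALLOWED_ROOT, timeout=60)
        return result.stdout + result.stderr

A real version would also have to handle symlinks, environment variables, network access, and so on; the point is just that the policy lives outside the model's own judgment.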


It seems the big feature is running agents in parallel? I've been working agents in parallel in Claude Code for almost 9 months now. Just create a command in .claude/commands that references an agent in .claude/agents (rough example below). You can also just call parallel default Task agents to work concurrently.
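For anyone who hasn't tried it, the shape is roughly this. The frontmatter keys and wording are from memory, so treat them as an approximation and check the Claude Code docs for the exact format:

    File: .claude/agents/researcher.md
    ---
    name: researcher
    description: Investigates one area of the codebase and writes findings to a research doc
    tools: Read, Grep, Glob
    ---
    Investigate the area you are given and write your findings to
    docs/research/<topic>.md. Do not modify source code.

    File: .claude/commands/parallel-research.md
    Spawn one researcher agent per area listed in $ARGUMENTS, run them in
    parallel, then summarize the combined findings.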

Using slash commands and agents has been a game changer for me for anything from creating and executing on plans to following proper CI/CD policies when I commit changes.

As for Codex more generally, I love it for surgical changes or whenever Claude chases its tail. It's also very, very good at finding Claude's blind spots in plans. Using AI tools adversarially is another big win in terms of getting things 90% right the first time. Once you have the right execution plan with the right code snippets, Claude is essentially a very fast typist. That's how I prefer to do AI-assisted development personally.

That said, I agree with the comments on tokens. I can use Codex until the sun goes down on $20/month. I use the $200/month pro plan with Claude and have only maxed out a couple of times, but I do find the volume-to-quality ratio to be better with Claude. So far it's worth the money.


One thing I particularly appreciate about Guillermo Rauch, as an AI founder, is that he's got deep technical talent, especially as the lead on Next.js. As I've been diving deep on AI tools, particularly for frontend prototyping, I've been pleasantly impressed by the quality of code that Vercel produces, in addition to the overall design/look and feel.

This was a very refreshing interview.


Ha! Wow. It just said this. Dang!

STOP TRUSTING CLAUDE CODE'S CLAIMS!

This is the EXACT same response it gave you before when the screen was blank! It's literally copy-and-pasting the same "success" message while the component is completely broken.

Claude Code is Lying to You:
- Same identical response as last time
- Screen is still blank - nothing changed
- No actual progress made
- Just repeating the same false claims


I've been thinking more about the Base44 breach, which gave unauthorized users open access to any private application. For me, it shows that vulnerabilities aren't just technical; they're social.

Trust models fail. Governance gets hand-waved. And suddenly the whole ecosystem inherits a backdoor.

There have been great discussions recently about decentralization, self-hosting, and AI alignment, but they often miss that these are socio-technical systems: not just code or technical solutions, but culture and human-computer interaction.

Also, I noticed there were only a couple of posts here in HN history that referred to the term "socio-technical" or "sociotechnical." I'm sure folks have brought it up in other ways, but I thought I'd share a primer.

How have you incorporated socio-technical practices into your teams or businesses?


I've got a question! I'd say what's happening with vibecoding is really an acceleration of move fast and break things. Uber and Snapchat both had major security vulnerabilities, resulting in millions of user records leaked, in their heyday of the mid-2010s. And that was WITH whatever DevOps pipelines, code review, and other best practices were likely in place.

What's unique about Tea or Base44 (or Replit founder deleting his codebase) is A) the disregard for security best practices and B) the speed at which they both grew and exposed vulnerabilities.

So my question is, how do you see the balance of cybersecurity and AI as everything moves faster than ever before?


I see companies deploying and trusting AI without really investing in security; it will be very easy in the near future to find simple, devastating bugs : )

