More

theappsecguy · 2026-02-11T19:06:08 1770836768

Everyone and I mean everyone keeps parroting this "inflection point" marketing hype, which is so damn tiring.

minimaxir · 2026-02-11T19:16:52 1770837412

Believe me, I wish it was just parroting.

The real annoying thing about Opus 4.5 is that it's impossible to publicly say "Opus 4.5 is an order of magnitude better than coding LLMs released just months before it" without sounding like a AI hype booster clickbaiting, but it's the counterintuitive truth, to my personal frustration.

I have been trying to break this damn model since its November release by giving it complex and seemingly impossible coding tasks but this asshole keeps doing them correctly. GPT-5.3-Codex has been the same relative to GPT-5.2-Codex, which just makes me even more frustrated.

Denzel · 2026-02-11T22:42:48 1770849768

Weird, I broke Opus 4.5 pretty easily by giving some code, a build system, and integration tests that demonstrate the bug.

CC confidently iterated until it discovered the issue. CC confidently communicated exactly what the bug was, a detailed step-by-step deep dive into all the sections of the code that contributed to it. CC confidently suggested a fix that it then implemented. CC declared victory after 10 minutes!

The bug was still there.

I’m willing to admit I might be “holding it wrong”. I’ve had some successes and failures.

It’s all very impressive, but I still have yet to see how people are consistently getting CC to work for hours on end to produce good work. That still feels far fetched to me.

toraway · 2026-02-11T23:42:31 1770853351

Wait, are you really saying you have never had Opus 4.5 fail at a programming task you've given it? That strains credulity somewhat... and would certainly contribute to people believing you're exaggerating/hyping up Opus 4.5 beyond what can be reasonably supported.

Also, "order of magnitude better" is such plainly obvious exaggeration it does call your objectivity into question about Opus 4.5 vs. previous models and/or the competition.

minimaxir · 2026-02-12T00:27:22 1770856042

Opus 4.5 does made mistakes but I've found that's more due to ambiguous/imprecise functional requirements on my end rather than an inherent flaw of the agent pipeline. Giving it more clear instructions to reduce said ambiguity almost always fixes it, so I do not consider Opus failing. One of the very few times Opus 4.5 got completely stuck was, after tracing, an issue in a dependency's library which inherently can't be fixed on my end.

I am someone who has spent a lot of time with Sonnet 4.5 before that and was a very outspoken skeptic of agentic coding (https://news.ycombinator.com/item?id=43897320) until I gave Opus 4.5 a fair shake.

dudeinhawaii · 2026-02-12T00:16:30 1770855390

I don't know how to say this but either you haven't written any complex code or your definition of complex and impossible is not the same as mine, or you are "ai hyper booster clickbaiting" (your words).

It strains belief that anyone working on a moderate to large project would not have hit the edge cases and issues. Every other day I discover and have to fix a bug that was introduced by Claude/Codex previously (something implement just slightly incorrect or with just a slightly wrong expectation).

Every engineer I know working "mid-to-hard" problems (FANG and FANG adjacent) has broken every LLM including Opus 4.6, Gemini 3 Pro, and GPT-5.2-Codex on routine tasks. Granted the models have a very high success rate nowadays but they fail in strange ways and if you're well versed in your domain, these are easy to spot.

Granted I guess if you're just saying "build this" and using "it runs and looks fine" as the benchmark then OK.

All this is not to say Opus 4.5/6 are bad, not by a long shot, but your statement is difficult to parse as someone who's been coding a very long time and uses these agents daily. They're awesome but myopic.

minimaxir · 2026-02-12T00:40:51 1770856851

I resent your implication that I am baselessly hyping. I've open sourced a few Opus 4.5-coded projects (https://news.ycombinator.com/item?id=46543359) (https://news.ycombinator.com/item?id=46682115) that while not moderate-to-large projects, are very niche and novel without much if any prior art. The prompts I used are included with each those projects: they did not "run and look fine" on first run, and were refined just as with normal software engineering pipelines.

You might argue I'm No True Engineer because these aren't serious projects but I'd argue most successful uses of agentic coding aren't by FANG coders.

Denzel · 2026-02-12T01:15:15 1770858915

First, very cool! Thank you for sharing some actual projects with the prompts logged.

I think you and I have different definitions of “one-shotting”. If the model has to be steered, I don’t consider that a one-shot.

And you clearly “broke” the model a few times based on your prompt log where the model was unable to solve the problem given with the spec.

Honestly, your experience in these repos matches my daily experience with these models almost exactly.

I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.

Dylan16807 · 2026-02-12T03:56:53 1770868613

> I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.

I'd be hesitant to use that as a way to evaluate things. Different systems run at different speeds. I want to see how much it can get done before it breaks, in different scenarios.

minimaxir · 2026-02-12T01:38:12 1770860292

I never claimed Opus 4.5 can one-shot things? Even human-written software takes a few iterations to add/polish new features as they come to mind.

> And you clearly “broke” the model a few times based on your prompt log where the model was unable to solve the problem given with the spec.

That's less due to the model being wrong and more due to myself not knowing what I wanted because I am definitely not a UI/UX person. See my reply in the sibling thread.

Denzel · 2026-02-12T13:59:28 1770904768

Apologies, I may have misinterpreted the passage below from your repo:

> This crate was developed with the assistance of Claude Opus 4.5 initially to answer the shower thought "would the Braille Unicode trick work to visually simulate complex ball physics in a terminal?" Opus 4.5 one-shot the problem, so I decided to further experiment to make it more fun and colorful.

Also, yes, I don’t dispute that human written software takes iteration as well. My point is that the significance of autonomous agentic coding feels exaggerated if I’m holding the LLM’s hand more than I have to hold a senior engineer’s hand.

That doesn’t mean the tech isn’t valuable. The claims just feel over exaggerated.

minimaxir · 2026-02-12T18:00:25 1770919225

If you click the video that line links to, it one-shot the original problem as very explicitly defined as a PoC, not the entire project. The final project shipped is substantially different, and that's the difference between YOLO vibecoding and creating something useful.

There's also the embarrassing corner physics bugs present in that video, which was something that required a fix in the first few prompts.

viking123 · 2026-02-11T20:44:27 1770842667

It still cannot solve a synchronization issue in my fairly simple online game, completely wrong analysis back to back and solutions that actually make the problem worse. Most training data is probably react slop so it struggles with this type of stuff.

But I have to give it to Amodei and his goons in the media, their marketing is top notch. Fear-mongering targeted to normies about the model knowing it is being evaluated and other sort of preaching to the developers.

keybored · 2026-02-11T19:55:26 1770839726

But I used to be a skeptic but now in the last month

mwigdahl · 2026-02-11T20:29:54 1770841794

Yes, as all of modern politics illustrates, once one has staked out a position on an issue it is far more important to stick to one's guns regardless of observations rather than update based on evidence.

keybored · 2026-02-11T22:46:43 1770850003

I will change my mind on this in the next month.

tristor · 2026-02-11T23:01:21 1770850881

Not hype. Opus 4.5 is actually useful to one-shot things from detailed prompts for documentation creation, it's actually functional for generating code in a meaningful way. Unfortunately it's been nerfed, and Opus 4.6 is clearly worse from my few days of working with it since release.

Spivak · 2026-02-11T19:22:39 1770837759

The use of inflection point in the entire software industry is so annoying and cringy. It's never used correctly, it's not even used correctly in the Claude post everyone is referencing.

minimaxir · 2026-02-11T19:34:57 1770838497

What euphemism better describes the trend?

delusional · 2026-02-11T19:59:54 1770839994

If it's a trend, there's not an inflection point. The inflection point would be a point where the trend breaks.

deagle50 · 2026-02-11T20:13:57 1770840837

step function

theappsecguy · 2026-02-10T15:37:41 1770737861

I thought AI was supposed to replace Software engineers by now?

f311a · 2026-02-10T15:38:34 1770737914

It's already, that's why Microsoft abandoned GitHub

theappsecguy · 2026-02-09T16:53:55 1770656035

What if you need to deploy to production urgently...

1718627440 · 2026-02-10T10:38:04 1770719884

SSH, rsync or FTP.

theappsecguy · 2026-02-05T18:34:20 1770316460

The crash and burn can't come soon enough.

theappsecguy · 2026-02-05T17:46:09 1770313569

Having seen how clueless the new generation is and the amount of brain rot they get from using LLMs over honing their own skills, I'd say it's the opposite...

theappsecguy · 2026-02-04T17:24:31 1770225871

According to the CEO peddling AI, software engineers are about to be replaced by AI, how come AI hasn't fixed all of their atrocious software hmmm..

theappsecguy · 2026-02-03T16:34:50 1770136490

Would love if this was something capable of doing PR reviews with comment threads, etc! Super tired of having to open up Intellij to get the only usable option for that kind of flow

oug-t · 2026-02-03T17:36:09 1770140169

Great point! I personally likes gh-dash for PR management.

I will definitely try to find a integrated work flow between those two!

theappsecguy · 2026-02-03T19:57:15 1770148635

I love gh-dash, but like you said it's more in the realm of PR management rather than a fluid PR review tool!

oug-t · 2026-02-03T22:03:22 1770156202

Yes! After using gh-dash, I came up the idea to create this tool with Bubbletea to work along with gh-dahsh.

theappsecguy · 2026-01-30T13:24:34 1769779474

Exactly this. Before crypto it was Big Data, before that it was Low-code platforms

adrianwaj · 2026-02-01T00:11:48 1769904708

Before crypto, don't forget the ProgrammableWeb and all the mashups that ensued, with programmableweb.com shutting down in 2023.

2010: https://web.archive.org/web/20100226043552/http://www.progra...

Trustless Agents and the Agentic Economy is now in that cycle. Will it stick? Builders gotta build something.

2026: https://agentscan.info/

I am guessing this is sort of ProgrammableWeb 2.0.

Disintermediation is the common thread in all of this.

Will be interesting to see solutions arising for developers to monetize their open-source contributions. Pull Request = Push Demand so perhaps there should be a cost attached to that especially knowing that AI will eventually train on it.

mr-ron · 2026-01-30T13:29:14 1769779754

Was low code platforms ever hyped to a tiny fraction of crypto and ai?

csixty4 · 2026-01-30T15:06:02 1769785562

They were when we called them 4GLs.

mr-ron · 2026-01-30T15:34:32 1769787272

You mean things like Ruby, Python, Java, SQL that are ubiquitous today?

sarchertech · 2026-01-31T01:10:56 1769821856

Yeah but the hype was that the user would become the programmer and programmers wouldn’t be needed anymore.

mr-ron · 2026-01-31T01:32:58 1769823178

Well theres a lot less assembly workers, and that tech is now everywhere. Seems like exactly what will happen with AI

sarchertech · 2026-02-01T03:01:24 1769914884

In absolute numbers I’m not sure that’s true. But 4GLs aren’t what replaced assembly for anyone. C, C++ and Pascal were the most common assembly replacements.

As for C and C++, there definitely aren’t fewer of them in absolute terms. And even in relative terms they are still incredibly popular.

All of that is beside the point though. The hype around 4GLs wasn’t that they would replace older programming languages. The hype was that they’d replace programming as a profession in general. You wouldn’t need programming specialists because domain experts could handle programming the computer themselves.

This is exactly the same hype around AI coding.

pixl97 · 2026-01-30T13:57:14 1769781434

You're telling me big data went away?

1dom · 2026-01-30T14:20:27 1769782827

Right in the bin next to your DevOps.

bwfan123 · 2026-01-30T15:47:40 1769788060

There was also a mini bubble around social media aggregators and RSS feeds culminating in sites like gada.be

I see the dynamic as follows (be warned, cynical take)

1) there are the youth who are seeking approval from the community - look I have arrived - like the person building the steaming pile of browser code recently.

2) there are the veterans of a previous era who want to stay relevant in the new tech and show they still got mojo (gastown etc)

In both cases, the attitude is not one of careful deep engineering, craftsmanship or attention to the art, instead it reflects attention mongering.

theappsecguy · 2026-01-24T21:26:47 1769290007

I am yet to see this mysterious phenomenon of AI working so much faster and better than high performing teams. What I see so far, is sloppy code, poor tests and systems that do not scale in the long room.

daxfohl · 2026-01-24T21:50:17 1769291417

Yeah, hence the "when/if". Still a few model generations off, IMO. But if it happens, I think "serialization of feature development" is going to have an outsized effect on architecture, process, planning, deployment, etc.

theappsecguy · 2026-01-14T19:57:05 1768420625

If you’re only coming in at 20% above a complete junior you’re a slightly experienced junior elsewhere. You’re telling me your experience dealing with SDLC, incidents, abstractions, ubiquitous technologies like databases, CI/CD, k8s, etc only put you 20% ahead? Thats an absurd undersell.