
You are selection-biasing toward the most extreme cases of AI absurdity.

> GPT-5.3-Codex and Opus 2.6 were released. Reviewers note they're struggling to find tasks the previous versions couldn't handle. The improvements are incremental at best.

I have not seen any claims of this other than Opus 4.6 being weirdly token-hungry.


More recent models are better at reading and obeying constraints in AGENTS.md/CLAUDE.md.

GPT-5.2-Codex did a bad job of obeying my more detailed AGENTS.md files, but GPT-5.3-Codex very evidently follows them well.


Perhaps I’m not using the latest and greatest in terms of models. I tend to avoid using tools that require excessive customization like this.

I find it infinitely frustrating to attempt to make these piece-of-shit “agents” do basic things like running the unit/integration tests after making changes.


Opus 4.5 successfully ignored the first line of my CLAUDE.md file last week

Thank god it’s not just me. It really makes me feel insane reading some of the commentary online.

After exceeding the ever-shrinking session limit with Opus 4.6, I continued with the extra usage for only a few minutes and it consumed about $10 of the credit.

I can't imagine how quickly this Fast Mode goes through credit.


A symptom of the increasing backlash against generative AI (both in creative industries and in coding) is that any flaw in the resulting product becomes grounds to call it AI slop, even if it's very explicitly upfront that it's an experimental demo/proof of concept and not the NEXT BIG THING being hyped by influencers. That nuance is dead even outside of social media.

AI companies set that expectation when their CEOs ran around telling anyone who would listen that their product is a generational paradigm shift that will completely restructure both labor markets and human cognition itself. There is no nuance in their own PR, so why should they benefit from any when their product can't meet those expectations?

Because it leads to poor and nonconstructive discourse that doesn't educate anyone about the implications of the tech, which is expected on social media but has annoyingly leaked to Hacker News.

There have been more than enough drive-by comments from new accounts/green names, even in this HN submission alone.


It does lead to poor, non-constructive discourse. That's why we keep taking those CEOs to task for it. Why are you not?

The CEOs aren't here in the comments.

Which is why we ought to always bring up their BS every time people try to pretend it didn't happen.

The promises made are ABSOLUTELY relevant to how promising or not these experiments are.


I bet you get upset when you buy a new iPhone and don't love it, because Tim Cook said in the ad that they think you're going to love it.

It cannot be overstated how absurd the marketing campaign for AI was. OpenAI and Anthropic have convinced half the world that AI is going to become a literal god. They deserve to eat a lot of shit for those outright lies.

It's not just social media, it's IRL too.

Maybe the general population will be willing to have a more constructive discussion about this tech once the trillion-dollar companies stop pillaging everything they see in front of them and cease acting like sociopaths whose only objectives seem to be concentrating power, sowing dissent, and harvesting wealth.


There is an entire Evaluation section that addresses that criticism (both in agreement and disagreement).

Only if you go above 200k, which is a) standard with other model providers and b) intuitive as compute scales with context length.
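
Rough intuition for that scaling, assuming standard self-attention (a back-of-envelope sketch, not any provider's actual cost model):

    # Attention FLOPs grow roughly quadratically with sequence length,
    # which is why long-context requests cost disproportionately more.
    def attention_flops(seq_len: int, d_model: int = 8192) -> int:
        # ~2*n^2*d for the QK^T scores plus the attention-weighted sum of V
        return 2 * seq_len ** 2 * d_model

    # Doubling the context quadruples the attention compute:
    print(attention_flops(400_000) / attention_flops(200_000))  # -> 4.0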

I remember when AI labs coordinated so they didn't push major announcements on the same day, to avoid cannibalizing each other. Now we have AI labs pushing major announcements within 30 minutes of each other.

The labs have fully embraced cutthroat competition; the arms race has fully shed the civilized facade of beneficent mutual cooperation.

Dirty tricks and underhanded tactics will happen - I think Demis isn't savvy in this domain, but might end up stomping out the competition on pure performance.

Elon, Sam, and Dario know how to fight ugly and do the nasty political boardroom crap. '26 is gonna be a very dramatic year, with lots of cinematic potential for the eventual AI biopics.


>civilized facade of mutual cooperation

>Dirty tricks and underhanded tactics

As long as the tactics are legal (i.e. not corporate espionage, bribes, etc.), no-holds-barred free market competition is the best thing for the market and for consumers.


> As long as the tactics are legal (i.e. not corporate espionage, bribes, etc.), no-holds-barred free market competition is the best thing for the market and for consumers.

The implicit assumption here is that we have constructed our laws so skillfully that the only path to winning free market competition is producing a better product, or at least that all effort will be spent doing so. This is never the case. It should be self-evident that there are more productive ways for companies to compete, and that our laws are not sufficient to create those conditions.


The consumers are getting huge wins.

Model costs continue to collapse while capability improves.

Competition is fantastic.


> The consumers are getting huge wins.

However, the investors currently subsidizing those wins to below cost may be getting huge losses.


Yes, but that's the nature of the game, and they know it.

> Model costs continue to collapse

And yet RAM prices are still sky high. Game consoles are getting more expensive, not cheaper, as a result. When will competition benefit those consumers? Or consumers of desktop RAM?


The free market has simply decided these consumers are not as relevant as the others.

Maybe the free market is wrong.

It can’t be. Those uses are suboptimal, hence the users aren’t willing to pay the new prices.

Not really. Investors with hundreds of billions of dollars have decided it. The process by which capital has been allocated the way it has isn't some mathematically natural or optimal thing. Our market is far from free.

Saying "investors with hundreds of billions decided it" makes it sound like a few people just chose the outcome, when in reality prices and capital move because millions of consumers, companies, workers, and smaller investors keep making choices every day. Big investors only make money if their decisions match what people actually want; they can't just command success. If they guess wrong, others profit by allocating money better, so having influence isn't the same as having control.

The system isn't mathematically perfect, but that doesn't make it arbitrary. It works through an evolutionary process: bad bets lose money, better ones gain more resources.

Any claim that the outcome is suboptimal only really means something if the claimant can point to a specific alternative that would reliably do better under the same conditions. Otherwise critics are mostly just expressing personal frustration with the outcome.


Sure, it can be beneficial. But don't forget that externalities are a thing.

In the short term, maybe; in the long term it depends on how many winners you have. If only two, the market will be a duopoly. Customers will get better AI but will have zero power over how the AI is produced or consumed (i.e. CO2 emissions, ethics, etc. will be sacrificed).

> how many winners ... duopoly

There aren't any insurmountable moats; plenty of open-weight models perform close enough.

> CO₂ emissions

That's a different industry, one that could also benefit from more competition? Clean(er) energy is not even more expensive than dirty sources on pure $/kWh; we still need dirty sources for workloads like base load and peaking that the cheap clean sources cannot serve today.


Yes, but not cutthroat competition, which implies unsustainable, detrimental competition that kills off the industry.

Demis is more than qualified. He was politically savvy enough to stay in control of DeepMind when it was acquired by Google; that's up there with the Altman fiasco a year or two ago.

They're also coordinating around Chinese New Year to compete with new releases of the major open/local models.

Year of the Pelican!

simonw?

This goes way back. When OpenAI launched GPT-4 in 2023, both Anthropic and Google lined up counter launches (Claude and Magic Wand) right before OpenAI's standard 10am launch time.

The thrill of competition

Wouldn't that be illegal? I.e., isn't colluding like that a cartel?

You were downvoted but I don't understand why. This is the purpose/spirit of antitrust law [1]

[1] https://en.wikipedia.org/wiki/United_States_antitrust_law


I have long since given up trying to understand voting patterns in HN :)

---

Sadly, it was the core of antitrust law; since the 1970s, things have changed.

The predominant view today (i.e. the Chicago School view) in both the judiciary and the executive is influenced by Robert Bork's idea that consumer benefit, rather than a company's conduct itself, should be the deciding factor.

Consumer benefit becomes a matter of each side's projections about the future, whereas company actions like collusion, price fixing, or M&A are hard facts with strong evidence. Today it is all vibes about how the courts (or the executive) feel.

So now we have government-sanctioned cartels, like the aviation alliances [1], justified by convoluted, catch-22-esque reasoning because they favor strategic goals, even though they violate the letter/spirit of the law.

[1] https://www.transportation.gov/office-policy/aviation-policy...


AI has commoditized to the point where Nash equilibria and 35 minutes between announcements matter :)

I wish they'd just stop pretending to care about safety. Other than a few researchers at the top, they care about safety only as long as they aren't losing ground to the competition. Game theory guarantees the AI labs will do what it takes to ensure survival. Only regulation can enforce limits; self-policing won't work when money is involved.

As long as China continues to blitz forward, regulation is a direct path to losing.

Define "losing."

Europe is prematurely regarded as having lost the AI race. And yet a large portion of Europeans live higher-quality lives than their American counterparts, live longer, and don't have to worry about an elected orange unleashing brutality on them.


If the world is built on AI infrastructure (models, compute, etc.) that is controlled by the CCP, then the West has effectively lost.

This may lead to better life outcomes, but if the West doesn't control the whole stack, it has lost its sovereignty.

This is already playing out today, as Europe is dependent on the US for critical tech infrastructure (cloud, mail, messaging, social media, AI, etc.). There are no home-grown European alternatives because Europe has failed to create an economic environment that assures its technical sovereignty.


Europe has already lost the tech race: the cloud systems that their entire welfare states rely upon all run on servers operated by private American companies, which can turn them off with the flick of a switch if and when needed.

When the welfare state, enabled by technology, falls apart, it won't take long for European society to fall apart. Except France maybe.


welfare state enabled by cloud services/technology?

I'm not sure if you know less about Europe or tech.

> Except France maybe.

sure


You mean all paths are direct paths to losing.

The last thing I would want is for excessively neurotic bureaucrats to interfere with all the mind-blowing progress we've had in the last couple of years with LLM technology.

Will Opus 4.6 via Claude Code be able to access the 1M context limit? The cost increase for going above 200k tokens is 2x input, 1.5x output, which is likely worth it, especially for people with the $100/$200 plans.
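
For a rough sense of what that surcharge does to a single request, here's a back-of-envelope sketch. Only the 2x/1.5x multipliers come from the announcement; the base per-token rates below are assumptions for illustration, not published prices:

    # Hypothetical base rates in $/million tokens; substitute the real ones.
    BASE_INPUT = 5.00
    BASE_OUTPUT = 25.00

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        in_rate, out_rate = BASE_INPUT, BASE_OUTPUT
        if input_tokens > 200_000:  # long-context surcharge: 2x input, 1.5x output
            in_rate *= 2.0
            out_rate *= 1.5
        return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

    # 500k tokens in, 10k out: 0.5 * $10 + 0.01 * $37.50 = $5.375 per request
    print(request_cost(500_000, 10_000))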

The 1M context is not available via subscription - only via API usage

Well this is extremely disappointing to say the least.

It says "subscription users do not have access to Opus 4.6 1M context at launch" so they are probably planning to roll it out to subscription users too.

Man I hope so - the context limit is hit really quickly in many of my use cases - and a compaction event inevitably means another round of corrections and fixes to the current task.

Though I'm wary about that being a magic-bullet fix - already it can be pretty "selective" in what it actually seems to take into account, documentation-wise, as the existing 200k context fills.


Hello,

I check the context usage percentage, and above ~70% I ask it to generate a prompt for continuation in a new chat session, to avoid compaction.

It works fine, and saves me from spending precious tokens on context compaction.
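
A minimal sketch of that heuristic, assuming a 200k window; the threshold, prompt wording, and token counter are all illustrative stand-ins, not a Claude Code feature:

    CONTEXT_WINDOW = 200_000
    HANDOFF_THRESHOLD = 0.70  # ~70%, leaving room to write a good handoff

    HANDOFF_PROMPT = (
        "We are nearing the context limit. Write a prompt for a fresh session "
        "covering: the task goal, decisions made so far, files touched, open "
        "problems, and the next concrete step."
    )

    def maybe_handoff(tokens_used: int) -> str | None:
        # Returns the handoff request once usage crosses the threshold,
        # so the session can be restarted before compaction kicks in.
        if tokens_used / CONTEXT_WINDOW >= HANDOFF_THRESHOLD:
            return HANDOFF_PROMPT
        return None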

Maybe you should try it.


How is generating a continuation prompt materially different from compaction? Do you manually scrutinize the context-handoff prompt? I've done that before, but otherwise I don't see how it's very different from compaction.

I wonder if it's just: compact earlier, so there's less to compact and more remaining context available to create a more effective continuation.

Is this a case of doing it wrong, or do you think accuracy is good enough given the amount of context you often need to stuff into it?

In my case the Figma MCP takes ~300k tokens per medium-sized section of the page, and it would be cool to let it read a design and implement it straight through. Currently I have to split it up, which is annoying.

I mean, the systems I work on have enough weird custom APIs and internal interfaces that just getting them working seems to take a good chunk of the context. I've spent a long time minimizing every input document where I can (compact, terse references) and I still keep hitting similar issues.

At this point I just think the "success" of many AI coding agents is extremely sector dependent.

Going forward I'd love to experiment with seeing if that's actually the problem, or just an easy explanation for failure. I'd like to play with more controls on context management than "slightly better models" - like being able to select/minimize/compact the sections of context I feel are relevant to the immediate task, at whatever "depth" of detail is needed, and to remove from consideration those that aren't likely to be relevant. Perhaps each chunk could be cached to save processing power. Who knows.
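
Purely as a speculative sketch of the kind of interface I mean (no agent exposes this today; the names and the ~4 chars/token heuristic are made up):

    from dataclasses import dataclass

    @dataclass
    class ContextChunk:
        name: str
        text: str
        relevance: int       # 0 = drop, 1 = summary only, 2 = full detail
        summary: str = ""

    def assemble_context(chunks: list[ContextChunk], budget_tokens: int) -> str:
        parts = []
        for c in sorted(chunks, key=lambda c: -c.relevance):
            if c.relevance == 2:
                parts.append(c.text)     # keep full detail for the immediate task
            elif c.relevance == 1:
                parts.append(c.summary)  # keep a terse reference only
            # relevance 0: removed from consideration entirely
        joined = "\n\n".join(parts)
        return joined[: budget_tokens * 4]  # crude ~4 chars/token budget cap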


lmao what are you building that actually justifies needing 1mm tokens on a task? People are spending all this money to do magic tricks on themselves.

The Opus context window is 200k tokens, not 1mm.

But I kinda see your point - assuming from your name you're not just a single-purpose troll - I'm still not sold on the cost-effectiveness of the current generation, and I can't see a clear and obvious change to that for the next generation, especially as they're still loss leaders. Only if you play silly games like "ignoring the training costs" - i.e. the majority of the costs - do you get even close to the current subscription costs being sufficient.

My personal experience is that AI generally doesn't actually do what it is being sold for right now, at least in the contexts I'm involved with - especially as it's sold by somewhat breathless comments on the internet. Like, why are they even trying to persuade me in the first place? If they don't want to sell me anything, just shut up and keep the advantage for yourselves rather than replying with the 500th "You're Holding It Wrong" comment with no actionable suggestions. But I still want to know, and am willing to put in the time, effort, and $$$ to ensure I'm not deluding myself by ignoring real benefits.


I do not trust that; similar wording was used when Sonnet 1M launched. It's still not the case today.

They want the value of your labor and competency to be 1:1 correlated to the quality and quantity of tokens you can afford (or be loaned)??

It's a weapon whose target is the working class. How does no one realize this yet?

Don't give them money, code it yourself, you might be surprised how much quality work you can get done!


You’re not Michael Scott, you can’t just declare a recession.

Thanks for your kind words, minimaxir. (Also appreciate what you do in general).

[deleted]

> The Claude Agent SDK is a collection of tools that helps developers build powerful agents on top of Claude Code.

https://claude.com/blog/building-agents-with-the-claude-agen...

From September 29, 2025

