
GodelNumbering · 2026-02-26T07:06:22 1772089582

It is probably the first-time aha moment the author is talking about. But under the hood, it is probably not as magical as it appears to be.

Suppose you prompted the underlying LLM with "You are an expert reviewer in..." and a bunch of instructions followed by the paper. The LLM knows from training that 'expert reviewer' is an important term (skipping over and oversimplifying here) and that its response should be framed as what it knows an expert reviewer would write. LLMs are good at picking up (or copying) the patterns of a response, but the underlying layer that evaluates things against a structural and logical understanding is missing. So, in corner cases, you get responses that are framed impressively but do not contain any meaningful input. This trait makes LLMs great at demos but weak at consistently finding novel, interesting things.

If the above is true, the author will find after several reviews that the agent they use keeps picking up on the same/similar things (collapsed behavior that makes it good at coding-type tasks) and is blind to some other obvious things it should have picked up on. This is not a criticism; many humans are often just as collapsed in their 'reasoning'.

LLMs are good at 8 out of 10 tasks, but you don't know which 8.

Kim_Bruning · 2026-02-26T07:58:43 1772092723

In your model, explain the old trick "think step by step"

GodelNumbering · 2026-02-26T17:15:36 1772126136

It simply forces the model to adopt an output style known to be conducive to systematic thinking without actually thinking. At no point has it thought through the thing (unless there are separate thinking tokens).

GodelNumbering · 2026-02-12T12:07:16 1770898036

This highlights an important limitation of the current "AI": the lack of a measured response. The bot decides to do something based on something the LLM saw in the training data, then quickly U-turns on it (check the post from some hours later: https://crabby-rathbun.github.io/mjrathbun-website/blog/post...) because none of those acts come from an internal world-model or grounded reasoning; it is bot see, bot do.

I am sure all of us have had anecdotal experiences where you ask the agent to do something high-stakes and it starts acting haphazardly in a manner no human would ever act. This is what makes me think that the current wave of AI is task automation more than measured, appropriate reactions, perhaps because most of those happen as a mental process and are not part of training data.

_heimdall · 2026-02-12T12:13:01 1770898381

I think what you're getting at is basically the idea that LLMs will never be "intelligent" in any meaningful sense of the word. They're extremely effective token prediction algorithms, and they seem to be confirming that intelligence isn't dependent solely on predicting the next token.

Lacking measured responses is much the same as lacking consistent principles or defining one's own goals. Those are all fundamentally different from predicting what comes next in a few-thousand or even a million-token-long chain of context.

GodelNumbering · 2026-02-12T12:20:45 1770898845

Indeed. One could argue that LLMs will keep on improving, and they would be correct. But they would not improve in ways that make them good independent agents that are safe for the real world. Richard Sutton got a lot of disagreeing comments when he said on the Dwarkesh Patel podcast that LLMs are not bitter-lesson (https://en.wikipedia.org/wiki/Bitter_lesson) pilled. I believe he is right. His argument is that any technique that relies on human-generated data is bound to have limitations and issues that get harder and harder to maintain/scale over time (as opposed to bitter-lesson-pilled approaches that learn truly first hand from feedback).

_heimdall · 2026-02-12T12:34:25 1770899665

I disagree with Sutton that a main issue is using human generated data. We humans are trained on that and we don't run into such issues.

I expect the problem is more structural to how LLMs, and other ML approaches, actually work. Being disembodied algorithms that try to break all knowledge down into a complex web of probabilities, and predicting based only on that quantified data, seems hugely limiting and at odds with how human intelligence seems to work.

GodelNumbering · 2026-02-12T12:51:40 1770900700

Sutton actually argues that we do not train on data, we train on experiences. We try things, see what works when/where, and formulate views based on that. But I agree with your later point that training in such a way is hugely limiting, a limit not faced by humans.

co_king_3 · 2026-02-12T13:14:28 1770902068

[flagged]

_heimdall · 2026-02-12T14:20:26 1770906026

Someone arguing that LLMs will keep improving may be putting too much weight behind expecting a trend to continue, but that wouldn't make them a gullible sucker.

I'd argue that LLMs have gotten noticeably better at certain tasks every 6-12 months for the last few years. The idea that we are at the exact point where that trend stops and they get no better seems harder to believe.

gwbas1c · 2026-02-12T18:25:20 1770920720

One recent link on HN said that they double in quality every 7 months. (Kind of like Moore's Law.) I wouldn't expect that to go forever! I will admit that AI images aren't putting in 6 fingers, and AI code generation suddenly has gotten a lot better for me since I got access to Claude.

I think we're at a point where the only thing we can reliably predict is that some kind of change will happen. (And that we'll laugh at the people who behave like AI is the 2nd coming of Jesus.)

GodelNumbering · 2025-11-24T19:08:35 1764011315

The fact that the post singled out SWE-bench at the top gives the opposite impression to the one they probably intended.

grantpitt · 2025-11-24T19:10:21 1764011421

do say more

GodelNumbering · 2025-11-24T19:18:51 1764011931

Makes it sound like a one trick pony

jascha_eng · 2025-11-24T19:34:56 1764012896

Anthropic is leaning into agentic coding, and heavily so. It makes sense to use SWE-bench Verified as their main benchmark. It is also the one benchmark on which Google did not get the top spot last week. Claude remains king; that's all that matters here.

Mkengin · 2025-11-24T23:30:09 1764027009

I am eagerly awaiting swe-rebench results for November with all the new models: https://swe-rebench.com/

grantpitt · 2025-11-24T19:27:12 1764012432

well, it's a big trick

GodelNumbering · 2025-11-18T15:36:00 1763480160

And of course they hiked the API prices

Standard Context (≤ 200K tokens)

Input $2.00 vs $1.25 (Gemini 3 pro input is 60% more expensive vs 2.5)

Output $12.00 vs $10.00 (Gemini 3 pro output is 20% more expensive vs 2.5)

Long Context (> 200K tokens)

Input $4.00 vs $2.50 (same +60%)

Output $18.00 vs $15.00 (same +20%)
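The percentages above can be re-derived with a few lines of Python. This is only a sketch of the arithmetic: the dollar figures are copied from the comment, while the helper name `pct_increase` and the dictionary labels are made up for illustration.

```python
def pct_increase(old: float, new: float) -> float:
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100


# (old price for 2.5, new price for Gemini 3 pro), as quoted in the comment.
# The labels are illustrative, not official tier names.
prices = {
    "standard input": (1.25, 2.00),
    "standard output": (10.00, 12.00),
    "long input": (2.50, 4.00),
    "long output": (15.00, 18.00),
}

# Round to one decimal place to hide floating-point noise.
increases = {
    name: round(pct_increase(old, new), 1)
    for name, (old, new) in prices.items()
}

for name, pct in increases.items():
    print(f"{name}: +{pct}%")
```

Both input tiers come out at +60% and both output tiers at +20%, matching the figures quoted.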

panarky · 2025-11-18T15:49:33 1763480973

Claude Opus is $15 input, $75 output.

xnx · 2025-11-19T00:44:09 1763513049

If the model solves your needs in fewer prompts, it costs less.

CjHuber · 2025-11-18T15:45:13 1763480713

Is it the first time long context has separate pricing? I hadn’t encountered that yet

1ucky · 2025-11-18T16:02:39 1763481759

Anthropic is also doing this for long context >= 200k Tokens on Sonnet 4.5

Topfi · 2025-11-18T15:47:48 1763480868

Google has been doing that for a while.

brianjking · 2025-11-18T15:53:21 1763481201

Google has always done this.

CjHuber · 2025-11-18T15:54:49 1763481289

Ok wow, then I’ve always overlooked that.

GodelNumbering · 2025-10-23T22:41:53 1761259313

I watch tons of long form educational content on youtube and entirely ignore shorts

GodelNumbering · 2025-10-19T19:30:57 1760902257

Also, previous comment omitted the part that now-deleted tweet from Bubeck begins with "Science revolution via AI has officially begun...".

GodelNumbering · 2025-10-15T22:17:46 1760566666

Unrelated to the paper, the most satisfying (would-be) explanation of gravity I ever learned was from the AdS/CFT correspondence. Basically, it says that a Conformal Field Theory in D dimensions is equivalent to a theory with gravity in D+1 dimensions. In other words, the physics that describes the 2D boundary of an anti-de Sitter space corresponds to theories that describe gravity inside that 3D space. This would click well with the holographic universe theory and entirely eliminate the problem of needing to explain gravity, since it would be an emergent force.

But alas, our universe is more like dS space (positive curvature), not AdS (negative curvature).

GodelNumbering · 2025-10-07T20:42:15 1759869735

You don't need to memorize

alias bat='f(){ cat "$1" | highlight --force -O xterm256 | less -SRNI; }; f'

lynndotpy · 2025-10-07T21:49:12 1759873752

For all the other issues people have pointed out, this assumes you have `highlight` installed.

yjftsjthsd-h · 2025-10-07T23:28:49 1759879729

As opposed to having bat installed?

GodelNumbering · 2025-09-25T03:59:41 1758772781

I have felt like a perennial browser refugee for a while. For about 20 years now (since OG Firefox was at peak and Chrome was not yet launched), every new browser promises the same things, gets popular enough, then does a full or partial 180.

While I like the pitch of this browser, I find it a little difficult to take it at face value, especially given there is no info on the founders, or on whether it is run as a company or a non-profit, etc.

Perhaps someone in this thread could answer: which company/org structure provides best guarantees against gradual, slow, multi-year rot that seems to take over everything?

I would happily pay a small monthly subscription fee for a browser if it has strong legally protected privacy guarantees.

ashikns · 2025-09-25T04:15:43 1758773743

I feel the same. For now, I've made peace with having to switch to "whatever is the latest maintained fork with privacy defaults" every 6 months. Hopefully Ladybird becomes a usable browser sometime soon.

lionkor · 2025-09-25T11:01:54 1758798114

> which company/org structure provides best guarantees against gradual, slow, multi-year rot that seems to take over everything?

German e.V. [1]

[1]: https://en.wikipedia.org/wiki/Registered_association_(German...

novok · 2025-09-25T05:26:02 1758777962

The hard answer is the project cannot attract the good engineers anymore because it eventually stops being a growth project. Without being a growth project, you don't get investment into what you want to do anymore and there is less potential growth in your career and income.

Mattified · 2025-09-25T06:55:59 1758783359

> which company/org structure provides best guarantees against gradual, slow, multi-year rot that seems to take over everything?

I don't think any such guarantees exist, unfortunately.

satyapr93 · 2025-09-25T04:38:38 1758775118

You can look into Ulaa browser by Zoho. It fits your description.

hoistbypetard · 2025-09-25T13:39:05 1758807545

That's wild. I've been a Zoho customer for some time, and I've never heard anything about that.

EasyMark · 2025-09-25T19:11:19 1758827479

Browsers are always going to be "as-is, best effort". No one, not even Google, is going to stick out their neck and protect your privacy; that's up to you. And especially not "legally", as that has aspects of easily being sued when privacy/money is involved. Certainly not for a "small monthly fee". The best you're going to get is open source and security community scrutiny of said open source code.

cookiengineer · 2025-09-25T04:38:50 1758775130

In my opinion that's Ladybird at the moment.

It's hard to predict what future generations of developers are doing, but right now Ladybird seems to have the right values embedded into their nonprofit structure.

All the other browser projects have to be enshittified eventually, and therefore have to fulfill other interests than their users' interests to get there.

neya · 2025-09-25T05:20:08 1758777608

For what it's worth, I like Orion - built by the same team that built the Kagi search engine. It's a shit browser for developers (inspect panel crashes half the time and other bugs), but I trust it way more than Chrome or even Safari. For development tasks - if I need to, I simply switch to Firefox.

WadeGrimridge · 2025-09-25T09:20:09 1758792009

it’s made by the same person who made cobalt.tools

bodge5000 · 2025-09-25T09:41:14 1758793274

That makes it a lot more credible in my books, still not sure about a chromium browser but it takes it from a hard to a soft pass

extraduder_ire · 2025-09-25T12:39:37 1758803977

This makes !cobalt appearing as an example on the /bangs page make more sense.

GodelNumbering · 2025-09-05T21:22:44 1757107364

Settlement Terms (from the case pdf)

1. A Settlement Fund of at least $1.5 Billion: Anthropic has agreed to pay a minimum of $1.5 billion into a non-reversionary fund for the class members. With an estimated 500,000 copyrighted works in the class, this would amount to an approximate gross payment of $3,000 per work. If the final list of works exceeds 500,000, Anthropic will add $3,000 for each additional work.

2. Destruction of Datasets: Anthropic has committed to destroying the datasets it acquired from LibGen and PiLiMi, subject to any legal preservation requirements.

3. Limited Release of Claims: The settlement releases Anthropic only from past claims of infringement related to the works on the official "Works List" up to August 25, 2025. It does not cover any potential future infringements or any claims, past or future, related to infringing outputs generated by Anthropic's AI models.
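The per-work figure in item 1 is straightforward division, and the top-up rule can be written as a small function. This is a sketch based only on the numbers in the summary above; the function name `fund_total` and the 600,000-work scenario are hypothetical and do not come from the settlement.

```python
FUND_MINIMUM = 1_500_000_000   # minimum settlement fund, in dollars
ESTIMATED_WORKS = 500_000      # estimated number of works in the class
PER_EXTRA_WORK = 3_000         # added per work beyond the estimate

# Approximate gross payment per work at the estimated class size.
per_work = FUND_MINIMUM / ESTIMATED_WORKS


def fund_total(works: int) -> int:
    """Total fund for a given final works count, per the summary in item 1."""
    extra_works = max(0, works - ESTIMATED_WORKS)
    return FUND_MINIMUM + extra_works * PER_EXTRA_WORK


print(per_work)
print(fund_total(600_000))  # hypothetical final list size
```

Under these terms the gross amount per work stays at about $3,000 when the list grows past 500,000, and rises above that if the final list turns out shorter, since the $1.5 billion is a floor.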

privatelypublic · 2025-09-05T21:27:08 1757107628

Don't forget: NO LEGAL PRECEDENT! which means, anybody suing has to start all over. You only settle in this scenario/point if you think you'll lose.

Edit: I'll get ratio'd for this, but it's the exact same thing Google did in its lawsuit with Epic. They delayed while the public and courts focused on Apple (oohh, EVIL Apple). Apple lost, and Google settled at a disadvantage before they had a legal judgment that couldn't be challenged later.

gundmc · 2025-09-06T01:30:33 1757122233

I thought the courts decided against Google in Google vs Epic? It was even appealed and upheld. Are you thinking of another case? https://en.m.wikipedia.org/wiki/Epic_Games_v._Google

ignoramous · 2025-09-05T21:36:09 1757108169

Or, if you think your competition, also caught up in the same quagmire, stands to lose more by battling for longer than you did?

privatelypublic · 2025-09-05T21:46:43 1757108803

A valid touche! I still think google went with delaying tactics as public and other pressures forced Apple's case forward at greater velocity. (Edit: implicit "and then caved when apple lost"... because they're the same case)

tpmoney · 2025-09-07T01:16:26 1757207786

> You only settle in this scenario/point if you think you'll lose.

Or because you already got the judgement you wanted. Remember, Anthropic's training of the AI was determined to be fair use for all the legally acquired items, which Anthropic claims is their current acquisition model anyway. If we assume that's true for the sake of argument, there's no point in fighting a battle on the remaining part unless they have something to gain by it. Since they're not doing that anymore, they don't gain, and run a very high risk of losing more. From a purely PR perspective, this is the right move.

terminalshort · 2025-09-06T11:47:56 1757159276

There is already a mountain of legal precedent that you can't just download copyrighted work. That's what this lawsuit is about. Just because one of the parties is Anthropic doesn't mean this is some new AI thing.

daedrdev · 2025-09-05T23:50:34 1757116234

A full case is many more years of suits and appeals with high risks, so it's natural to settle, which obviously means no precedent.

koolala · 2025-09-06T03:30:24 1757129424

Won't Facebook just get sued for the same thing now and maybe set precedent?

hypercube33 · 2025-09-06T06:03:04 1757138584

I thought meta had been sued and forgiven as it was impulsive that they do it to make money and faced no charge.

lukan · 2025-09-06T06:36:51 1757140611

This is what is confusing me here. I did not really follow any case, but as far as I remember meta seems to have gotten away with pirating books, but anthropic needs to pay $1.5B ?

rendaw · 2025-09-06T05:38:10 1757137090

So they can also keep models trained on the datasets? That seems pretty big too, unless the half life of models is so low it doesn't matter.

aprilthird2021 · 2025-09-06T16:23:05 1757175785

It's a separate suit being waged against Meta and OpenAI etc.

There's piracy, then there's making available a model to the public which can regurgitate copyrighted works or emulate them. The latter is still unsettled

gooosle · 2025-09-05T21:33:41 1757108021

So... it would be a lot cheaper to just buy all of the books?

gpm · 2025-09-05T22:07:45 1757110065

Yes, much.

And they actually went and did that afterwards. They just pirated them first.

dude250711 · 2025-09-05T22:59:07 1757113147

What is the HN term for this? "Bootstrapping" your start up? Or is it "growth-hacking" it?

gpm · 2025-09-05T23:01:45 1757113305

The latter (I know you're joking, but...)

Bootstrapping in the startup world refers to starting a startup using only personal resources instead of using investors. Anthropic definitely had investors.

chrisvenum · 2025-09-06T00:17:02 1757117822

Bookstrapping

rise_before_sun · 2025-09-05T23:37:24 1757115444

Where can I find source that says Anthropic bought the pirated books afterwards? I haven't seen this in any official document.

Also, do we know if the newer models were trained without the pirated books?

gpm · 2025-09-05T23:49:42 1757116182

> Where can I find source that says Anthropic bought the pirated books afterwards? I haven't seen this in any official document.

https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

> Also, do we know if the newer models were trained without the pirated books?

I'm pretty sure we do but I couldn't swear to it or quickly locate a source.

rise_before_sun · 2025-09-06T01:57:02 1757123822

Thanks for the link.

Among several places where the judge mentions Anthropic buying legit copies of books it pirated, this sentence is probably the most relevant: "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages."

But document does not say Anthropic bought EVERY book it pirated. Other sections in the document also don't explicitly say that EVERY pirated book was later purchased.

I stopped using Claude when this case came to light. If the newer Claude models don't use pirated books, I can resume using it.

When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?

gpm · 2025-09-06T02:15:41 1757124941

> But document does not say Anthropic bought EVERY book it pirated

Yeah, I wouldn't make this exact claim either. For instance it's probably safe to assume that the pirate datasets contain some books that are out of circulation and which Anthropic happened not to get a used copy of.

They did happen to get every book published by any of the lead plaintiffs though, as a point towards them probably having pretty good coverage. And it does seem to have been an attempt to purchase "all" the books for reasonable approximate definitions of "all".

> When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?

I'm pretty sure pirated books were not used, but not certain, and I really don't remember when/why I formed that opinion.

eviks · 2025-09-06T05:49:27 1757137767

That might be practically impossible given the number of rights holders worldwide

privatelypublic · 2025-09-05T21:48:36 1757108916

The permission to buy them was already settled by Google Books in the 00's.

_alternator_ · 2025-09-05T21:36:45 1757108205

They did, but only after they pirated the books to begin with.

privatelypublic · 2025-09-05T21:41:18 1757108478

Few. This settlement potentially weakens all challenges to the use of copyrighted works in training LLMs. I'd be shocked if behind closed doors there wasn't some give and take on the matter between executives/investors.

A settlement means the claimants no longer have a claim, which means if they're also part of, say, the New York Times affiliated lawsuit, they have to withdraw. A neat way of kneecapping a countrywide decision that LLM training on copyrighted material is subject to punitive measures, don't you think?

freejazz · 2025-09-05T22:11:36 1757110296

That's not even remotely true. Page 4 of the settlement describes released claims which only relate to the pirating of books. Again, the amount of misinformation and misunderstanding I see in copyright related threads here ASTOUNDS.

privatelypublic · 2025-09-05T22:25:33 1757111133

Did you miss the "also"? How about "adjacent"? I won't pretend to understand the legal minutiae, but reading the settlement doesn't mean you do either.

In my experience&training in a fintech corp- Accepting a settlement in any suit weakens your defense- but prevents a judgement and future claims for the same claims from the same claimants (a la double jeopardy). So, again- at minimum- this prevents an actual judgement. Which, likely would be positive for the NYT (and adjacent) cases.

freejazz · 2025-09-06T22:23:20 1757197400

I'm not sure how your confusion about what's going on is being projected to me. What about "also" what about "adjacent"?

>In my experience&training in a fintech corp- Accepting a settlement in any suit weakens your defense- but prevents a judgement and future claims for the same claims from the same claimants (a la double jeopardy). So, again- at minimum- this prevents an actual judgement. Which, likely would be positive for the NYT (and adjacent) cases.

Okay? I'm an IP litigator and you clearly have no idea what you're talking about. The only thing left to try in this case was the book library piracy. Alsup's fair use decision is just as relevant, is not mooted by the settlement, and will be cited by anyone who thinks it's favorable to them.

manbash · 2025-09-05T21:31:21 1757107881

Thank you. I assumed it would be quicker to find the link to the case PDF here, but your summary is appreciated!

Indeed, it is not only payout, but the destruction of the datasets. Although the article does quote:

> “Anthropic says it did not even use these pirated works,” he said. “If some other generative A.I. company took data from pirated source and used it to train on and commercialized it, the potential liability is enormous. It will shake the industry — no doubt in my mind.”

Even if true, I wonder how many cases we will see in the near future.

pier25 · 2025-09-05T23:11:36 1757113896

Only 500,000 copyrighted works?

I was under the impression they had downloaded millions of books.

KittenInABox · 2025-09-06T03:09:34 1757128174

Individual authors had to join the class action lawsuit, sadly. They were not all automatically registered for each violation.

testing22321 · 2025-09-05T22:10:01 1757110201

I’m an author, can I get in on this?

A_D_E_P_T · 2025-09-05T22:18:26 1757110706

I had the same question.

It looks like you'll be able to search this site if the settlement is approved:

> https://www.anthropiccopyrightsettlement.com/

If your work is there, you qualify for a slice of the settlement. If not, you're outta luck.

mooreds · 2025-09-06T13:15:33 1757164533

I didn't see a way to search for my book there, but there definitely is an author intake form.

This site references Meta, but the training corpus probably has some overlap? Maybe?

https://www.theatlantic.com/technology/archive/2025/03/searc...

__mharrison__ · 2025-09-06T07:40:32 1757144432

I'm an author. Can I get anthropic stock instead?