
Do you plan on Open Sourcing it? A bit scary to just execute a random binary and put in a bunch of API keys.

It is already MIT licensed.

Didn't see it, thanks!

How do you think big labs will try to attack you guys? Why not just close the data to them?


Just tested the new Opus 4.6 (1M context) on a fun needle-in-a-haystack challenge: finding every spell in all Harry Potter books.

All 7 books come to ~1.75M tokens, so they don't quite fit yet. (At this rate of progress, mid-April should do it.) For now you can fit the first 4 books (~733K tokens).
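As an aside, you can ballpark that kind of fit-check yourself without any API access, using the common ~4-characters-per-token rule of thumb for English prose (the real count depends on the model's own tokenizer, so this is an estimate only):

```python
# Rough token-count estimate for checking whether a text fits a
# context window. ~4 characters per token is a common heuristic for
# English prose; the exact count depends on the model's tokenizer.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return round(len(text) / chars_per_token)

# With the real books you would read the files from disk; here a
# stand-in string illustrates the call.
sample = "It does not do to dwell on dreams and forget to live."
print(estimate_tokens(sample))
```

At ~4 chars/token, 7 books of roughly 7M characters lands near the ~1.75M-token figure above.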

Results: Opus 4.6 found 49 out of 50 officially documented spells across those 4 books. The only miss was "Slugulus Eructo" (a vomiting spell).

Freaking impressive!


Surely the corpus Opus 4.6 ingested would include whatever reference you used to check the spells were there. I mean, there are probably dozens of pages on the internet like this:

https://www.wizardemporium.com/blog/complete-list-of-harry-p...

Why is this impressive?

Do you think it's actually ingesting the books and only using those as a reference? Is that how LLMs work at all? It seems more likely it's predicting these spell names from all the other references it has found on the internet, including lists of spells.


Most people still don't realize that general public world knowledge is not really a test for a model that was trained on general public world knowledge. I wouldn't be surprised if even proprietary content like the books themselves found their way into the training data, despite what publishers and authors may think of that. As a matter of fact, with all the special deals these companies make with publishers, it is getting harder and harder for normal users to come up with validation data that only they have seen. At least for human written text, this kind of data is more or less reserved for specialist industries and higher academia by now. If you're a janitor with a high school diploma, there may be barely any textual information or fact you have ever consumed that such a model hasn't seen during training already.


> I wouldn't be surprised if even proprietary content like the books themselves found their way into the training data

No need for surprises! It is publicly known that the corpora of 'shadow libraries' such as Library Genesis and Anna's Archive were specifically and manually requested by at least NVIDIA for their training data [1], used by Google in their training [2], downloaded by Meta employees [3], etc.

[1] https://news.ycombinator.com/item?id=46572846

[2] https://www.theguardian.com/technology/2023/apr/20/fresh-con...

[3] https://www.theverge.com/2023/7/9/23788741/sarah-silverman-o...


also:

"Researchers Extract Nearly Entire Harry Potter Book From Commercial LLMs"

https://www.aitechsuite.com/ai-news/ai-shock-researchers-ext...


The big AI houses are all involved in varying degrees of litigation (all the way to class action lawsuits) with the big publishing houses. I think they at least have some level of filtering for their training data to keep them legally somewhat compliant. But considering how much copyrighted material is spread blissfully online, it is probably not enough to filter out the actual ebooks of certain publishers.


> I think they at least have some level of filtering for their training data to keep them legally somewhat compliant.

So far, courts are siding with the "fair use" argument. No need to exclude any data.

https://natlawreview.com/article/anthropic-and-meta-fair-use...

"Even if LLM training is fair use, AI companies face potential liability for unauthorized copying and distribution. The extent of that liability and any damages remain unresolved."

https://www.whitecase.com/insight-alert/two-california-distr...


> even proprietary content like the books themselves

This definitely raises an interesting question. It seems like a good chunk of popular literature (especially from the 2000s) exists online in big HTML files. Immediately to mind was House of Leaves, Infinite Jest, Harry Potter, basically any Stephen King book - they've all been posted at some point.

Do LLMs have a good way of inferring where knowledge from the context begins and knowledge from the training data ends?


> It seems like a good chunk of popular literature (especially from the 2000s) exists online in big HTML files

Anna's Archive alone claims to currently publicly host 61,654,285 books, more than 1PB in total.


Maybe y’all missed this?

https://www.washingtonpost.com/technology/2026/01/27/anthrop...

Anthropic, specifically, ingested libraries of books by scanning and then disposing of them.


> If you're a janitor with a high school diploma, there may be barely any textual information or fact you have ever consumed that such a model hasn't seen during training already.

The plot of Good Will Hunting would like a word.


So a good test would be replacing the spell names in the books with made-up spells. And if a "real" spell name was given, it also tests whether it "cheated".


I've run that experiment now, spoiler: It cheated with its pre-training knowledge https://georggrab.net/content/opus46retrieval.html

Thanks for trying! Good to know.

A real test would be synthesizing 100,000 sentences, selecting random ones, and then injecting the traits you want the LLM to detect and describe, e.g. having a set of made-up words or phrases that represent spells, used so that they do something. Then have the LLM find these random spells in the random corpus.
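A minimal sketch of that kind of synthetic benchmark; the filler template, syllables, and all "spell" names below are invented for illustration, so none of them can be in any model's training data:

```python
import random

# Synthetic needle-in-a-haystack corpus: filler sentences with
# invented "spells" planted at random positions.
random.seed(42)

ADJS = ["tired", "old", "young", "quiet", "hungry"]
NOUNS = ["forest", "market", "valley", "village", "harbor"]
SYLLABLES = ["vor", "pel", "lum", "ska", "dri", "mox", "tra"]

def make_spells(n: int) -> list[str]:
    # Unique pseudo-Latin nonsense words, e.g. "Vorpellum".
    spells: set[str] = set()
    while len(spells) < n:
        spells.add("".join(random.choices(SYLLABLES, k=3)).capitalize())
    return sorted(spells)

def build_corpus(n_sentences: int, n_spells: int) -> tuple[str, list[str]]:
    sentences = [
        f"The {random.choice(ADJS)} traveler walked through the "
        f"{random.choice(NOUNS)} and said nothing."
        for _ in range(n_sentences)
    ]
    spells = make_spells(n_spells)
    for spell in spells:
        # Plant each spell in a random sentence, used as an action.
        i = random.randrange(n_sentences)
        sentences[i] += f' She raised her wand and cried "{spell}!"'
    return " ".join(sentences), spells

corpus, planted = build_corpus(100_000, 50)
# `planted` is the ground truth to score the model's answer against.
print(len(planted))
```

Because the ground truth is generated alongside the corpus, scoring the model's answer is an exact set comparison rather than a judgment call.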


It could still remember where each spell is mentioned. I think the only way to properly test this would be to run it against an unpublished manuscript.


Any obscure work of fiction or fanfiction would likely be fine as a casual test.

If you ask a model to discuss an obscure work it'll have no clue what it's about.

This is very different than asking about Harry Potter.


Yeah, that's what I've been doing as well, and at least Gemini 3 Pro did not fare very well.


For fun I've asked Gemini Pro to answer open ended questions about obscure books like "Read this novel and tell me what the hell is this book, do a deep reading and analyze" and I've gotten insightful/ enjoyable answers but I've never asked it to make lists of spells or anything like that.


It's impressive, even if the books and the posts you're talking about were both key parts of the training data.

There are many academic domains where the research portion of a PhD is essentially what the model just did. For example, PhD students in some of the humanities will spend years combing ancient sources for specific combinations of prepositions and objects, only to write a paper showing that the previous scholars were wrong (and that a particular preposition has examples of being used with people rather than places).

This sort of experiment shows that Opus would be good at that. I'm assuming it's trivial for the OP to extend their experiment to determine how many times "wingardium leviosa" was used on an object rather than a person.

(It's worth noting that other models are decent at this, and you would need to find a way to benchmark between them.)
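That kind of occurrence-counting is also worth cross-checking with a plain concordance scan before trusting any model's tally. A sketch (the passage is a stand-in, not a book quote):

```python
import re

# Keyword-in-context scan: every occurrence of a phrase plus a window
# of surrounding characters, for manually verifying a reported count.
def concordance(text: str, term: str, window: int = 40) -> list[str]:
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [
        text[max(0, m.start() - window):m.end() + window]
        for m in pattern.finditer(text)
    ]

passage = ('"Wingardium Leviosa!" he shouted, and the feather rose. '
           'Later, wingardium leviosa lifted the club as well.')
for hit in concordance(passage, "Wingardium Leviosa"):
    print("…" + hit + "…")
```

Classifying each hit (object vs. person) would still need a human or model pass, but the raw occurrence count is cheap to verify this way.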


I don't think this example proves your point. There's no indication that the model actually worked this out from the input context, instead of regurgitating it from the training weights. A better test would be to subtly modify the books fed in as input to the model so that there were actually 51 spells, and see if it pulls out the extra spell, or to modify the names of some spells, etc.

In your example, it might be the case that the model simply spits out the consensus view, rather than actually finding/constructing this information on its own.


Ah, that's a good point.


Since it got 49 of 50 right, it's worse than what you would get using a simple Google search. People would immediately disregard a conventional source that only listed 49 out of 50.


The poster you reply to works in AI. The marketing strategy is to always have a cute Pelican or Harry Potter comment as the top comment for positive associations.

The poster knows all of that, this is plain marketing.


This sounds compelling, but also something that an armchair marketer would have theorycrafted without any real-world experience or evidence that it actually works - and I searched online and can't find any references to something like it.

Do you have a citation for this?


They should try the same thing but replace the original spell names with something else.


Why don't you ask it and find out ;)


Because the model doesn't know but will happily tell a convincing lie about how it works.


I was playing about with ChatGPT the other day, uploading screenshots of sheet music and asking it to convert them to ABC notation so I could make a MIDI file.

The results seemed impressive until I noticed some of the "Thinking" statements in the UI.

One made it apparent the model / agent / whatever had read the title from the screenshot and was off searching for existing ABC transcripts of the piece Ode to Joy.

So the whole thing was far less impressive after that, it wasn't reading the score anymore, just reading the title and using the internet to answer my query.


Yes, I have found that Grok, for example, suddenly becomes quite sane when you tell it to stop querying the internet and just rethink the conversation data and answer the question.

It's weird, it's like many agents are now in a phase of constantly getting more information and never just thinking with what they've got.


But isn't this what we wanted? We complained so much that LLMs use deprecated or outdated APIs instead of current versions because they relied so much on what they remembered.


To be clear, what I mean is that Grok will query 30 pages and then answer your question vaguely or wrongly, then ask for clarification, and then go and requery everything again... I can imagine why it might need to revisit pages, and it might be a UI thing, but it still feels like until you yell at it to stop searching and just summarise, it doesn't activate its "think with what you got" mode.

I guess we could call this gathering and then do your best conditional on what you found right now.


2010's: Google Search is making humans who constantly rely on it dumber

2020's: LLMs are making humans who constantly rely on them dumber

2026: Google Search is making LLMs who constantly rely on it dumber


Touché, that is what we humans are doing to some degree as well.


Sounds pretty human like! Always searching for a shortcut


It sounds like it's lying and making stuff up, something everybody seems to be okay with when using LLMs.


I am not sure why... you want the LLM to solve problems, not come up with answers by itself. It's allowed to use tools precisely because it tends to make stuff up. In general, only if you're benchmarking LLMs do you care about whether the LLM itself provided the answer or whether it used a tool. If you ask it to convert the notation of sheet music it might use a tool, and that's probably the right decision.


The shortcut is fine if it's a bog-standard canonical arrangement of the piece. If it's a custom jazz rendition you composed, with odd key changes and shifting time signatures, taking that shortcut is not going to yield the intended result. It's choosing the wrong tool for the job, which makes it unreliable for this task.


For structured outputs like that wouldn’t it be better to get the LLM to create a script to repeatably make the translation?


Honest question, how do you know if it's pulling from context vs from memory?

If I use Opus 4.6 with Extended Thinking (Web Search disabled, no books attached), it answers with 130 spells.


Exactly. There was a study where they tried to make an LLM reproduce the HP books word for word, giving it the first sentences and letting it cook.

Basically, with some tricks, they managed to get 99% word for word. The tricks were needed to bypass the safety measures that are in place for exactly this reason: to stop people from retrieving training material.


This reminds me of https://en.wikipedia.org/wiki/Pierre_Menard,_Author_of_the_Q... :

> Borges's "review" describes Menard's efforts to go beyond a mere "translation" of Don Quixote by immersing himself so thoroughly in the work as to be able to actually "re-create" it, line for line, in the original 17th-century Spanish. Thus, Pierre Menard is often used to raise questions and discussion about the nature of authorship, appropriation, and interpretation.


Do you remember how to get around those tricks?


This is the paper: https://arxiv.org/abs/2601.02671

Grok and Deepmind IIRC didn’t require tricks.


This really makes me want to try something similar with content from my own website.

I shut it down a while ago because bot traffic overtook human traffic. The site had quite a bit of human traffic (enough to bring in a few hundred bucks a month in ad revenue, and a few hundred more in subscription revenue); however, the AI scrapers really started ramping up, and the only way I could realistically continue would be to pay a lot more for hosting/infrastructure.

I had put a ton of time into building out content...thousands of hours, only to have scrapers ignore robots, bypass cloudflare (they didn't have any AI products at the time), and overwhelm my measly infrastructure.

Even now, with the domain pointed at NOTHING, it gets almost 100,000 hits a month. There is NO SERVER on the other end. It is a dead link. The stats come from Cloudflare, where the domain name is hosted.

I'm curious if there are any lawyers who'd be willing to take someone like me on contingency for a large copyright lawsuit.


Can we help get your infra cost down to negligible? I'm thinking of things like pre-generated static pages and CDNs. I won't assume you hadn't thought of this before, but I'd like to understand more about where your non-trivial infra costs come from.


I would be tempted to try and optimise this as well. 100,000 hits on an empty domain and ~200 dollars' worth of bot traffic sounds wild. Are they using JS-enabled browsers or sim farms that download and re-download images and videos as well?


> only to have scrapers ignore robots, bypass cloudflare

Set the server to require Cloudflare's SSL client cert, so nobody can connect to it directly.

Then make sure every page is cacheable and your costs will drop to near zero instantly.

It's like 20 mins to set these things up.


a) As an outside observer, I would find such a lawsuit very interesting/valuable. But I guess the financial risk of taking on OpenAI or Anthropic is quite high.

b) If you don't want bots scraping your content and DDOSing you, there are self-hosted alternatives to Cloudflare. The simplest one that I found is https://github.com/splitbrain/botcheck - visitors just need to press a button and get a cookie that lets them through to the website. No proof-of-work or smart heuristics.


The new cloudflare products for blocking bots and AI scrapers might be worth a shot if you put so much work into the content.


Further, some low effort bots can be quickly handled with CF by blocking specific countries (e.g., Brazil and Russia, for one of my sites).


I work for a publisher that serves the Chinese market as a secondary market. Sucks that we can’t blanketly do this since we get hammered by Chinese bots daily. We also have an extremely old codebase (Drupal) which makes blanket caching difficult. Working to migrate from Cloudfront to Cloudflare at least


What's not clear from the study (at least skimming it) is whether they always started the ball rolling with ground-truth passages or whether they chained outputs from the model until they got to the end of the book. I strongly suspect the latter would become hopelessly corrupted relatively quickly.

It seems like this technique only works if you have a copy of the material to work off of, i.e. enter a ground truth passage, tell the model to continue it as long as it can, and then enter the next ground truth passage to continue in the next session.


Oh! That’s a huge caveat if that’s indeed the case.


One possible trick could be to search and replace them all with nonsense alternatives then see if it extracts those.


That might actually boost performance since attention pays attention to stuff that stands out. If I make a typo, the models often hyperfixate on it.


A fine instruction following task but if harry potter is in the weights of the neural net, it's going to mix some of the real ones with the alternates.


When I tried it without web search, so only internal knowledge, it missed ~15 spells.


What was your prompt?


There's lots of websites that list the spells. It's well documented. Could Claude simply be regurgitating knowledge from the web? Example:

https://harrypotter.fandom.com/wiki/List_of_spells


It didn't use web search. But for sure it has some internal knowledge already. It's not a perfect needle-in-a-haystack problem, but Gemini Flash was much worse when I tested it last time.


If you want to really test this, search/replace the names with your own random ones and see if it lists those.

Otherwise, LLMs have most of the books memorised anyway: https://arstechnica.com/features/2025/06/study-metas-llama-3...
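The search/replace itself is a few lines; the spell mapping below is a tiny illustrative subset, and the replacement names are invented:

```python
import re

# Swap known incantations for invented ones before feeding the books
# back in as context. Illustrative subset only; replacements are made up.
REPLACEMENTS = {
    "Expelliarmus": "Dravokinta",
    "Wingardium Leviosa": "Fenolibra Astex",
    "Expecto Patronum": "Umbra Velentis",
}

def swap_spells(text: str) -> str:
    for real, fake in REPLACEMENTS.items():
        # Case-insensitive, whole-phrase replacement.
        text = re.sub(re.escape(real), fake, text, flags=re.IGNORECASE)
    return text

sample = '"Expelliarmus!" Harry yelled, and the wand flew free.'
print(swap_spells(sample))
```

Deterministic string replacement also gives you an exact answer key for scoring, which an LLM-performed rewrite would not.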


Being that it has the books memorized (huh, just learned another US/UK spelling quirk), I would suppose feeding it the books with altered spells would get you a confused mishmash of data in the context and data in the weights.


Couldn't you just ask the LLM which 50 (or 49) spells appear in the first four Harry Potter books without the data for comparison?


It's not going to be as consistent. It may get bored of listing them (you know how you can ask for many examples and get 10 in response?), or omit some minor ones for other reasons.

By replacing the names with something unique, you'll get much more certainty.


It might not work well, but by navigating to a very Harry-Potter-dominant part of latent space, by preconditioning on the books, you make it more likely to get good results. An example would be taking a base model and prompting "what follows is the book 'X'" - it may or may not regurgitate the book correctly. Give it a chunk of the first chapter and let it regurgitate from there, and you tend to get fairly faithful recovery, especially for things on Gutenberg.

So it might be there: by preconditioning latent space to the area of the Harry Potter world, you make it much more probable that the full spell list is regurgitated from online resources that were also read, while asking naively might get it sometimes and sometimes not.

The books act like a hypnotic trigger, and this may not represent a generalized skill. Hence why replacing the spells with random words would help clarify: if you still get the original spells, regurgitation confirmed; if it finds the new spells, it could be doing what we think. An even better test would be to replace all spell references AND jumble the chapters around. That way it can't even "know" where to "look" for the spell names from training.


BTW, it recalls 42 when I asked (without web search).

full transcript: pastebin.com/sMcVkuwd


Not sure how they're being counted, but that adds up to 46 with the pair spells counted separately. But then nox is counted twice, so maybe 45.


No, because you don't know the magic spell (forgive me) of context that can be used to "unlock" that information if it's stored in the NN.

I mean, you can try, but it won't be a definitive answer as to whether that knowledge truly exists or doesn't exist as it is encoded into the NN. It could take a lot of context from the books themselves to get to it.


I think the OP was implying that it's probably already baked into its training data. No need to search the web for that.


This underestimates how much of the Internet is actually compressed into and is an integral part of the model's weights. Gemini 2.5 can recite the first Harry Potter book verbatim for over 75% of the book.


I'm getting astrology when I search for this. Any links on this?


Iirc it's not quite true. 75% of the book is more likely to appear than you would expect by chance if prompted with the prior tokens. This suggests that it has the book encoded in its weights, but you can't actually recover it by saying "recite harry potter for me".


Do you happen to know, is that because it can’t recite Harry Potter, or because it’s been instructed not to recite Harry Potter?


It's a matter of token likelihood... as a continuation, the rest of chapter one is highly likely to follow the first paragraph.

The full text of Chapter One is not the only/likeliest possible response to "recite chapter one of harry potter for me"


Instructed not to was my understanding.



I'm not sure what your knowledge level of the inner workings of LLMs is, but a model doesn't need search or even an internet connection to "know" information that is in its training dataset. In your example, it's almost guaranteed that the LLM isn't searching the books - it's just referencing one of the hundreds of lists of those spells in its training data.

This is the LLM's magic trick that has everyone fooled into thinking they're intelligent - it can very convincingly cosplay an intelligent being by parroting an intelligent being's output. This is equivalent to making a recording of Elvis, playing it back, and believing that Elvis is actually alive inside of the playback device. And let's face it, if a time traveler brought a modern music playback device back hundreds of years and showed it to everyone, they WOULD think that. Why? Because they had not become accustomed to the technology and had no concept of how it could work. The same is true of LLMs - the technology was thrust on society so quickly that there was no time for people to adjust and understand its inner workings, so most people think it's actually doing something akin to intelligence. The truth is it's just as far from intelligence as your music playback device is from having Elvis inside of it.


>The truth is it's just as far from intelligence your music playback device is from having Elvis inside of it.

A music playback device's purpose is to let you hear Elvis' voice. A good device does it well: you hear Elvis' voice (maybe with some imperfections). Whether a real Elvis is inside of it or not doesn't matter - its purpose is fulfilled regardless. By your analogy, an LLM simply reproduces what an intelligent person would say on the matter. If it does its job more or less, it doesn't matter either whether it's "truly intelligent" or not; its output is already useful. I think it's completely irrelevant in both cases to the question "how well does it do X?" If you think about it, 95% of what we know we learned from school/environment/parents; we didn't discover it ourselves via some kind of scientific method - we mostly parrot what other intelligent people said before us. Maybe human "intelligence" itself is 95% parroting/basic pattern matching from training data? (18 years of training during childhood!)


Do the same experiment in the Claude web UI. And explicitly turn web searches off. It got almost all of them for me over a couple of prompts. That stuff is already in its training data.


The only worthwhile version of this test involves previously unseen data that could not have been in the training set. Otherwise the results could be inaccurate to the point of being harmful.


> But for sure it has some internal knowledge already.

Pretty sure the books had to be included in its training material in full text. It's one of the most popular book series ever created, of course they would train on it. So "some" is an understatement in this case.


Honestly? My advice would be to cook something custom up! You don't need to do all the text yourself. Maybe have AI spew out a bunch of text, or take obscure existing text and insert hidden phrases here or there.

Shoot, I'd even go so far as to write a script that takes in a bunch of text, reorganizes sentences, and outputs them in a random order with the secrets. Kind of like a "Where's Waldo?", but for text

Just a few casual thoughts.

I'm actually thinking about coming up with some interesting coding exercises that I can run across all models. I know we already have benchmarks, however some of the recent work I've done has really shown huge weak points in every model I've run them on.


Having AI spew it might suffer from the fact that the spew itself is influenced by the AI's weights. I think your best bet would be to use a new human-authored work that was released after the model's knowledge cutoff.


Hmm… maybe he could switch out all the spell names for slightly different ones and see how that goes.


What is this supposed to show, exactly? Those books have been fed into LLMs for years, and there's likely even been specific RLHF on extracting spells from HP.


There was a time when I encoded the EA-Nasir text into base64 and asked AI to convert it. Remarkably, it identified the correct text but pulled the most popular translation rather than the one I gave it.


Sucks that you got a really shitty response to your prompt. If I were you, the model provider would be receiving my complaint via clay tablet right away.


Imagine you ordered the new Claude Opus and instead you got Gemini telling you to glue the cheese on your pizza...


> What is this supposed to show exactly?

Nothing.

You can be sure that this was already known in the training data of PDFs, books, and websites that Anthropic scraped to train Claude on; hence 'documented'. This is why tests like what the OP just did are meaningless.

Such "benchmarks" are performative for VCs, who do not ask why the research and testing isn't done independently but almost always by the companies' own in-house researchers.


Use AI to rewrite all the spells from all the books, then see if AI can detect the rewritten ones. This will ensure it's not pulling from its training data set.


Neat idea, but why should I use AI for a find and replace?

It feels like shooting a fly with a bazooka


it's like hiring someone to come pick up your trash from your house and put it on the curb.

it's fine if you're disabled


Bazooka guarantees the hit


I like LLMs, but guarantees in LLMs are... you know... not guaranteed ;)


I think that was the point


If all you have is a hammer.. ;)


do you know all the spells you're looking for from memory?


You could just, you know, Google the list.


and then the first thing you see will be at least one of ITS AI responses, whether you liked it or not


You're missing the point; it's only a testing exercise for the new model.


No, the point is that you can set up the testing exercise without using an LLM to do a simple find and replace.


It's a test. Like all tests, it's more or less synthetic and focused on specific expected behavior. I am pretty far from LLMs now, but this seems like a very good test to see how genuine this behavior actually is (or repeat it 10x with some scrambling to go deeper).


This thread is about the find-and-replace, not the evaluation. Gambling on whether the first AI replaces the right spells just so the second one can try finding them is unnecessary when find-and-replace is faster, easier and works 100%.


... I'm not sure if you're trolling or if you missed the point again. The point is to test the contextual ability and correctness of the LLM: its ability to perform actions on data that is hopefully guaranteed not to be in the training set.

It has nothing to do with the performance of the string replacement.

The initial "find" is to see how well it actually finds all the "spells" in this case, and then replaces them. Then, using a separate context, evaluate whether the results are the same or skewed in favour of training data.


That won't help. The AI replacing them will probably miss the same ones as the AI finding them.


I think the question was if it will still find 49 out of 50 if they have been replaced.


To be fair, I don't think "Slugulus Eructo" (the name) is actually in the books. This is what's in my copy:

> The smug look on Malfoy’s face flickered.

> “No one asked your opinion, you filthy little Mudblood,” he spat.

> Harry knew at once that Malfoy had said something really bad because there was an instant uproar at his words. Flint had to dive in front of Malfoy to stop Fred and George jumping on him, Alicia shrieked, “How dare you!”, and Ron plunged his hand into his robes, pulled out his wand, yelling, “You’ll pay for that one, Malfoy!” and pointed it furiously under Flint’s arm at Malfoy’s face.

> A loud bang echoed around the stadium and a jet of green light shot out of the wrong end of Ron’s wand, hitting him in the stomach and sending him reeling backward onto the grass.

> “Ron! Ron! Are you all right?” squealed Hermione.

> Ron opened his mouth to speak, but no words came out. Instead he gave an almighty belch and several slugs dribbled out of his mouth onto his lap.


I have a vague recollection that it might come up named as such in Half-Blood Prince, written in Snape's old potions textbook?

In support of that hypothesis, the Fandom site lists it as “mentioned” in Half-Blood Prince, but it says nothing else and I'm traveling and don't have a copy to check, so not sure.


Hmm, I don't get a hit for "slugulus" or "eructo" (case insensitive) in any of the 7. Interestingly, two mentions of "vomit" are in book 6, but neither is in reference to slugs (plenty of Slughorn, of course!). Book 5 was the only other one where a related hit came up:

> Ron nodded but did not speak. Harry was reminded forcibly of the time that Ron had accidentally put a slug-vomiting charm on himself. He looked just as pale and sweaty as he had done then, not to mention as reluctant to open his mouth.

There could be something with regional variants but I'm doubtful as the Fandom site uses LEGO Harry Potter: Years 1-4 as the citation of the spell instead of a book.

Maybe the real LLM is the universe and we're figuring this out for someone on Slacker News a level up!


Then it's fair that it didn't find it.


Just did a similar experiment but outside the harry potter universe to remove the training bias. It worked well!

> ChatGPT: "Generate a two page short story like Harry Potter, but don't mention anything Harry Potter related. Make up 4 unique spells in the story that are used"

Response see https://chatgpt.com/share/698af9cd-f628-800d-9250-b260f1478c...

> Claude: "What unique wizarding spells can you find in this story? [story]"

Response = https://i.imgur.com/Jzzs3PC.png


You can get the same result just asking Opus/GPT; it is probably internalized knowledge from Reddit or similar sites.


If you just ask it you don't get the same result. Around 13 spells were missing when I just prompted Opus 4.6 without the books as context.


Because it is non-deterministic; I just did and got 50 spells.

prompt: finding every spell in all Harry Potter books in the first 4 books, only the officially documented spells across those 4 books, there are 50, without using websearch

Here's my best compilation from memory, organized by book. I'll be upfront: getting to exactly 50 is tricky without cross-referencing, so some might be debatable or I might miss a couple. Here goes:

---

*Book 1 - Philosopher's/Sorcerer's Stone*

1. *Lumos* - produces light
2. *Nox* - extinguishes Lumos
3. *Alohomora* - unlocks doors
4. *Wingardium Leviosa* - levitation
5. *Petrificus Totalus* - full body-bind
6. *Locomotor Mortis* - leg-locker curse
7. *Incendio* - produces fire
8. *Reparo* - mends broken objects

*Book 2 - Chamber of Secrets*

9. *Expelliarmus* - disarms opponent
10. *Rictusempra* - tickling charm
11. *Tarantallegra* - forces target to dance
12. *Serpensortia* - conjures a snake
13. *Finite Incantatem* - ends spell effects
14. *Obliviate* - erases memories
15. *Aparecium* - reveals invisible ink
16. *Immobulus* - freezes target in place
17. *Homorphus* - mentioned by Lockhart (werewolf cure claim)
18. *Peskipiksi Pesternomi* - Lockhart's failed pixie spell

*Book 3 - Prisoner of Azkaban*

19. *Expecto Patronum* - produces a Patronus 20. *Riddikulus* - repels a Boggart 21. *Waddiwasi* - shoots objects at target 22. *Impervius* - waterproofing charm 23. *Dissendium* - opens secret passage 24. *Ferula* - conjures bandages/splint 25. *Mobilicorpus* - moves a body 26. *Lumos Maxima* - intense light

*Book 4 - Goblet of Fire*

27. *Accio* - summoning charm 28. *Avada Kedavra* - killing curse 29. *Crucio* - Cruciatus curse (torture) 30. *Imperio* - Imperius curse (control) 31. *Stupefy* - stunning spell 32. *Engorgio* - enlarges target 33. *Reducio* - shrinks target 34. *Sonorus* - amplifies voice 35. *Quietus* - reverses Sonorus 36. *Morsmordre* - conjures the Dark Mark 37. *Priori Incantatem* - reveals last spell cast 38. *Deletrius* - erases magical residue 39. *Densaugeo* - enlarges teeth 40. *Furnunculus* - causes boils 41. *Impedimenta* - slows/stops target 42. *Reducto* - blasts solid objects 43. *Diffindo* - severing charm 44. *Relashio* - releases sparks/grip 45. *Orchideous* - conjures flowers 46. *Avis* - conjures birds 47. *Point Me* - Four-Point Spell (compass) 48. *Ennervate* - revives stunned person 49. *Protego* - shield charm 50. *Conjunctivitis Curse* - affects eyesight (Krum on the dragon)

---

A few caveats: some of these (like Lumos Maxima, Homorphus, Peskipiksi Pesternomi) are borderline since they're either mentioned rather than properly cast, or might be film additions that bleed into memory. The Conjunctivitis Curse is described but its incantation isn't explicitly given in the text. And Protego might technically first appear with its incantation in Book 5 during DA practice rather than Book 4.

If you want, I can turn this into a spreadsheet or document with columns for spell name, effect, who casts it, and which chapter.


The top comment is about finding basterized Latin words from children's books. The future is here.


I'll have some of that coffee too, this is quite a sad time we're living where this is a proper use of our limited resources.


> basterized

And yet, it's still somewhat better than the Hacker News comment using bastardized English words.


What are we testing here?

It feels like a very odd test because it's such an unreasonable way to answer this with an LLM. Nothing about the task requires more than a very localized understanding. It's not like a codebase or corporate documentation, where there's a lot of interconnectedness and context that's important. It also doesn't seem to poke at the gap between human and AI intelligence.

Why are people excited? What am I missing?


There's a benchmark which works similarly but they ask harder questions, also based on books https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/o...

I guess they have to add more questions as these context windows get bigger.


I recently got junie to code me up an MCP for accessing my calibre library. https://www.npmjs.com/package/access-calibre

My standard test for that was "Who ends up with Bilbo's buttons?"


have another LLM (gemini, chatgpt) make up 50 new spells. insert those and test and maybe report here :)
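A sketch of how that test could be set up: generate spell names that cannot be in any training data, splice them into the text at random points, and later check whether the model recalls every one. The spell names and helper below are made up for illustration.

```python
import random

def plant_needles(text, needles, seed=0):
    """Splice each needle sentence into the text at a random
    paragraph boundary, returning the modified haystack."""
    rng = random.Random(seed)
    paragraphs = text.split("\n\n")
    for needle in needles:
        pos = rng.randrange(len(paragraphs) + 1)
        paragraphs.insert(pos, needle)
    return "\n\n".join(paragraphs)

# Invented spell names -- deliberately absent from any training data.
fake_spells = ["Vortimax Lumen", "Quellibra Nox", "Serravint Ocularis"]
needles = [f'"{s}!" she cried, and the spell took hold.' for s in fake_spells]

corpus = "Chapter One.\n\nIt was a dark night.\n\nThe end."
planted = plant_needles(corpus, needles)

# Every needle appears exactly once in the haystack; the model's
# recall on these is then free of memorization effects.
for s in fake_spells:
    assert planted.count(s) == 1
```

Recall on the planted spells, rather than on canon spells, would separate genuine long-context retrieval from memorized fan-wiki lists.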


Have you by any chance tried this with GPT 4.1 too (also 1M context)?


I often wonder how much of the Harry Potter books were used in the training. How long before some LLM is able to regurgitate full HP books without access to the internet?


Now edit the books and replace all spell names with different ones, and try again

By work I meant with

Now try it without giving it the books as context. I'm sure it probably knows there are 49.


The books were likely in the training data, I don't know that it's that impressive.


now thx to this post (and the infra provider inclination to appeal to hacker news) we will never know if the model actually discovered the 50 spells or memorized it. Since it will be trained on this. :( But what can you do, this is interesting


That doesn't seem a super useful test for a model that's optimized for programming?


I love the fun metric.

My hope is that locally run models can pass this test in the next year or two!


Assuming this experiment involved isolating the LLM from its training set?


Of course it didn't. Not sure you really can do that - LLMs are a collection of weights from the training set, take away the training set and they don't really exist. You'd have to train one from scratch excluding these books and all excerpts and articles about them somehow, which would be very expensive and I'm pretty sure the OP didn't do that.

So the test seems like a nonsensical test to me.


> All 7 books come to ~1.75M tokens

How do you know? Each word is one token?


You can download the books and run them through a tokenizer. I did that half a year ago and got ~2M.
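For a back-of-envelope check without downloading anything, a common rule of thumb for BPE tokenizers is roughly 3/4 of an English word per token, i.e. about 4/3 tokens per word. The function below is a rough stdlib-only estimate, not a real tokenizer like tiktoken, so expect it to differ from exact counts by tens of percent:

```python
def estimate_tokens(text, tokens_per_word=4/3):
    """Back-of-envelope token estimate: English prose averages
    roughly 3/4 of a word per BPE token, so words * 4/3."""
    return round(len(text.split()) * tokens_per_word)

# 9 words -> ~12 tokens under this rule of thumb.
print(estimate_tokens("The quick brown fox jumps over the lazy dog"))
```

The seven books total on the order of a million words, which under this rule of thumb lands in the same ballpark as the ~1.75M and ~2M figures quoted above; the exact number depends on the tokenizer used.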


Ah, and no one has thrown TOAC in it yet?


You need to publish this tbh


you are rich


Comparison to another model?


There are websites with the spells listed... which makes this a search problem. Why is an LLM used here?


It's just a benchmark test exercise.


> Just tested the new Opus 4.6 (1M context) on a fun needle-in-a-haystack challenge: finding every spell in all Harry Potter books.

Clearly a very useful, grounded and helpful everyday use case of LLMs. I guess in the absence of real-world use cases, we'll have to do AI boosting with such "impressive" feats.

Btw - a well-crafted regex could have achieved the same (pointless) result with ~0.0000005% of resources the LLM machine used.
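For what it's worth, a heuristic sketch of what such a regex might look like. This assumes spells appear as one or two capitalized pseudo-Latin words shouted in dialogue; it both misses spells named only in narration and over-matches ordinary exclamations, which is arguably why this isn't purely a regex problem:

```python
import re

# Heuristic: an incantation is one or two capitalized words followed
# by an exclamation mark, e.g. "Expelliarmus!". Over-matches things
# like "Run!", so output still needs filtering by hand or by list.
SPELL_RE = re.compile(r'\b([A-Z][a-z]+(?: [A-Z][a-z]+)?)!')

sample = '"Expelliarmus!" shouted Harry. "Wingardium Leviosa!" said Hermione. "Run!"'
print(SPELL_RE.findall(sample))  # -> ['Expelliarmus', 'Wingardium Leviosa', 'Run']
```

The false positive ("Run") illustrates the trade-off: the regex is vastly cheaper, but turning its candidates into an exact spell list still requires judgment or a reference.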


This is the moment for Europe to show that you can do gov and business differently. If they get their s** together and actually present a viable alternative.


They are doing it differently alright.

https://en.wikipedia.org/wiki/Chat_control


You're saying a proposed bill which hasn't passed is comparative to recent events in the US or am I reading too much between the lines?


You're saying EU is any different to USA?

Palantir clients: Europol, Danish POL-INTEL, NHS UK, UK Ministry of Defence, German Police (states), NATO, Ukraine, ASML, Siemens, Airbus, Credit Suisse, UBS, BP, Merck, ...

https://en.wikipedia.org/wiki/Palantir_Technologies#Customer...

https://www.palantir.com/partners/international/


Nitpicking, many on your list are not part of the EU : NHS UK, UK Ministry of Defence, NATO, Ukraine, UBS, BP.

Plus, the EU is 27 countries, out of which 5 are listed on their wiki page, with various institutions.


Europe can't do business differently. Or at least it doesn't seem to be able to. China can.


Last I checked millions of europeans are living in a functioning civilization. I've lived in Europe. It is ok.

Don't confuse "GDP not as big as ours" with "totally non-functional."


I didn't say it was totally nonfunctional, I said they can't do business differently than they are currently doing.


China can

Yes, things are different in totalitarian states.


"China can, due to mandatory 996 work hours"


How is it not viable now?


What would you like to see changed in the EU?


"Best I can do is Chat Control 3.0"


Is that actually possible? Can we do a live test here?

Let's say we want this dataset: Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town

How much do I have to pay you to get it?


I think it could be feasible to get an ad in front of "35-year-old dentists living on the 400 block of Elm street in local town" who have bought product X, but I've never seen transaction-by-transaction purchase history being for sale.


How much you got?

Never ask a salesperson how much you have to pay when the prices are not already clearly stated. Tell them how much you are willing to spend, to see if they will do it for that amount. Salespeople will always shoot high, hoping not to leave money on the table. The price might change depending on how much you squeal and how high they shot. Your initial "willing to spend" should also be lower than what you're actually willing to spend, for the same but converse reason.


Ok, so nobody here knows directly of any case where such data has been purchased, or vaguely similar, and we have no pricing information whatsoever available, but we are somehow completely knowledgeable about it being possible and how to do it? That sounds unlikely.


The supposedly in-the-know responses here are full of bravado but not much other than "trust me, bro"


https://news.ycombinator.com/item?id=44565878

Yea, you know everything, don't you.


Wow the Transunion business site, that really proves it huh.


Experian is known to sell the data they have. Why is this even in question? If I provide you Experian's website, you would give the same BS response?

Let me google this for you...

https://duckduckgo.com/?q=how+to+buy+data+from+a+data+broker...


The conversation was for buying transaction data from specific people, something that many seem to insist is easy and cheap and doable. Meanwhile if you actually read the responses to that search you smugly cited you'll find that no one seems to know how to actually do anything remotely like this. Yes this data is definitely harvested and it seems like you should be able to buy it in bulk from someone somewhere, but again no one seems to know where or how much or what the purchase minimum would be etc.


You asked for an example, and one was given. If you're saying you don't know how to send an email to a business page purporting to offer the services described here - no comment in this section can help you


Yeah people fail to provide examples but continue to be doomers about how easy it is.


Been busy, but since you seem unable to find anybody by searching on your own for the past 6 hours, here's something I found with a quick little search.

https://datarade.ai/data-categories/food-grocery-transaction...

Have we really lost the ability to use search functionality??


Of course people do. 5 seconds spent doing the most sparse-ass research will help you find plenty of stuff. If people don't respond, I imagine, for fear of 1) outing the specific area they work in, or 2) realizing these kinds of comments aren't generally acting in good faith so it is generally a complete waste of time.

I'll waste my own time and give a trivial example just off the top of my head. Go peruse some of the products offered on this page, put on your thinking cap or even look into them further and imagine what kind of data those services provide, where it likely comes from, and where it is sold to, and you'll be well on your way - and those are just the ones that are advertised openly.

https://www.transunion.com/business

Pretty much every one of the big players people typically associate with other areas such as personal credit have some feet in this space somewhere. Then theres the hundreds of lesser-known fly-by-night guys that have their own DB's they build off of mostly what is the same data, but correlated in different ways and sold to different audiences.

There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences. The fact you personally have never come across it, or are saying you aren't, is only a data point that is interesting to you, and no one else that actually knows what they are talking about in this space. Hope this post helps you somehow.


I didn’t ask for a link to a company that can do it. I want pricing. I am saying that nobody here is willing to share anything even approaching specific pricing, which makes me very much doubt that any of them have the direct transaction experience they are claiming. I don’t doubt that underwater welding exists, but I do doubt that anyone in this thread has done it, or has any direct experience with it.


>There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences.

Okay but then why not name at least a couple such services. Also, if the tech industry isn't selling data to them, where do they obtain it? Again, I see lots of ambiguity here, and the example link from transunion is hardly revealing of anything.


Credit card companies are known to sell data. https://www.cbsnews.com/news/mastercard-credit-card-customer...

Mobile service providers are known to have sold data. https://www.fcc.gov/document/fcc-fines-largest-wireless-carr...

Auto makers are known to sell data. https://www.caranddriver.com/news/a61711288/automakers-sold-...

You act like it doesn't happen, yet time and time again we learn about companies selling whatever data they can collect.

I can't believe we are still questioning this fact

What else do you need to know?


I think you misunderstand. I'm not doubting that it happens widely and pervasively. It's evident that this is the case. I just requested examples based on some of the very specific claims made here despite many ambiguities in how they were phrased.

Anyhow, thanks for taking the time to include some links.


For the most part, readers here are against it. Just because someone doesn’t know how to do it does not mean it is not doable. If it were not doable, these companies would not exist. I’ve already spent more time than I care on the topic. So if you want to think that people are collecting the data and not selling it to interested parties, the, boy, I don’t know. You can only lead hostess to water, but you can’t make it drink.


Hostess? You mean horses no?

>So if you want to think that people are collecting the data and not selling it to interested parties, the, boy, I don’t know.

As I very clearly said above, I don't doubt it at all, I was just asking for any clarification on who to whom.


And you were given them. So why keep up this persistent, obstinate line of questioning and persistent downvoting? It's transparent and tired. Industry experts chime in on this stuff all the time; it isn't anything done in backrooms and is out in the open. The only barrier to your not knowing is your own ignorance.


Persistent questioning (usually a good thing for debate, by the way; I wasn't asking anything unreasonable), because I was curious and hoping for answers from anyone willing to provide concrete details for claims that struck me as ambiguous on hard details, even though I don't doubt the widespread existence of personal data selling at all, as I've repeatedly mentioned. As for downvoting, not sure what you mean. I can only say I've never once downvoted a single comment in any time spent commenting on this site, ever. I think the whole downvote thing is detestable and infantile. I'll even upvote things I disagree with just to counter it if I see them go grey.


Literally all anyone is asking for is one single concrete example of a site where you can roll up and buy personal information.


But what type of range are we talking? Tens, hundreds, thousands?


It could also mean that if you have to ask... or the first rule of data brokering...

Seems like the first thing to do would be to get an account with one of these data brokers. I'd imagine most of these places are "contact us for pricing" so they can play used car salesman games

Or, you could ask John Oliver to do it for you and then tell all of us on one of his episodes exactly how in depth it could get. They have the money to do this, and it seems like something right in his team's wheel house


If you need John Oliver to do it maybe it's not such a big problem? If no one here is able to provide a single concrete example, maybe it's not real?


John Oliver likes to spend HBO's money to do things others can't do while entertaining the rest of us. I'm not spending my money on something to prove what is known as possible for you. At this point, even with receipts, you're coming across as someone that would argue that grass is not green, or water isn't wet, and fire isn't hot.

Just because someone doesn't answer your belligerent questions does not mean it's not possible. It probably means that the people that are doing this with first hand knowledge have too much to do than trying to convert doubting Thomas over here.


All of this started because in response to an extremely concrete question, what's the cost of transaction data for a tightly constrained population, you replied with a smug non-answer about the greed of salespeople. These questions only got "belligerent" because every single answer has been nonsense insisting that it's super easy and cheap but also I couldn't possibly name a single site where this data is sold or provide even an order of magnitude of cost. Or maybe now it requires HBO levels of funding, who knows.


I offered sage advice on how to negotiate when you don’t know a firm price on anything whether that be data or a car or a home remodeling. If you want to say that advice was a smug answer then that’s on you. Every answer after has just gone further and further off the rails


Nah there's no way you actually watch John Oliver because that was really funny. Anyways, you mentioned earlier that we wouldn't believe you even if you posted receipts but that's actually exactly what we want to see. Like, just the name of a business, the thing that was sold, and the price.


In many fields there is no moat. It's an execution battle, and it comes down to one question: can the startup innovate faster and get to the customers, or can the incumbent defend its existing distribution well enough?

Microsoft owns GitHub and VSCode yet cursor was able to out execute them. Legora is moving very quickly in the legal space. Not clear yet who will win.


> Microsoft owns GitHub and VSCode yet cursor was able to out execute them

Really? My startup is under 30 people. We develop in the open (source available) and are extremely willing to try new process or tooling if it'll gain us an edge -- but we're also subject to SOC2.

Our own evaluation was Cursor et all isn't worth the headache of the compliance paperwork. Copilot + VSCode is playing rapid catch-up and is a far easier "yes".

How large is the intersection of companies who a) believe Cursor has a substantive edge in capability, and b) have willingness to send Cursor their code (and go through the headaches of various vendor reviews and declarations)?


Windsurf was acquired for $3B by OAI and it's clearly the worse of the two. Cursor is trying to raise at a $10B valuation and has $300MM in ARR in less than two years.

So in short, yes, companies do appear to be showing some willingness to send Cursor their code, even with all the headache associated with getting a new vendor.


Could you elaborate on the 'headaches of various vendor reviews and declarations'? (I thought paperwork etc was table stakes for a SOC 2 certificate)


A business is nothing more than defined, established, repeatable systems and processes. That is the difficulty with tooling businesses.


It's better than perplexity on certain tasks! Pretty impressive!

Feedback: the mermaid charts are a bit annoying when they are not needed

Out of curiosity, what web index do you guys use right now, and does it make sense to build your own index at some point?


What object detection model do you use?


Is tesseract even ML based? Oh, this piece of software is more than 19 years old, perhaps there are other ways to do good, cheap OCR now. Does Gemini have an OCR library, internally? For other LLMs, I had the feeling that the LLM scripts a few lines of python to do the actual heavy lifting with a common OCR framework.


Custom trained yolo v8. I've moved on since then and the work was done in 2023. You'd get better results for much less today.


Quetzal as a product is inevitable. It will make it even easier to target international markets from day one with a tiny team. Congrats on the launch! You rock! Greetings from section 4D!


What's your next step? Why did you decide to focus on enzyme design?


We think enzymes are super cool! You can build molecular assembly lines at the atomic scale with them. Many pharmaceuticals are already manufactured with enzymes such as the diabetes drug Januvia. Engineering them is a big bottleneck though - takes years and millions of dollars. We want to speed this up with AI-powered design. Next step is ligand-protein prediction capability of AlphaFold3, which is also super useful for modelling enzyme-substrate interactions.


Possibly because it dovetails with pharma mfg and [potentially] food mfg. Could see a case made for enzymatically brewed 'meat inks' [very sorry for this term ;p] for 3d printing the next gen of lab meats.

