
Do you plan on Open Sourcing it? A bit scary to just execute a random binary and put in a bunch of API keys.

It is already MIT licensed.

Didn't see it, thanks!

How do you think big labs will try to attack you guys? Why not just close the data to them?


Just tested the new Opus 4.6 (1M context) on a fun needle-in-a-haystack challenge: finding every spell in all Harry Potter books.

All 7 books come to ~1.75M tokens, so they don't quite fit yet. (At this rate of progress, mid-April should do it.) For now you can fit the first 4 books (~733K tokens).
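As an aside, you can ballpark that kind of fit-check yourself without any API access, using the common ~4-characters-per-token rule of thumb for English prose (the real count depends on the model's own tokenizer, so this is an estimate only):

```python
# Rough token-count estimate for checking whether a text fits a
# context window. ~4 characters per token is a common heuristic for
# English prose; the exact count depends on the model's tokenizer.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return round(len(text) / chars_per_token)

# With the real books you would read the files from disk; here a
# stand-in string illustrates the call.
sample = "It does not do to dwell on dreams and forget to live."
print(estimate_tokens(sample))
```

At ~4 chars/token, 7 books of roughly 7M characters lands near the ~1.75M-token figure above.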

Results: Opus 4.6 found 49 out of 50 officially documented spells across those 4 books. The only miss was "Slugulus Eructo" (a vomiting spell).

Freaking impressive!


Surely the corpus Opus 4.6 ingested would include whatever reference you used to check the spells were there. I mean, there are probably dozens of pages on the internet like this:

https://www.wizardemporium.com/blog/complete-list-of-harry-p...

Why is this impressive?

Do you think it's actually ingesting the books and only using those as a reference? Is that how LLMs work at all? It seems more likely it's predicting these spell names from all the other references it has found on the internet, including lists of spells.


Most people still don't realize that general public world knowledge is not really a test for a model that was trained on general public world knowledge. I wouldn't be surprised if even proprietary content like the books themselves found their way into the training data, despite what publishers and authors may think of that. As a matter of fact, with all the special deals these companies make with publishers, it is getting harder and harder for normal users to come up with validation data that only they have seen. At least for human written text, this kind of data is more or less reserved for specialist industries and higher academia by now. If you're a janitor with a high school diploma, there may be barely any textual information or fact you have ever consumed that such a model hasn't seen during training already.


> I wouldn't be surprised if even proprietary content like the books themselves found their way into the training data

No need for surprises! It is publicly known that the corpora of 'shadow libraries' such as Library Genesis and Anna's Archive were specifically and manually requested by at least NVIDIA for their training data [1], used by Google in their training [2], downloaded by Meta employees [3], etc.

[1] https://news.ycombinator.com/item?id=46572846

[2] https://www.theguardian.com/technology/2023/apr/20/fresh-con...

[3] https://www.theverge.com/2023/7/9/23788741/sarah-silverman-o...


also:

"Researchers Extract Nearly Entire Harry Potter Book From Commercial LLMs"

https://www.aitechsuite.com/ai-news/ai-shock-researchers-ext...


The big AI houses are all involved in varying degrees of litigation (all the way to class action lawsuits) with the big publishing houses. I think they at least have some level of filtering for their training data to keep them legally somewhat compliant. But considering how much copyrighted material is spread blissfully online, it is probably not enough to filter out the actual ebooks of certain publishers.


> I think they at least have some level of filtering for their training data to keep them legally somewhat compliant.

So far, courts are siding with the "fair use" argument. No need to exclude any data.

https://natlawreview.com/article/anthropic-and-meta-fair-use...

"Even if LLM training is fair use, AI companies face potential liability for unauthorized copying and distribution. The extent of that liability and any damages remain unresolved."

https://www.whitecase.com/insight-alert/two-california-distr...


> even proprietary content like the books themselves

This definitely raises an interesting question. It seems like a good chunk of popular literature (especially from the 2000s) exists online in big HTML files. Immediately to mind was House of Leaves, Infinite Jest, Harry Potter, basically any Stephen King book - they've all been posted at some point.

Do LLMs have a good way of inferring where knowledge from the context begins and knowledge from the training data ends?


> It seems like a good chunk of popular literature (especially from the 2000s) exists online in big HTML files

Anna's Archive alone claims to currently publicly host 61,654,285 books, more than 1PB in total.


Maybe y’all missed this?

https://www.washingtonpost.com/technology/2026/01/27/anthrop...

Anthropic, specifically, ingested libraries of books by scanning and then disposing of them.


> If you're a janitor with a high school diploma, there may be barely any textual information or fact you have ever consumed that such a model hasn't seen during training already.

The plot of Good Will Hunting would like a word.


So a good test would be replacing the spell names in the books with made-up spells. And if a "real" spell name was given, it also tests whether it "cheated".


I've run that experiment now, spoiler: It cheated with its pre-training knowledge https://georggrab.net/content/opus46retrieval.html

Thanks for trying! Good to know.

A real test would be synthesizing 100,000 sentences, selecting random ones, and then injecting the traits you want the LLM to detect and describe, e.g. having a set of made-up words or phrases that represent spells, used so that they do something. Then have the LLM find these random spells in the random corpus.
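A minimal sketch of that kind of synthetic benchmark; the filler template, syllables, and all "spell" names below are invented for illustration, so none of them can be in any model's training data:

```python
import random

# Synthetic needle-in-a-haystack corpus: filler sentences with
# invented "spells" planted at random positions.
random.seed(42)

ADJS = ["tired", "old", "young", "quiet", "hungry"]
NOUNS = ["forest", "market", "valley", "village", "harbor"]
SYLLABLES = ["vor", "pel", "lum", "ska", "dri", "mox", "tra"]

def make_spells(n: int) -> list[str]:
    # Unique pseudo-Latin nonsense words, e.g. "Vorpellum".
    spells: set[str] = set()
    while len(spells) < n:
        spells.add("".join(random.choices(SYLLABLES, k=3)).capitalize())
    return sorted(spells)

def build_corpus(n_sentences: int, n_spells: int) -> tuple[str, list[str]]:
    sentences = [
        f"The {random.choice(ADJS)} traveler walked through the "
        f"{random.choice(NOUNS)} and said nothing."
        for _ in range(n_sentences)
    ]
    spells = make_spells(n_spells)
    for spell in spells:
        # Plant each spell in a random sentence, used as an action.
        i = random.randrange(n_sentences)
        sentences[i] += f' She raised her wand and cried "{spell}!"'
    return " ".join(sentences), spells

corpus, planted = build_corpus(100_000, 50)
# `planted` is the ground truth to score the model's answer against.
print(len(planted))
```

Because the ground truth is generated alongside the corpus, scoring the model's answer is an exact set comparison rather than a judgment call.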


It could still remember where each spell is mentioned. I think the only way to properly test this would be to run it against an unpublished manuscript.


Any obscure work of fiction or fanfiction would likely be fine as a casual test.

If you ask a model to discuss an obscure work it'll have no clue what it's about.

This is very different than asking about Harry Potter.


Yeah, that's what I've been doing as well, and at least Gemini 3 Pro did not fare very well.


For fun I've asked Gemini Pro to answer open ended questions about obscure books like "Read this novel and tell me what the hell is this book, do a deep reading and analyze" and I've gotten insightful/ enjoyable answers but I've never asked it to make lists of spells or anything like that.


It's impressive, even if the books and the posts you're talking about were both key parts of the training data.

There are many academic domains where the research portion of a PhD is essentially what the model just did. For example, PhD students in some of the humanities will spend years combing ancient sources for specific combinations of prepositions and objects, only to write a paper showing that the previous scholars were wrong (and that a particular preposition has examples of being used with people rather than places).

This sort of experiment shows that Opus would be good at that. I'm assuming it's trivial for the OP to extend their experiment to determine how many times "wingardium leviosa" was used on an object rather than a person.

(It's worth noting that other models are decent at this, and you would need to find a way to benchmark between them.)
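That kind of occurrence-counting is also worth cross-checking with a plain concordance scan before trusting any model's tally. A sketch (the passage is a stand-in, not a book quote):

```python
import re

# Keyword-in-context scan: every occurrence of a phrase plus a window
# of surrounding characters, for manually verifying a reported count.
def concordance(text: str, term: str, window: int = 40) -> list[str]:
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [
        text[max(0, m.start() - window):m.end() + window]
        for m in pattern.finditer(text)
    ]

passage = ('"Wingardium Leviosa!" he shouted, and the feather rose. '
           'Later, wingardium leviosa lifted the club as well.')
for hit in concordance(passage, "Wingardium Leviosa"):
    print("…" + hit + "…")
```

Classifying each hit (object vs. person) would still need a human or model pass, but the raw occurrence count is cheap to verify this way.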


I don't think this example proves your point. There's no indication that the model actually worked this out from the input context, instead of regurgitating it from the training weights. A better test would be to subtly modify the books fed in as input to the model so that there were actually 51 spells, and see if it pulls out the extra spell, or to modify the names of some spells, etc.

In your example, it might be the case that the model simply spits out the consensus view, rather than actually finding/constructing this information on its own.


Ah, that's a good point.


Since it got 49 of 50 right, it's worse than what you would get using a simple Google search. People would immediately disregard a conventional source that only listed 49 out of 50.


The poster you reply to works in AI. The marketing strategy is to always have a cute Pelican or Harry Potter comment as the top comment for positive associations.

The poster knows all of that, this is plain marketing.


This sounds compelling, but also something that an armchair marketer would have theorycrafted without any real-world experience or evidence that it actually works - and I searched online and can't find any references to something like it.

Do you have a citation for this?


They should try the same thing but replace the original spell names with something else.


Why don't you ask it and find out ;)


Because the model doesn't know but will happily tell a convincing lie about how it works.


I was playing about with ChatGPT the other day, uploading screenshots of sheet music and asking it to convert them to ABC notation so I could make a MIDI file.

The results seemed impressive until I noticed some of the "Thinking" statements in the UI.

One made it apparent the model / agent / whatever had read the title from the screenshot and was off searching for existing ABC transcripts of the piece Ode to Joy.

So the whole thing was far less impressive after that, it wasn't reading the score anymore, just reading the title and using the internet to answer my query.


Yes, I have found that Grok, for example, suddenly becomes quite sane when you tell it to stop querying the internet and just rethink the conversation data and answer the question.

It's weird, it's like many agents are now in a phase of constantly getting more information and never just thinking with what they've got.


But isn't this what we wanted? We complained so much that LLMs use deprecated or outdated APIs instead of current versions because they relied so much on what they remembered.


To be clear, what I mean is that Grok will query 30 pages and then answer your question vaguely or wrongly, then ask for clarification, and then go and requery everything again... I can imagine why it might need to revisit pages, and it might be a UI thing, but it still feels like until you yell at it to stop searching and just summarise, it doesn't activate its "think with what you got" mode.

I guess we could call this gathering and then do your best conditional on what you found right now.


2010's: Google Search is making humans who constantly rely on it dumber

2020's: LLMs are making humans who constantly rely on them dumber

2026: Google Search is making LLMs who constantly rely on it dumber


Touché, that is what we humans are doing to some degree as well.


Sounds pretty human like! Always searching for a shortcut


It sounds like it's lying and making stuff up, something everybody seems to be okay with when using LLMs.


I am not sure why... you want the LLM to solve problems, not come up with answers by itself. It's allowed to use tools precisely because it tends to make stuff up. In general, only if you're benchmarking LLMs do you care about whether the LLM itself provided the answer or whether it used a tool. If you ask it to convert the notation of sheet music it might use a tool, and that's probably the right decision.


The shortcut is fine if it's a bog-standard canonical arrangement of the piece. If it's a custom jazz rendition you composed, with odd key changes and shifting time signatures, taking that shortcut is not going to yield the intended result. It's choosing the wrong tool for the job, which makes it unreliable for this task.


For structured outputs like that wouldn’t it be better to get the LLM to create a script to repeatably make the translation?


Honest question, how do you know if it's pulling from context vs from memory?

If I use Opus 4.6 with Extended Thinking (Web Search disabled, no books attached), it answers with 130 spells.


Exactly. There was a study where they tried to make an LLM reproduce the HP books word for word, giving it the first sentences and letting it cook.

Basically, with some tricks, they managed to get 99% word for word. The tricks were needed to bypass the safety measures that are in place for exactly this reason: to stop people from retrieving training material.


This reminds me of https://en.wikipedia.org/wiki/Pierre_Menard,_Author_of_the_Q... :

> Borges's "review" describes Menard's efforts to go beyond a mere "translation" of Don Quixote by immersing himself so thoroughly in the work as to be able to actually "re-create" it, line for line, in the original 17th-century Spanish. Thus, Pierre Menard is often used to raise questions and discussion about the nature of authorship, appropriation, and interpretation.


Do you remember how to get around those tricks?


This is the paper: https://arxiv.org/abs/2601.02671

Grok and Deepmind IIRC didn’t require tricks.


This really makes me want to try something similar with content from my own website.

I shut it down a while ago because bot traffic overtook human traffic. The site had quite a bit of human traffic (enough to bring in a few hundred bucks a month in ad revenue, and a few hundred more in subscription revenue); however, the AI scrapers really started ramping up, and the only way I could realistically continue would be to pay a lot more for hosting/infrastructure.

I had put a ton of time into building out content...thousands of hours, only to have scrapers ignore robots, bypass cloudflare (they didn't have any AI products at the time), and overwhelm my measly infrastructure.

Even now, with the domain pointed at NOTHING, it gets almost 100,000 hits a month. There is NO SERVER on the other end. It is a dead link. The stats come from Cloudflare, where the domain name is hosted.

I'm curious if there are any lawyers who'd be willing to take someone like me on contingency for a large copyright lawsuit.


Can we help get your infra cost down to negligible? I'm thinking of things like pre-generated static pages and CDNs. I won't assume you hadn't thought of this before, but I'd like to understand more about where your non-trivial infra costs come from.


I would be tempted to try and optimise this as well. 100,000 hits on an empty domain and ~200 dollars' worth of bot traffic sounds wild. Are they using JS-enabled browsers or sim farms that download and re-download images and videos as well?


> only to have scrapers ignore robots, bypass cloudflare

Set the server to require Cloudflare's SSL client cert, so nobody can connect to it directly.

Then make sure every page is cacheable and your costs will drop to near zero instantly.

It's like 20 mins to set these things up.


a) As an outside observer, I would find such a lawsuit very interesting/valuable. But I guess the financial risk of taking on OpenAI or Anthropic is quite high.

b) If you don't want bots scraping your content and DDOSing you, there are self-hosted alternatives to Cloudflare. The simplest one that I found is https://github.com/splitbrain/botcheck - visitors just need to press a button and get a cookie that lets them through to the website. No proof-of-work or smart heuristics.


The new cloudflare products for blocking bots and AI scrapers might be worth a shot if you put so much work into the content.


Further, some low effort bots can be quickly handled with CF by blocking specific countries (e.g., Brazil and Russia, for one of my sites).


I work for a publisher that serves the Chinese market as a secondary market. Sucks that we can’t blanketly do this since we get hammered by Chinese bots daily. We also have an extremely old codebase (Drupal) which makes blanket caching difficult. Working to migrate from Cloudfront to Cloudflare at least


What's not clear from the study (at least skimming it) is whether they always started the ball rolling with ground-truth passages or whether they chained outputs from the model until they got to the end of the book. I strongly suspect the latter would become hopelessly corrupted relatively quickly.

It seems like this technique only works if you have a copy of the material to work off of, i.e. enter a ground truth passage, tell the model to continue it as long as it can, and then enter the next ground truth passage to continue in the next session.


Oh! That’s a huge caveat if that’s indeed the case.


One possible trick could be to search and replace them all with nonsense alternatives then see if it extracts those.


That might actually boost performance since attention pays attention to stuff that stands out. If I make a typo, the models often hyperfixate on it.


A fine instruction following task but if harry potter is in the weights of the neural net, it's going to mix some of the real ones with the alternates.


When I tried it without web search, so only internal knowledge, it missed ~15 spells.


What was your prompt?


There's lots of websites that list the spells. It's well documented. Could Claude simply be regurgitating knowledge from the web? Example:

https://harrypotter.fandom.com/wiki/List_of_spells


It didn't use web search. But for sure it has some internal knowledge already. It's not a perfect needle-in-a-haystack problem, but Gemini Flash was much worse when I tested it last time.


If you want to really test this, search/replace the names with your own random ones and see if it lists those.

Otherwise, LLMs have most of the books memorised anyway: https://arstechnica.com/features/2025/06/study-metas-llama-3...
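The search/replace itself is a few lines; the spell mapping below is a tiny illustrative subset, and the replacement names are invented:

```python
import re

# Swap known incantations for invented ones before feeding the books
# back in as context. Illustrative subset only; replacements are made up.
REPLACEMENTS = {
    "Expelliarmus": "Dravokinta",
    "Wingardium Leviosa": "Fenolibra Astex",
    "Expecto Patronum": "Umbra Velentis",
}

def swap_spells(text: str) -> str:
    for real, fake in REPLACEMENTS.items():
        # Case-insensitive, whole-phrase replacement.
        text = re.sub(re.escape(real), fake, text, flags=re.IGNORECASE)
    return text

sample = '"Expelliarmus!" Harry yelled, and the wand flew free.'
print(swap_spells(sample))
```

Deterministic string replacement also gives you an exact answer key for scoring, which an LLM-performed rewrite would not.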


Being that it has the books memorized (huh, just learned another US/UK spelling quirk), I would suppose feeding it the books with altered spells would get you a confused mishmash of data in the context and data in the weights.


Couldn't you just ask the LLM which 50 (or 49) spells appear in the first four Harry Potter books without the data for comparison?


It's not going to be as consistent. It may get bored of listing them (you know how you can ask for many examples and get 10 in response?), or omit some minor ones for other reasons.

By replacing the names with something unique, you'll get much more certainty.


It might not work well, but by navigating to a very Harry-Potter-dominant part of latent space, by preconditioning on the books, you make it more likely to get good results. An example would be taking a base model and prompting "what follows is the book 'X'" - it may or may not regurgitate the book correctly. Give it a chunk of the first chapter and let it regurgitate from there, and you tend to get fairly faithful recovery, especially for things on Gutenberg.

So it might be there: by preconditioning latent space to the area of the Harry Potter world, you make it much more probable that the full spell list is regurgitated from online resources that were also read, while asking naively might get it sometimes and sometimes not.

The books act like a hypnotic trigger, and this may not represent a generalized skill. Hence why replacing the spells with random words would help clarify: if you still get the original spells, regurgitation confirmed; if it finds the new spells, it could be doing what we think. An even better test would be to replace all spell references AND jumble the chapters around. That way it can't even "know" where to "look" for the spell names from training.


BTW, it recalls 42 when I asked (without web search).

full transcript: pastebin.com/sMcVkuwd


Not sure how they're being counted, but that adds up to 46 with the pair spells counted separately. But then nox is counted twice, so maybe 45.


No, because you don't know the magic spell (forgive me) of context that can be used to "unlock" that information if it's stored in the NN.

I mean, you can try, but it won't be a definitive answer as to whether that knowledge truly exists or doesn't exist as it is encoded into the NN. It could take a lot of context from the books themselves to get to it.


I think the OP was implying that it's probably already baked into its training data. No need to search the web for that.


This underestimates how much of the Internet is actually compressed into and is an integral part of the model's weights. Gemini 2.5 can recite the first Harry Potter book verbatim for over 75% of the book.


I'm getting astrology when I search for this. Any links on this?


Iirc it's not quite true. 75% of the book is more likely to appear than you would expect by chance if prompted with the prior tokens. This suggests that it has the book encoded in its weights, but you can't actually recover it by saying "recite harry potter for me".


Do you happen to know, is that because it can’t recite Harry Potter, or because it’s been instructed not to recite Harry Potter?


It's a matter of token likelihood... as a continuation, the rest of chapter one is highly likely to follow the first paragraph.

The full text of Chapter One is not the only/likeliest possible response to "recite chapter one of harry potter for me"


Instructed not to was my understanding.



I'm not sure what your knowledge level of the inner workings of LLMs is, but a model doesn't need search or even an internet connection to "know" information that is in its training dataset. In your example, it's almost guaranteed that the LLM isn't searching the books - it's just referencing one of the hundreds of lists of those spells in its training data.

This is the LLM's magic trick that has everyone fooled into thinking they're intelligent - it can very convincingly cosplay an intelligent being by parroting an intelligent being's output. This is equivalent to making a recording of Elvis, playing it back, and believing that Elvis is actually alive inside of the playback device. And let's face it, if a time traveler brought a modern music playback device back hundreds of years and showed it to everyone, they WOULD think that. Why? Because they had not become accustomed to the technology and had no concept of how it could work. The same is true of LLMs - the technology was thrust on society so quickly that there was no time for people to adjust and understand its inner workings, so most people think it's actually doing something akin to intelligence. The truth is it's just as far from intelligence as your music playback device is from having Elvis inside of it.


>The truth is it's just as far from intelligence your music playback device is from having Elvis inside of it.

A music playback device's purpose is to let you hear Elvis' voice. A good device does it well: you hear Elvis' voice (maybe with some imperfections). Whether a real Elvis is inside of it or not doesn't matter - its purpose is fulfilled regardless. By your analogy, an LLM simply reproduces what an intelligent person would say on the matter. If it does its job more or less, it doesn't matter either whether it's "truly intelligent" or not; its output is already useful. I think it's completely irrelevant in both cases to the question "how well does it do X?" If you think about it, 95% of what we know we learned from school/environment/parents; we didn't discover it ourselves via some kind of scientific method - we mostly parrot what other intelligent people said before us. Maybe human "intelligence" itself is 95% parroting/basic pattern matching from training data? (18 years of training during childhood!)


Do the same experiment in the Claude web UI. And explicitly turn web searches off. It got almost all of them for me over a couple of prompts. That stuff is already in its training data.


The only worthwhile version of this test involves previously unseen data that could not have been in the training set. Otherwise the results could be inaccurate to the point of being harmful.


> But for sure it has some internal knowledge already.

Pretty sure the books had to be included in its training material in full text. It's one of the most popular book series ever created, of course they would train on it. So "some" is an understatement in this case.


Honestly? My advice would be to cook something custom up! You don't need to do all the text yourself. Maybe have AI spew out a bunch of text, or take obscure existing text and insert hidden phrases here or there.

Shoot, I'd even go so far as to write a script that takes in a bunch of text, reorganizes sentences, and outputs them in a random order with the secrets. Kind of like a "Where's Waldo?", but for text

Just a few casual thoughts.

I'm actually thinking about coming up with some interesting coding exercises that I can run across all models. I know we already have benchmarks, however some of the recent work I've done has really shown huge weak points in every model I've run them on.


Having AI spew it might suffer from the fact that the spew itself is influenced by the AI's weights. I think your best bet would be to use a new human-authored work that was released after the model's knowledge cutoff.


Hmm… maybe he could switch out all the spell names for slightly different ones and see how that goes.


What is this supposed to show, exactly? Those books have been fed into LLMs for years, and there's likely even been specific RLHF on extracting spells from HP.


There was a time when I encoded the EA-Nasir text into base64 and asked AI to convert it. Remarkably, it identified the correct text but pulled the most popular translation rather than the one I gave it.


Sucks that you got a really shitty response to your prompt. If I were you, the model provider would be receiving my complaint via clay tablet right away.


Imagine you ordered the new Claude Opus and instead you got Gemini telling you to glue the cheese on your pizza...


> What is this supposed to show exactly?

Nothing.

You can be sure that this was already known in the training data of PDFs, books, and websites that Anthropic scraped to train Claude on; hence 'documented'. This is why tests like what the OP just did are meaningless.

Such "benchmarks" are performative for VCs, who do not ask why the research and testing isn't done independently but almost always by the companies' own in-house researchers.


Use AI to rewrite all the spells from all the books, then see if AI can detect the rewritten ones. This will ensure it's not pulling from its training data set.


Neat idea, but why should I use AI for a find and replace?

It feels like shooting a fly with a bazooka


it's like hiring someone to come pick up your trash from your house and put it on the curb.

it's fine if you're disabled


Bazooka guarantees the hit


I like LLMs, but guarantees in LLMs are... you know... not guaranteed ;)


I think that was the point


If all you have is a hammer.. ;)


do you know all the spells you're looking for from memory?


You could just, you know, Google the list.


and then the first thing you see will be at least one of ITS AI responses, whether you liked it or not


You're missing the point; it's only a testing exercise for the new model.


No, the point is that you can set up the testing exercise without using an LLM to do a simple find and replace.


It's a test. Like all tests, it's more or less synthetic and focused on specific expected behavior. I am pretty far from LLMs now, but this seems like a very good test to see how genuine this behavior actually is (or repeat it 10x with some scrambling to go deeper).


This thread is about the find-and-replace, not the evaluation. Gambling on whether the first AI replaces the right spells just so the second one can try finding them is unnecessary when find-and-replace is faster, easier and works 100%.


... I'm not sure if you're trolling or if you missed the point again. The point is to test the contextual ability and correctness of the LLM: its ability to perform actions on data that is hopefully guaranteed not to be in the training set.

It has nothing to do with the performance of the string replacement.

The initial "find" is to see how well it actually finds all the "spells" in this case, and then replaces them. Then, using a separate context, evaluate whether the results are the same or skewed in favour of training data.


That won't help. The AI replacing them will probably miss the same ones as the AI finding them.


I think the question was if it will still find 49 out of 50 if they have been replaced.


To be fair, I don't think "Slugulus Eructo" (the name) is actually in the books. This is what's in my copy:

> The smug look on Malfoy’s face flickered.

> “No one asked your opinion, you filthy little Mudblood,” he spat.

> Harry knew at once that Malfoy had said something really bad because there was an instant uproar at his words. Flint had to dive in front of Malfoy to stop Fred and George jumping on him, Alicia shrieked, “How dare you!”, and Ron plunged his hand into his robes, pulled out his wand, yelling, “You’ll pay for that one, Malfoy!” and pointed it furiously under Flint’s arm at Malfoy’s face.

> A loud bang echoed around the stadium and a jet of green light shot out of the wrong end of Ron’s wand, hitting him in the stomach and sending him reeling backward onto the grass.

> “Ron! Ron! Are you all right?” squealed Hermione.

> Ron opened his mouth to speak, but no words came out. Instead he gave an almighty belch and several slugs dribbled out of his mouth onto his lap.


I have a vague recollection that it might come up named as such in Half-Blood Prince, written in Snape's old potions textbook?

In support of that hypothesis, the Fandom site lists it as “mentioned” in Half-Blood Prince, but it says nothing else and I'm traveling and don't have a copy to check, so not sure.


Hmm, I don't get a hit for "slugulus" or "eructo" (case insensitive) in any of the 7. Interestingly, two mentions of "vomit" are in book 6, but neither is in reference to slugs (plenty of Slughorn, of course!). Book 5 was the only other one where a related hit came up:

> Ron nodded but did not speak. Harry was reminded forcibly of the time that Ron had accidentally put a slug-vomiting charm on himself. He looked just as pale and sweaty as he had done then, not to mention as reluctant to open his mouth.

There could be something with regional variants but I'm doubtful as the Fandom site uses LEGO Harry Potter: Years 1-4 as the citation of the spell instead of a book.

Maybe the real LLM is the universe and we're figuring this out for someone on Slacker News a level up!


Then it's fair that it didn't find it.


Just did a similar experiment but outside the harry potter universe to remove the training bias. It worked well!

> ChatGPT: "Generate a two page short story like Harry Potter, but don't mention anything Harry Potter related. Make up 4 unique spells in the story that are used"

Response see https://chatgpt.com/share/698af9cd-f628-800d-9250-b260f1478c...

> Claude: "What unique wizarding spells can you find in this story? [story]"

Response = https://i.imgur.com/Jzzs3PC.png


You can get the same result just asking Opus/GPT; it is probably internalized knowledge from Reddit or similar sites.


If you just ask it you don't get the same result. Around 13 spells were missing when I just prompted Opus 4.6 without the books as context.


Because it is non-deterministic; I just did and got 50 spells.

prompt: finding every spell in all Harry Potter books in the first 4 books, only the officially documented spells across those 4 books, there are 50, without using websearch

Here's my best compilation from memory, organized by book. I'll be upfront: getting to exactly 50 is tricky without cross-referencing, so some might be debatable or I might miss a couple. Here goes:

---

*Book 1 - Philosopher's/Sorcerer's Stone*

1. *Lumos* - produces light
2. *Nox* - extinguishes Lumos
3. *Alohomora* - unlocks doors
4. *Wingardium Leviosa* - levitation
5. *Petrificus Totalus* - full body-bind
6. *Locomotor Mortis* - leg-locker curse
7. *Incendio* - produces fire
8. *Reparo* - mends broken objects

*Book 2 - Chamber of Secrets*

9. *Expelliarmus* - disarms opponent
10. *Rictusempra* - tickling charm
11. *Tarantallegra* - forces target to dance
12. *Serpensortia* - conjures a snake
13. *Finite Incantatem* - ends spell effects
14. *Obliviate* - erases memories
15. *Aparecium* - reveals invisible ink
16. *Immobulus* - freezes target in place
17. *Homorphus* - mentioned by Lockhart (werewolf cure claim)
18. *Peskipiksi Pesternomi* - Lockhart's failed pixie spell

*Book 3 - Prisoner of Azkaban*

19. *Expecto Patronum* - produces a Patronus 20. *Riddikulus* - repels a Boggart 21. *Waddiwasi* - shoots objects at target 22. *Impervius* - waterproofing charm 23. *Dissendium* - opens secret passage 24. *Ferula* - conjures bandages/splint 25. *Mobilicorpus* - moves a body 26. *Lumos Maxima* - intense light

*Book 4 - Goblet of Fire*

27. *Accio* - summoning charm 28. *Avada Kedavra* - killing curse 29. *Crucio* - Cruciatus curse (torture) 30. *Imperio* - Imperius curse (control) 31. *Stupefy* - stunning spell 32. *Engorgio* - enlarges target 33. *Reducio* - shrinks target 34. *Sonorus* - amplifies voice 35. *Quietus* - reverses Sonorus 36. *Morsmordre* - conjures the Dark Mark 37. *Priori Incantatem* - reveals last spell cast 38. *Deletrius* - erases magical residue 39. *Densaugeo* - enlarges teeth 40. *Furnunculus* - causes boils 41. *Impedimenta* - slows/stops target 42. *Reducto* - blasts solid objects 43. *Diffindo* - severing charm 44. *Relashio* - releases sparks/grip 45. *Orchideous* - conjures flowers 46. *Avis* - conjures birds 47. *Point Me* - Four-Point Spell (compass) 48. *Ennervate* - revives stunned person 49. *Protego* - shield charm 50. *Conjunctivitis Curse* - affects eyesight (Krum on the dragon)

---

A few caveats: some of these (like Lumos Maxima, Homorphus, Peskipiksi Pesternomi) are borderline since they're either mentioned rather than properly cast, or might be film additions that bleed into memory. The Conjunctivitis Curse is described but its incantation isn't explicitly given in the text. And Protego might technically first appear with its incantation in Book 5 during DA practice rather than Book 4.

If you want, I can turn this into a spreadsheet or document with columns for spell name, effect, who casts it, and which chapter.


The top comment is about finding basterized Latin words from children's books. The future is here.


I'll have some of that coffee too, this is quite a sad time we're living where this is a proper use of our limited resources.


> basterized

And yet, it's still somewhat better than the Hacker News comment using bastardized English words.


What are we testing here?

It feels like a very odd test because it's such an unreasonable way to answer this with an LLM. Nothing about the task requires more than a very localized understanding. It's not like a codebase or corporate documentation, where there's a lot of interconnectedness and context that's important. It also doesn't seem to poke at the gap between human and AI intelligence.

Why are people excited? What am I missing?


There's a benchmark which works similarly but they ask harder questions, also based on books https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/o...

I guess they have to add more questions as these context windows get bigger.


I recently got junie to code me up an MCP for accessing my calibre library. https://www.npmjs.com/package/access-calibre

My standard test for that was "Who ends up with Bilbo's buttons?"


have another LLM (gemini, chatgpt) make up 50 new spells. insert those and test and maybe report here :)
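A sketch of how that test could be set up: generate spell names that cannot be in any training data, splice them into the text at random points, and later check whether the model recalls every one. The spell names and helper below are made up for illustration.

```python
import random

def plant_needles(text, needles, seed=0):
    """Splice each needle sentence into the text at a random
    paragraph boundary, returning the modified haystack."""
    rng = random.Random(seed)
    paragraphs = text.split("\n\n")
    for needle in needles:
        pos = rng.randrange(len(paragraphs) + 1)
        paragraphs.insert(pos, needle)
    return "\n\n".join(paragraphs)

# Invented spell names -- deliberately absent from any training data.
fake_spells = ["Vortimax Lumen", "Quellibra Nox", "Serravint Ocularis"]
needles = [f'"{s}!" she cried, and the spell took hold.' for s in fake_spells]

corpus = "Chapter One.\n\nIt was a dark night.\n\nThe end."
planted = plant_needles(corpus, needles)

# Every needle appears exactly once in the haystack; the model's
# recall on these is then free of memorization effects.
for s in fake_spells:
    assert planted.count(s) == 1
```

Recall on the planted spells, rather than on canon spells, would separate genuine long-context retrieval from memorized fan-wiki lists.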


Have you by any chance tried this with GPT 4.1 too (also 1M context)?


I often wonder how much of the Harry Potter books were used in the training. How long before some LLM is able to regurgitate full HP books without access to the internet?


Now edit the books and replace all spell names with different ones, and try again

By work I meant with

Now try it without giving it the books as context. I'm sure it probably knows there are 49.


The books were likely in the training data, I don't know that it's that impressive.


now thx to this post (and the infra provider inclination to appeal to hacker news) we will never know if the model actually discovered the 50 spells or memorized it. Since it will be trained on this. :( But what can you do, this is interesting


That doesn't seem a super useful test for a model that's optimized for programming?


I love the fun metric.

My hope is that locally run models can pass this test in the next year or two!


Assuming this experiment involved isolating the LLM from its training set?


Of course it didn't. Not sure you really can do that - LLMs are a collection of weights from the training set, take away the training set and they don't really exist. You'd have to train one from scratch excluding these books and all excerpts and articles about them somehow, which would be very expensive and I'm pretty sure the OP didn't do that.

So the test seems like a nonsensical test to me.


> All 7 books come to ~1.75M tokens

How do you know? Each word is one token?


You can download the books and run them through a tokenizer. I did that half a year ago and got ~2M.
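For a back-of-envelope check without downloading anything, a common rule of thumb for BPE tokenizers is roughly 3/4 of an English word per token, i.e. about 4/3 tokens per word. The function below is a rough stdlib-only estimate, not a real tokenizer like tiktoken, so expect it to differ from exact counts by tens of percent:

```python
def estimate_tokens(text, tokens_per_word=4/3):
    """Back-of-envelope token estimate: English prose averages
    roughly 3/4 of a word per BPE token, so words * 4/3."""
    return round(len(text.split()) * tokens_per_word)

# 9 words -> ~12 tokens under this rule of thumb.
print(estimate_tokens("The quick brown fox jumps over the lazy dog"))
```

The seven books total on the order of a million words, which under this rule of thumb lands in the same ballpark as the ~1.75M and ~2M figures quoted above; the exact number depends on the tokenizer used.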


Ah, and no one has thrown TOAC in it yet?


You need to publish this tbh


you are rich


Comparison to another model?


There are websites with the spells listed... which makes this a search problem. Why is an LLM used here?


It's just a benchmark test exercise.


> Just tested the new Opus 4.6 (1M context) on a fun needle-in-a-haystack challenge: finding every spell in all Harry Potter books.

Clearly a very useful, grounded and helpful everyday use case of LLMs. I guess in the absence of real-world use cases, we'll have to do AI boosting with such "impressive" feats.

Btw - a well-crafted regex could have achieved the same (pointless) result with ~0.0000005% of resources the LLM machine used.
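For what it's worth, a heuristic sketch of what such a regex might look like. This assumes spells appear as one or two capitalized pseudo-Latin words shouted in dialogue; it both misses spells named only in narration and over-matches ordinary exclamations, which is arguably why this isn't purely a regex problem:

```python
import re

# Heuristic: an incantation is one or two capitalized words followed
# by an exclamation mark, e.g. "Expelliarmus!". Over-matches things
# like "Run!", so output still needs filtering by hand or by list.
SPELL_RE = re.compile(r'\b([A-Z][a-z]+(?: [A-Z][a-z]+)?)!')

sample = '"Expelliarmus!" shouted Harry. "Wingardium Leviosa!" said Hermione. "Run!"'
print(SPELL_RE.findall(sample))  # -> ['Expelliarmus', 'Wingardium Leviosa', 'Run']
```

The false positive ("Run") illustrates the trade-off: the regex is vastly cheaper, but turning its candidates into an exact spell list still requires judgment or a reference.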


This is the moment for Europe to show that you can do gov and business differently. If they get their s** together and actually present a viable alternative.


They are doing it differently alright.

https://en.wikipedia.org/wiki/Chat_control


You're saying a proposed bill which hasn't passed is comparative to recent events in the US or am I reading too much between the lines?


You're saying EU is any different to USA?

Palantir clients: Europol, Danish POL-INTEL, NHS UK, UK Ministry of Defence, German Police (states), NATO, Ukraine, ASML, Siemens, Airbus, Credit Suisse, UBS, BP, Merck, ...

https://en.wikipedia.org/wiki/Palantir_Technologies#Customer...

https://www.palantir.com/partners/international/


Nitpicking, many on your list are not part of the EU : NHS UK, UK Ministry of Defence, NATO, Ukraine, UBS, BP.

Plus, the EU is 27 countries, out of which 5 are listed on their wiki page, with various institutions.


Europe can't do business differently. Or at least it doesn't seem to be able to. China can.


Last I checked millions of europeans are living in a functioning civilization. I've lived in Europe. It is ok.

Don't confuse "GDP not as big as ours" with "totally non-functional."


I didn't say it was totally nonfunctional, I said they can't do business differently than they are currently doing.


China can

Yes, things are different in totalitarian states.


"China can, due to mandatory 996 work hours"


How is it not viable now?


What would you like to see changed in the EU?


"Best I can do is Chat Control 3.0"


Is that actually possible? Can we do a live test here?

Let's say we want this dataset: Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town

How much do I have to pay you to get it?


I think it could be feasible to get an ad in front of "35-year-old dentists living on the 400 block of Elm street in local town" who have bought product X, but I've never seen transaction-by-transaction purchase history being for sale.


How much you got?

Never ask a salesperson how much you have to pay when the prices are not already clearly stated. Tell them how much you are willing to spend, to see if they will do it for that amount. Salespeople will always shoot high, hoping not to leave money on the table. The price might change depending on how much you squeal and how high they shot. Your initial "willing to spend" should also be lower than what you're actually willing to spend, for the same but converse reason.


Ok, so nobody here knows directly of any case where such data has been purchased, or vaguely similar, and we have no pricing information whatsoever available, but we are somehow completely knowledgeable about it being possible and how to do it? That sounds unlikely.


The supposedly in-the-know responses here are full of bravado but not much other than "trust me, bro"


https://news.ycombinator.com/item?id=44565878

Yea, you know everything, don't you.


Wow the Transunion business site, that really proves it huh.


Experian is known to sell the data they have. Why is this even in question? If I provide you Experian's website, you would give the same BS response?

Let me google this for you...

https://duckduckgo.com/?q=how+to+buy+data+from+a+data+broker...


The conversation was for buying transaction data from specific people, something that many seem to insist is easy and cheap and doable. Meanwhile if you actually read the responses to that search you smugly cited you'll find that no one seems to know how to actually do anything remotely like this. Yes this data is definitely harvested and it seems like you should be able to buy it in bulk from someone somewhere, but again no one seems to know where or how much or what the purchase minimum would be etc.


You asked for an example, and one was given. If you're saying you don't know how to send an email to a business page purporting to offer the services described here - no comment in this section can help you


Yeah people fail to provide examples but continue to be doomers about how easy it is.


Been busy, but since you seem unable to find anybody by searching on your own for the past 6 hours, here's something I found with a quick little search.

https://datarade.ai/data-categories/food-grocery-transaction...

Have we really lost the ability to use search functionality??


Of course people do. 5 seconds spent doing the most sparse-ass research will help you find plenty of stuff. If people don't respond, I imagine, for fear of 1) outing the specific area they work in, or 2) realizing these kinds of comments aren't generally acting in good faith so it is generally a complete waste of time.

I'll waste my own time and give a trivial example just off the top of my head. Go peruse some of the products offered on this page, put on your thinking cap or even look into them further and imagine what kind of data those services provide, where it likely comes from, and where it is sold to, and you'll be well on your way - and those are just the ones that are advertised openly.

https://www.transunion.com/business

Pretty much every one of the big players people typically associate with other areas such as personal credit have some feet in this space somewhere. Then theres the hundreds of lesser-known fly-by-night guys that have their own DB's they build off of mostly what is the same data, but correlated in different ways and sold to different audiences.

There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences. The fact you personally have never come across it, or are saying you aren't, is only a data point that is interesting to you, and no one else that actually knows what they are talking about in this space. Hope this post helps you somehow.


I didn’t ask for a link to a company that can do it. I want pricing. I am saying that nobody here is willing to share anything even approaching specific pricing, which makes me very much doubt that any of them have the direct transaction experience they are claiming. I don’t doubt that underwater welding exists, but I do doubt that anyone in this thread has done it, or has any direct experience with it.


>There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences.

Okay but then why not name at least a couple such services. Also, if the tech industry isn't selling data to them, where do they obtain it? Again, I see lots of ambiguity here, and the example link from transunion is hardly revealing of anything.


Credit card companies are known to sell data. https://www.cbsnews.com/news/mastercard-credit-card-customer...

Mobile service providers are known to have sold data. https://www.fcc.gov/document/fcc-fines-largest-wireless-carr...

Auto makers are known to sell data. https://www.caranddriver.com/news/a61711288/automakers-sold-...

You act like it doesn't happen, yet time and time again we learn about companies selling whatever data they can collect.

I can't believe we are still questioning this fact

What else do you need to know?


I think you misunderstand. I'm not doubting that it happens widely and pervasively. It's evident that this is the case. I just requested examples based on some of the very specific claims made here despite many ambiguities in how they were phrased.

Anyhow, thanks for taking the time to include some links.


For the most part, readers here are against it. Just because someone doesn’t know how to do it does not mean it is not doable. If it were not doable, these companies would not exist. I’ve already spent more time than I care on the topic. So if you want to think that people are collecting the data and not selling it to interested parties, the, boy, I don’t know. You can only lead hostess to water, but you can’t make it drink.


Hostess? You mean horses no?

>So if you want to think that people are collecting the data and not selling it to interested parties, the, boy, I don’t know.

As I very clearly said above, I don't doubt it at all, I was just asking for any clarification on who to whom.


And you were given them. So why keep up this persistent, obstinate line of questioning and persistent downvoting? It's transparent and tired. Industry experts chime in on this stuff all the time; it isn't anything done in backrooms and is out in the open. The only barrier to your not knowing is your own ignorance.


Persistent questioning (usually a good thing for debate, by the way; I wasn't asking anything unreasonable), because I was curious and hoping for answers from anyone willing to provide concrete details for claims that struck me as ambiguous on hard details, even though I don't doubt the widespread existence of personal data selling at all, as I've repeatedly mentioned. As for downvoting, not sure what you mean. I can only say I've never once downvoted a single comment in any time spent commenting on this site, ever. I think the whole downvote thing is detestable and infantile. I'll even upvote things I disagree with just to counter it if I see them go grey.


Literally all anyone is asking for is one single concrete example of a site where you can roll up and buy personal information.


But what type of range are we talking? Tens, hundreds, thousands?


It could also mean that if you have to ask... or the first rule of data brokering...

Seems like the first thing to do would be to get an account with one of these data brokers. I'd imagine most of these places are "contact us for pricing" so they can play used car salesman games

Or, you could ask John Oliver to do it for you and then tell all of us on one of his episodes exactly how in depth it could get. They have the money to do this, and it seems like something right in his team's wheel house


If you need John Oliver to do it maybe it's not such a big problem? If no one here is able to provide a single concrete example, maybe it's not real?


John Oliver likes to spend HBO's money to do things others can't do while entertaining the rest of us. I'm not spending my money on something to prove what is known as possible for you. At this point, even with receipts, you're coming across as someone that would argue that grass is not green, or water isn't wet, and fire isn't hot.

Just because someone doesn't answer your belligerent questions does not mean it's not possible. It probably means that the people that are doing this with first hand knowledge have too much to do than trying to convert doubting Thomas over here.


All of this started because in response to an extremely concrete question, what's the cost of transaction data for a tightly constrained population, you replied with a smug non-answer about the greed of salespeople. These questions only got "belligerent" because every single answer has been nonsense insisting that it's super easy and cheap but also I couldn't possibly name a single site where this data is sold or provide even an order of magnitude of cost. Or maybe now it requires HBO levels of funding, who knows.


I offered sage advice on how to negotiate when you don’t know a firm price on anything whether that be data or a car or a home remodeling. If you want to say that advice was a smug answer then that’s on you. Every answer after has just gone further and further off the rails


Nah there's no way you actually watch John Oliver because that was really funny. Anyways, you mentioned earlier that we wouldn't believe you even if you posted receipts but that's actually exactly what we want to see. Like, just the name of a business, the thing that was sold, and the price.


In many fields there is no moat. It's an execution battle, and it comes down to one question: can the startup innovate faster and get to the customers, or can the incumbent defend its existing distribution well enough?

Microsoft owns GitHub and VSCode yet cursor was able to out execute them. Legora is moving very quickly in the legal space. Not clear yet who will win.


> Microsoft owns GitHub and VSCode yet cursor was able to out execute them

Really? My startup is under 30 people. We develop in the open (source available) and are extremely willing to try new process or tooling if it'll gain us an edge -- but we're also subject to SOC2.

Our own evaluation was Cursor et all isn't worth the headache of the compliance paperwork. Copilot + VSCode is playing rapid catch-up and is a far easier "yes".

How large is the intersection of companies who a) believe Cursor has a substantive edge in capability, and b) have willingness to send Cursor their code (and go through the headaches of various vendor reviews and declarations)?


Windsurf was acquired for $3B by OAI and it's clearly the worse of the two. Cursor is trying to raise at a $10B valuation and has $300MM in ARR in less than two years.

So in short, yes, companies do appear to be showing some willingness to send Cursor their code, even with all the headache associated with getting a new vendor.


Could you elaborate on the 'headaches of various vendor reviews and declarations'? (I thought paperwork etc was table stakes for a SOC 2 certificate)


A business is nothing more than defined, established, repeatable systems and processes. That is the difficulty with tooling businesses.


It's better than perplexity on certain tasks! Pretty impressive!

Feedback: the mermaid charts are a bit annoying when they are not needed

Out of curiosity, what web index do you guys use right now, and does it make sense to build your own index at some point?


What object detection model do you use?


Is tesseract even ML based? Oh, this piece of software is more than 19 years old, perhaps there are other ways to do good, cheap OCR now. Does Gemini have an OCR library, internally? For other LLMs, I had the feeling that the LLM scripts a few lines of python to do the actual heavy lifting with a common OCR framework.


Custom trained yolo v8. I've moved on since then and the work was done in 2023. You'd get better results for much less today.


Quetzal as a product is inevitable. It will make it even easier to target international markets from day one with a tiny team. Congrats on the launch! You rock! Greetings from section 4D!


What's your next step? Why did you decide to focus on enzyme design?


We think enzymes are super cool! You can build molecular assembly lines at the atomic scale with them. Many pharmaceuticals are already manufactured with enzymes such as the diabetes drug Januvia. Engineering them is a big bottleneck though - takes years and millions of dollars. We want to speed this up with AI-powered design. Next step is ligand-protein prediction capability of AlphaFold3, which is also super useful for modelling enzyme-substrate interactions.


Possibly because it dovetails with pharma mfg and [potentially] food mfg. Could see a case made for enzymatically brewed 'meat inks' [very sorry for this term ;p] for 3d printing the next gen of lab meats.

