You say that if someone has the chops to be a real mathlete they won't need Polya's _How To Solve It_.
I'll say I went to college 25 years ago with people who had competed internationally in high school and who placed competitively on the Putnam, and they LOVED Polya's book.
I think whether you enjoy seeing strategies laid out well (whether or not you've been able to figure some of them out yourself) depends more on your personality than on how good you are at solving creative math problems.
Unfortunately I did this for work, so I can't open-source the code without some awkward conversations (which may be possible and worth it, but I haven't had them yet). I'm sure I could write a blog post on it, but I don't have a blog, so ... sorry. Peter Norvig's post plus my point about tries and an efficient comparison algorithm at least give you a head start.
Norvig wrote a similar expansion in his chapter for the book Beautiful Data (along with several other small programs to do fun things with natural language corpus data).
You can find it with a web search.
(His version didn't use a trie, because Python's built-in dicts are much more efficient than a trie in Python, even with the extra redundancy that a trie eliminates.)
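For anyone who wants a starting point for the "trie plus efficient comparison" idea, here is a minimal sketch. It assumes a nested-dict trie and a Levenshtein row that is extended one character at a time during the walk, so whole subtrees get pruned once no cell is within the allowed distance. This is not the closed-source code mentioned above, nor Norvig's implementation; `build_trie`, `fuzzy_search`, and the `END` marker are hypothetical names.

```python
END = None  # hypothetical sentinel key marking "a word ends here"


def build_trie(words):
    """Build a nested-dict trie; each complete word is stored under END."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def fuzzy_search(trie, target, max_dist):
    """Return sorted (distance, word) pairs within max_dist edits of target."""
    results = []
    # Row for the empty prefix: turning "" into target[:i] costs i insertions.
    first_row = list(range(len(target) + 1))
    for ch, child in trie.items():
        if ch is not END:
            _walk(child, ch, target, first_row, max_dist, results)
    return sorted(results)


def _walk(node, ch, target, prev_row, max_dist, results):
    # Extend the Levenshtein table by one row for the trie character ch.
    row = [prev_row[0] + 1]
    for i in range(1, len(target) + 1):
        cost = 0 if target[i - 1] == ch else 1
        row.append(min(row[i - 1] + 1,           # insertion
                       prev_row[i] + 1,          # deletion
                       prev_row[i - 1] + cost))  # match or substitution
    if END in node and row[-1] <= max_dist:
        results.append((row[-1], node[END]))
    # Prune: if every cell exceeds max_dist, no descendant can match.
    if min(row) <= max_dist:
        for next_ch, child in node.items():
            if next_ch is not END:
                _walk(child, next_ch, target, row, max_dist, results)
```

Something like `fuzzy_search(build_trie(vocabulary), "serch", 1)` returns the vocabulary words within one edit of the query. Words that share a prefix share the table rows computed for that prefix, which is where the savings over comparing against each word separately come from.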
It is built into Safari (via the History interface). Opera had this feature before it became a Chrome reskin. Chrome used to support it, but it only worked on http (not https) sites and after some years the feature was dropped. Chrome addons like Falcon bring it back (but Falcon seems unattended these days). The Min Browser offers full-text search history out of the box, but the browser experience is ... eccentric.
There was a SaaS offering for this called Recawl, which required a browser plugin. memex.garden offered this as part of a SaaS but dropped the feature. Browserparrot focuses on this, again as a SaaS, again with uncertain pricing. Diskernet does this fully locally, but the software is not free (and is only offered via subscription pricing). St. Clair Software's HistoryHound does full-text history search, but only on Mac; I suppose it has a bigger feature set than the Safari tool, and it supports non-Safari browsers.
The field is littered with previous attempts to get this right.
It depends on the required features: cross-browser support, cross-device support, handling PDFs, OCR for images, etc. Some of the features mentioned already exist in the browser. I'm not sure the browser vendors are incentivized to develop and maintain such features.
If it were easy/cheap enough to host, can a model like shared game servers or web/email hosting work? People pay $20 a month without thinking for web hosting. What does it take to make "search hosting" a thing, where cheap search hosting companies can crop up both at the low end with bare bones offerings and others climb up the value chain with offerings like squarespace...
Cheap is probably a bit out of reach. I think right now, for hosting something like my search engine, you're looking at either a one-time cost of around $5000, or $200/month in server rentals. Maybe you can bring that down with the economies of scale, but it's never going to be anywhere close to $20.
> To clarify I'm not asking about HN itself but articles linked from HN.
I might not have a clear picture of what you're looking for, but items of type "story" returned by the HN API do have a URL field, which I believe corresponds to the submitted link.
You can scrape the text field of comment items, but that takes a bit more work.
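A rough sketch of pulling the submitted link out of a story item, assuming the public Firebase endpoint `https://hacker-news.firebaseio.com/v0/item/<id>.json` and its `type` and `url` fields. The helper names `fetch_item` and `linked_url` are made up for illustration, and text-only posts (Ask HN and the like) are assumed to have no `url` field.

```python
import json
from urllib.request import urlopen

# Public HN API item endpoint; {} is replaced with the numeric item id.
HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"


def fetch_item(item_id, timeout=10):
    """Fetch one item (story, comment, ...) as a dict; needs network access."""
    with urlopen(HN_ITEM_URL.format(item_id), timeout=timeout) as resp:
        return json.load(resp)


def linked_url(item):
    """Return the submitted link for a story item, or None if there isn't one."""
    if item and item.get("type") == "story":
        return item.get("url")
    return None
```

Usage would be something like `linked_url(fetch_item(some_story_id))`. Comments come back with a `text` field instead, so any links in them have to be parsed out of that HTML separately.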
Hopefully this will help: you're talking about a submission to HN, e.g. a link to a WSJ article complete with comments section, and OP is talking about the specific WSJ article.
Use IA more responsibly, perhaps. Instead of scraping it, convert the list of links from HN to point to IA? You still have to work with whatever limits the site puts up in any case.
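Rewriting the links is mostly string work. A small sketch, assuming the Wayback Machine's `https://web.archive.org/web/<url>` form (which redirects to a capture of the page if one exists) and the optional `https://web.archive.org/web/<timestamp>/<url>` form; `to_wayback` is a hypothetical helper name.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"


def to_wayback(url, timestamp=None):
    """Point a link at the Internet Archive instead of the original site.

    timestamp is an optional YYYYMMDDhhmmss-style string; the archive is
    assumed to serve the capture closest to it. No request is made here.
    """
    if url.startswith(WAYBACK_PREFIX):
        return url  # already an archive link, leave it alone
    if timestamp:
        return f"{WAYBACK_PREFIX}{timestamp}/{url}"
    return WAYBACK_PREFIX + url
```

This only builds the link and doesn't check that a capture exists, so nothing hits IA until a reader actually clicks through.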