I read an article where a software engineer was about to go on a long plane flight, so he downloaded a file of all English words and tried to write a spell checker during his flight, with no internet access, just as an exercise. And when he landed, he had finished, and his algorithm was beautiful, simple, elegant, easy to understand, and never something I'd have come up with in a million years.

It was actually a little depressing - it made me realize that some people just have a "something" that I'm not going to get through hard work or whatnot.

*edit: ah, found it. https://norvig.com/spell-correct.html



To be fair, Peter Norvig [0] isn't just another software engineer, he's a hardcore CS researcher, co-authored the most popular (pre-DL) AI textbook, was head of NASA's Computational Sciences Division, etc.

[0] https://en.wikipedia.org/wiki/Peter_Norvig


I think what they're saying is that some people are Peter Norvig and got lucky, and other people won't get there with any amount of study or practice. He's definitely not an average engineer, and as a result most people shouldn't expect to have the same level of ability

I had the same thought - read the URL and then said "Oh well that's Peter Norvig he's just weird in a cool way"


I’m conflicted on calling him “lucky”, and I don’t think you mean to discredit him, but yeah he’s had genetic and societal gifts but I’m sure his work ethic is what makes him exceptional


> I’m conflicted on calling him “lucky”, and I don’t think you mean to discredit him,

I don't think saying he was "lucky" is discrediting him - simply because there are many people who got 'lucky' but didn't push themselves to fully capitalize on it the way he did.

Basically he's both "lucky" but also extremely driven and uses it to his fullest potential.


> Basically he's both "lucky" but also extremely driven and uses it to his fullest potential.

Pretty typical for people who are at that level in their field. Terence Tao and Michael Jordan both immediately come to mind.


For a measured response - yes I agree, it's a combo of luck and effort, but the effort can't compensate for the luck.

For my true response that probably isn't as popular - he may have an amazing work ethic but that's also just coming from how he was raised, so it's still luck, the same person born in a different environment wouldn't have done the same things - both nature and nurture are external forces


Geez, I've been told I have an unfair advantage because I have a predilection for working hard.

At what point should people stop making excuses and claiming to be a victim?


It's funny how you say you have a natural ability to do great things while simultaneously saying that people should stop making excuses and just gain your natural ability.

And I'm not someone who is a stranger to hard work; most of my loved ones have complained with great sadness for years that I work way too hard.

In my experience, most of us hard workers didn't spend years training ourselves to work hard; we just wake up, work hard, and wonder how the hell the past decade of our lives just slipped through our fingers without any meaningful memories besides work.


LOL, my dad told me many times that I was not afraid of hard work, as I would lie down and go to sleep next to it.


It’s amazing how hard the majority of poor people work. If effort was all that counted then the slums of the world would be overflowing with billionaires.


> It’s amazing how hard the majority of poor people work.

I've pontificated over and over here that it isn't really about working hard, it is about working smart. I.e. choosing the right things to work on. Digging a hole and filling it in again is hard work, but it isn't going to get anyone anywhere.

However, it does take some effort. Upgrading your job skills is a common way to work smarter.


So it comes down to being smart again, which is largely outside of your own conscious control. That means we're back to being lucky again, no?


How much smarts does it take to realize that improving one's job skills is a way to improve one's life?


What exactly do you mean with "improving one's job skills"? What can a janitor do to improve their job skills? Learn to code in their free time? I have no idea what exactly you're suggesting.


You could just admit the just-world hypothesis is false and enjoy your good fortune. That would be the gracious thing to do.

But yes, seething on message boards is another common response to finding out you're part of luckiest 0.01% of people in the history of mankind.


What is the point of this? Who are these people who are making excuses and claiming to be victims -- and victims of what?

If there is a grievance you're making, can you elaborate on that?


I certainly wouldn't phrase it that way, but the best software developer I've met was not only very smart but really passionate about software. Like, _really_ passionate. His entire online persona revolves around it, and in person you could surely talk to him about other topics, but you could tell software stuff was his preferred one.

So he worked really hard, but he was _enjoying_ it to the point it wasn't really a chore to him. Others who are not so passionate about stuff can and will work hard towards some goal, but it will absolutely be a chore to them.

So I think the guy I'm talking about is lucky, or gifted, or whatever you want to call it, compared to others.

On the other hand, at some point I guess it becomes an addiction? For example I, like most, enjoy sex, and having sex a few times per week is great and there's nothing wrong with that, but if you _must_ have it 5 times per day maybe you have a problem.


I wonder sometimes what effect all this has on young smart kids.

Imagine two kids who show an interest in science or engineering and show some talent in those areas. One is a white kid from a well off community, and one is a Black kid from a less well off area.

The Black kid is going to be told several things, all of which are true. There is significant systematic racism in the US and the kid will find it affecting him. His school won't be as good as those in better off areas. His family won't be able to afford extracurricular activities that would help make up for the poor school.

If people are not careful in how they tell the kid those things, it would be easy for the kid to get the idea that the odds are too stacked against him and give up on pursuing science or engineering.

The white kid, on the other hand, might have his accomplishments partly or wholly dismissed as due to his better school, his family having more money, and his not having to deal with racism. That too could easily convince a kid to not bother. If everything you do is dismissed as just the inevitable outcome of things you were given and not as something you accomplished, then screw it...why not just spend your time playing games or on social media or partying?


Personality psychometrics are well established and more scientific than people realise. You likely have a high nature and nurture tilt towards trait Conscientiousness (OCEAN model), which is in large part due to your experience of life before the age of six.


I have no problem working hard. I can grind for hours on a problem when I’m engaged with it.

My problem is constant distraction. It’s really bad. And blocking out everything else doesn’t help. I even get distracted by the thing I’m working on. I’ll see something new and go off and research that.

From everything I gather, we’re living in a pandemic of distraction. I think people had a lot easier time focusing on their tasks before the internet came along.


Wouldn’t worry about it much.

If they want to claim that then let them, doesn’t hurt you or me.

There are enough legitimate victims around, we don’t have a shortage alas.


At all points.

A predilection for making excuses is a disadvantage so you should excise it from your character/personality.


what do you care what people say? let them make excuses, what's it to you?


It matters where you set the achievement bar. Sure, if the goal is "internationally recognized expert," perhaps luck is a stronger factor.

But if the goal is, "strong enough programmer to write a spell-check algorithm from first principles," then work-effort might very well compensate for a lack of advantages in upbringing or other "luck-factors."


Yeah, if you completely ignore people who are born in destitute and corrupt countries. How is someone supposed to gain access to a computer when they can't even feed themselves?


And that’s not even to mention how lucky he is to not have been born 100 years ago!

The point here is that going to such extremes isn’t really interesting.


They are pretty interesting because it establishes limits from which you can work to understand the validity of your model (idk about you but that's how I was trained to create models in my physics degree).

I mean, from there you can move on to similar cases where bad luck trumps any amount of ability. For example, what sucks about being impoverished? Not having good access to a computer, right? From there it's not hard to draw on other things (e.g. abuse, bad culture, poverty) that make having good computer access a huge challenge. It becomes pretty apparent that luck plays a huge factor in success.

Pretty interesting to me.


It’s obvious, but uninteresting to consider the extent to which luck factors into success.


So obvious that a lot of successful people here think they did it all on their own with very little luck involved. So obvious that there is an entire political party whose foundation is built around, "if I could do it, so can you."


As Stephen Jay Gould said: "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops."


I don't think his work ethic comes from his upbringing - our parents didn't push us that hard and my work ethic is average. It may come from the fact that he found what he loves to do early in his career, it may come from my Dad's genes (his work ethic was pretty good), or a combo of both


If he is half the genes given to him and half what his environment has given him, does he even exist? Do any of us exist?


Only if you conflate the meaning of your own existence with the achievements you make.


we exist as water in a river


IMHO the proliferation of "lucky" does more harm than good. It's factually correct but bad policy to emphasize it if you want the most people to avoid learned helplessness.

Some might be motivated knowing the only thing that separates them from greatness is luck, but more often I saw students feel defeated.


It's factually correct that free will is an illusion, but it's best not to say it out loud because people might start acting like it.


It is not factually correct that free will is an illusion. It is just a difficult subject to discuss.

Why difficult?

Discussion that is considered rational culturally descends from argument techniques rooted in Aristotelian ideas about logic. However, as shown by Kurt Gödel, formal systems break down under self-reference. Through Turing we can see this type of thing starting to happen when two Turing Machines need to predict the output of each other.

Through John von Neumann and Nash we can get a proof of non-deterministic policy functions in multi-agent decision problems being optimal. So we find an analogous concept to free will and we find it justified in this abstract domain not on the basis of some wishy washy concept, but on the basis of analogical reasoning (logic) and causal outcome modeling (dynamic functions).

So what do we reject to claim it is an illusion? Logic? Or causality?


It’s often difficult to discuss free will, among other reasons, because it’s ultimately a philosophical concern and people today tend to lack philosophical literacy. They unquestioningly adopt whatever belief is a nail to the hammer of their education and upbringing: the STEM crowd here will by default go with physicalist monism; those with more religious upbringing will adopt dualism; and in the end all will be easily aggravated by questioning how they got there and asserting that alternatives are equally possible (and in case of STEM perfectly compatible with natural sciences, being entirely outside their scope through Gödel’s incompleteness if nothing else).


I think philosophical literacy is probably higher now than at any point in the past. For one, far more philosophical writing exists now than in the past. For another, the selection effects have weeded out much of the writing that was bad. For yet another, a far greater percentage of the population is literate than in the past. For yet another, the population is healthier, wealthier, and generally better able to apply themselves to the material than they were in the past. For yet another, being after rather than before the Enlightenment, questioning of assumptions is actually a much more prevalent reflex now than it was in the past. For yet another, our information retrieval systems help us do a better job of organizing and discovering the relevant writing.

I also think philosophical literacy is less relevant than it was in the past. For example, we are talking about how an agent's decision making algorithm is in actuality. Is philosophy the best subject to discuss this? Control theory, learning theory, game theory, scientific investigation into the brain's processes are all mature enough that their usage produces more rewarding outcomes.

So let's say we see what at first glance appears to be degenerate twitch chat zoomer spam about philosophy sus no cap? Are they really philosophically unsophisticated? Or were they raised in a world of such greater philosophical sophistication that self-replicating knowledge structures - like, say, memes, were something they had extreme and constant exposure to? What if they are engaging in some sort of coordinated omegalul gambit?

I'm joking, but I'm kind of serious.

https://xkcd.com/603/

I often find that explaining observed incompetence with genius works better than explaining observed incompetence with incompetence. So I'm more fond of claiming a problem is hard for good reasons than that people just weren't educated - which is also often true, by the way, just explaining why I ended up putting the emphasis somewhere else.


> For example, we are talking about how an agent's decision making algorithm is in actuality.

What is “agent”? If a deterministic universe without free will is considered, whatever “agent” means is likely nonsensical, so if you are discussing whether or not the universe is deterministic then presuming existence of an agent (with agency) is a mistake. So no, in a discussion on whether the universe is deterministic we would not be talking about that, whatever that is, if it apparently presumes existence of an agent.

> Are they really philosophically unsophisticated? Or were they raised in a world of such greater philosophical sophistication that self-replicating knowledge structures - like, say, memes, were something they had extreme and constant exposure to? What if they are engaging in some sort of coordinated omegalul gambit?

I didn’t mean just newer generations by “people today”. Discussing this with older people is as difficult and they tend to equally just make assumptions instead of reasoning. I like discussing topics like these the most with a gen Z philosophy professor a few years younger than me.


Not arguing nondeterministic universes. Within a universe agents see computationally irreducible phenomena. It is just game theory decision problems embedded in cellular automata. The agent which says they could have decided another way is correct - their modeling of the problem did in fact have that property. The agents which think this is insane and improper modeling are suffering from a hindsight fallacy.


In case that wasn't enough:

Try considering a Turing Machine which is running several Turing Machines within it. Call it A. Within it are B and C. B gets an input state that is the entire Turing Machine. Meanwhile C gets the output of B. There are multiple B and multiple BC. So we get something like this when we write it out:

A = B + B + B + BC + BC + BC

You might be tempted to say that A = A, therefore A is deterministic. This would even be logically true. However, it is actually far more dangerous than it appears. Why? Well, you aren't A. You are within A. Let's say you are B within A. Does it matter that you know the dynamics function? It is deterministic, but does it follow from it being deterministic that it is deterministic? No! Even though B is within the deterministic system A, B cannot claim that A is deterministic, because B cannot claim that C is deterministic, because C is indeterminable by B through self-reference.

Now the standard mistake is to let B run then let C run then to pretend B could determine C, because now obviously we can tell that A was determined by B because C was determined by B. Which sounds really compelling, but ask yourself this to see why it isn't as great as you think it is: is the universe still subject to a deterministic dynamics function right now? If so, can you tell me what C is, not in the toy problem, but in the real world?

Like, I realize this is an impossible problem: I'm asking you for the current configuration over the course of the next second for all matter in our universe even the parts you can't model because you haven't observed it. But that is kind of my point. The universe hasn't stopped running yet. You can't determine C from the information context of B.


Now let's say you try to counter, ah, but A so therefore all the other stuff is meaningless.

It seems like it works, but let's talk about the parable of the time you were given a deterministic dynamics function and you computed it according to what it was because you wanted to claim it was deterministic by treating it as what it was.

So you start calculating B + B + B + BC + BC + BC. And you see me looking at the same problem and idiotic like I am, you see me write down A = A' = BC. Then I get ready to solve it. And you are like, pfft, what an idiot. He isn't even talking about, like, actual reality.

Unfortunately, in the calculations that follow, you realize something strange: Though you may B, I C you. And because I see you before you be such that I can determine what C should be such that you cannot see C while you B be, I choose a C for your B so that your B doesn't see C.

Basically, in choosing the slowest path, you also choose to be determinable, but in choosing being determinable, you implicitly make yourself vulnerable to an A' that makes your context - during the process - undecidable.

You thought choosing A saved you, but actually it was your curse:

You let me set A = B + B + B + A' because you didn't want to claim that we could divide A into the pieces which it was in actuality composed. But because it was in actuality composed of those pieces BC is now calculatable by A' such that your conceit is actually what traps you in the non-deterministic perspective. If B was calculated, not via A's conception of B, but A' then now it isn't just B that can't predict C. C probably can't predict B either.


So let's escape this! Let's say the process ends! Now B has fully determined itself such that C is determined and now A is determined as well.

Is A now deterministic? No, A isn't anything anymore.

Nothing is happening.

A isn't now deterministic; A was determined and now our physics is stopped. Where is the determinism at? Not within A, because clearly during A B wasn't deterministic because of self-reference to C. Not afterward either, because now it isn't doing anything.


> Why difficult?

Because too many people have adverse emotional reactions to such discussions.

> Discussion that is considered rational culturally

Discussions about free will stop being rational very early.

> So we find an analogous concept to free will

Ah, this is where the problem starts.


> Ah, this is where the problem starts.

It really isn't a problem. The analogical congruence holds.

Typically free will is defined according to the action being taken without the constraint of necessity based on a person's own desire.

In the game theoretic model that Nash showed optimal the decision making act that the agent has to do isn't about deciding over actions. It is over strategies. That might be a bit confusing so lets just use an example.

If you play rock paper scissors, in the agent's modeling of the problem it isn't modeling it as an action choice of rock, paper, or scissors. It is actually modeling it as an action choice over probability vectors. The action choice of rock corresponds with the probability vector of [1 0 0], paper with [0 1 0], and scissors with [0 0 1]. There are an infinite number of different policies, but it turns out that the rational one to pick is [1/3 1/3 1/3], playing each option with equal probability.
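
To make the probability-vector framing concrete, here is a rough sketch that scores a policy by how badly its best pure counter-strategy can hurt it. The function names and the +1/0/-1 payoffs are my own illustrative assumptions, not anything from Nash's paper.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 if `mine` beats `theirs`, -1 if it loses, 0 on a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoff(policy, opponent):
    """Expected payoff of one probability vector against another.

    Both vectors are ordered as MOVES: (rock, paper, scissors).
    """
    return sum(
        p * q * payoff(m, o)
        for m, p in zip(MOVES, policy)
        for o, q in zip(MOVES, opponent)
    )

def worst_case(policy):
    """Payoff against whichever pure strategy exploits `policy` the most."""
    pures = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    return min(expected_payoff(policy, pure) for pure in pures)

uniform = (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))
always_rock = (1, 0, 0)
rock_heavy = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4))
```

Under these assumptions any policy that leans on one move has a negative worst case, while the uniform policy cannot be pushed below zero, which is the sense in which [1/3 1/3 1/3] is the rational pick.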

Notice that here the agent is taking an action, but not one that is constrained by necessity. Notice that the choice of it is due to the modeling of what is in the preference of the agent. Compare that with the definition of free will. The same thing is happening.

So the congruence does hold, but what makes it appear to not hold is that most people rejecting free will neglect computationally irreducible phenomena. In actuality, computationally irreducible functions which allow for stochastic signals show up in cellular automata without the requirements of dualism. Selection and variation are then much more than enough to show that surviving agents will protect these information sources so as to make them unobserved, because failure to do that isn't optimal and so agents which don't aren't selected for.

> Ah, this is where the problem starts.

We could try and reject, not the analogical congruence which holds, but the system of using analogical reasoning in the first place, but this goes badly. The first big problem is that all of our knowledge comes via theory-laden proxies and removing analogical validity removes the validity of evidence in general. A stranger result is that compression is justified through analogical congruence. So you can no longer claim to have knowledge over the state and dynamics function, because you have knowledge over the compressed form of it. To claim knowledge, you now need to physically be it.

> Ah, this is where the problem starts.

There are many invisible octopuses. Fish are going to encounter decision problems wherein these camouflaged predators are both on their right and left yet the decision context shows them the exact same thing in both cases. So which should they pick? Always left? Then they always die. Always right? Then they always die. The only winning solution is to pick both, because then sometimes the agent lives to reproduce. How do we pick both? Well, if we do it in a way that is computationally reducible then the octopus which observes the fish invisibly can anticipate it. So now it is wherever the fish decides to go. So it has to decide to do both, but it has to do so in a way that is unobservable. This decision problem doesn't go away when someone chants the magic word of dualism. The agent still needs to decide in an unpredictable manner or is going to die. So when optimization processes build things? They end up approximating a solution to this problem.

This type of decision problem has played out billions of times. It has played out over billions of years. I don't know which time it happened which was the first, but where it was, that is where the problem really starts. And since that problem and onward a filter has been killing the things that answer incorrectly.


> factually correct that free will is an illusion

No it's not, that is only your belief (or as I call it "delusion")


You make an interesting point, and I think which side of it is correct depends what the end goal is. It definitely doesn't help people, particularly children, to tell them they're "lucky" or "gifted" in a particular ability. Early in life it means that they can downplay the effort needed to develop that ability, and later in life when they hit the actually difficult bits it can result in them assuming they must just not be as good at it as everyone says, rather than applying themselves to understanding.

There is also value in people understanding that not everything is down to innate ability and application though. That road leads to people looking down on impoverished people because they should have just put more effort in. Get a better job, learn a skill, regardless of the fact their situation means they're already working 3 jobs just to make enough money to ensure their kids can eat tonight.

As with so many things people have polarised. You're either in Camp Skill & Dedication or Camp Luck & Circumstance. In reality it's always a bit of both, with luck giving some people an advantage when it comes to having the time and space to dedicate themselves to developing their skills.


> It's factually correct but bad policy to emphasize it if you want the most people to avoid learned helplessness.

i'll keep that in mind next time i'm making "policy."

i'm not clear how, in this conversation, policy came up.


You can be lucky and hardworking, and have both of these factors be individually necessary but insufficient contributors to your success.

People get weirdly defensive about luck being a major factor in their success, but it always is. It doesn't mean you didn't work hard, it doesn't mean that your decisions didn't contribute to your success. It means that, if you were born a little bit less smart, you would have achieved less. And if you were born in North Korea or medieval Europe in the peasant underclass, you would probably have achieved nothing of note and may have died of the plague before reaching adulthood.

I'd estimate that just the sheer luck to be born in a wealthy, free country without debilitating handicaps in the late 20th century or early 21st century already puts you in something like the top 1% lucky people in human history.


You can argue that being hardworking is a talent, and as such luck. Not everyone has the force of will to be hardworking: some people are naturally lazy, some can put in hard work without it feeling like hard work, and others can feel the hard work but have the will to overcome it, etc.

I say that success is 100% luck and of that 100% of luck, some of it can be composed of hard work, as the ability to perform hard work is, in itself, a part of luck.


everything in your sentence can be traced back to luck in some way. Genetic gifts I'm sure contribute to work ethic, but also the families, societies, you're born into. The teacher you met in 1st grade that encouraged you to keep trying etc. Luck is in everything IMO


Of course the choice to apply and enact effort exists, but being able to successfully do so, and to have the psychological directive to do so effectively, is largely based on personality, which is largely established by age six. (Conscientiousness - OCEAN)


Inherent motivational energy is a roll of the genetic/environment die just as much as every other trait like height or engineering intelligence capacity.


Perhaps his epic work ethic is partially the result of a genetic trait and/or the influence of good parenting at an early age… which would be pretty lucky for him.


Luck is like 5% of any equation. He had the gift and he was disciplined enough to execute on it.

That 5% luck probably got him an extra 10-20% further than anyone else. However, in a room with equally disciplined individuals you wouldn't be able to discern the lucky from the hard working.


> He had the gift and he was disciplined enough to execute on it.

What if you execute in the wrong profession, like you become a stock trader but you could have been a world-class developer?

How does one even find out? Most people in the world never try programming.

Maybe you have a gift for something that you never had a chance to discover.


the hardworking are lucky to have been born with that personality trait.


> but I’m sure his work ethic is what makes him exceptional

why?


My aspiration as a programmer and security engineer is to one day be regarded as "just weird in a cool way".

Until that day comes, I keep working at it.


That's also how I live my life haha, much less stress about people judging you if you can write it off as "yeah but I'm cool in other ways"


Ha! Love this. Bet you're already weird in a wonderful way :)


And he was probably one of the world's foremost experts on this at the time given his position at Google, haha.


I mean, he definitely knew about the concept of edit distance: https://en.wikipedia.org/wiki/Edit_distance before writing this code, which is the most difficult part. Here's a practice problem: https://leetcode.com/problems/edit-distance/.
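
For anyone who hasn't seen it, the classic dynamic-programming version of that practice problem fits in a few lines. This is a minimal sketch of plain Levenshtein distance (inserts, deletes, substitutions only), which is what the linked problem asks for; it is not a copy of anything in the article.

```python
def edit_distance(a: str, b: str) -> int:
    """Fewest single-character inserts, deletes, or substitutions
    needed to turn `a` into `b` (Levenshtein distance)."""
    # prev[j] is the distance between the part of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletes
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]
```

Keeping only two rows of the table at a time is a common space optimization; the full table is only needed if you also want to recover the edits themselves.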

So in this case, the special "something" he has is quite literally the amount of hard work he put in to learn this stuff. And if you take this opportunity to learn about edit distance, you'll be one step closer to that special someone you want to be.


maybe someday i'll finally be the person i want to be, and then i can really start to live.


Are you being sarcastic?


You know how there is that joke about reasoning about multi-dimensional structures?

"To deal with hyper-planes in a 14-dimensional space, visualize a 3-D space and say 'fourteen' to yourself very loudly. Everyone does it."

– Geoffrey Hinton, A geometrical view of perceptrons

Well, you actually can come up with Norvig's spellchecker very easily with a similar trick. Just tell yourself: to deal with a really hard problem, just argmax with an okayish heuristic. Everyone does it.

Solutions like this seem impossibly elegant because they tolerate a whole lot of error, but they are actually extremely obvious when you tolerate error.
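
Here is roughly what "argmax with an okayish heuristic" looks like in code. It is a sketch in the spirit of the linked article, not a reproduction of it: the tiny `WORD_COUNTS` table is made up for illustration, and the heuristic is simply "the most frequent known word within one edit".

```python
from collections import Counter

# Hypothetical toy counts; a real checker would count words in a large corpus.
WORD_COUNTS = Counter({"the": 500, "very": 120, "were": 90, "spelling": 40, "weary": 5})

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word, counts=WORD_COUNTS):
    """Keep a known word; otherwise argmax frequency over known words one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts] or [word]
    return max(candidates, key=lambda w: counts[w])
```

The error tolerance is the whole trick: frequency is a crude stand-in for "what the writer meant", and the code just accepts that it will sometimes be wrong.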


That quote is incredible, and now I know what coffee feels like leaving my nose


Did you visualize water leaving your nose, and then say "coffee" loudly?


When you're not performing your duties do they keep you in a little box? Cells.


If it makes you feel any better the original is not the same as what is displayed now[0] so it did not take just one flight to achieve elegance, and also included some bugs which were mentioned in the errata later[1].

[0] - https://web.archive.org/web/20070410053746/http://norvig.com...

[1] - https://web.archive.org/web/20150906100448/http://www.norvig... : ctrl+f "Update"


It's really nice and I'm a fan of the man but... Even back then it was a really simple spellchecker. Back then stuff like the "metaphone" and "double metaphone" algorithms already existed (suggesting candidates based on similarly sounding words). Fun algorithms 'em metaphones I'd say. Relatively easy to implement/port (been there, done that).

I'm sure there's better stuff now.

Also it's possible to better rank candidates to fix typos by taking into account the layout of the keyboard (e.g. on a QWERTY keyboard if you see "wery" it is, despite having an edit distance of 1, highly unlikely the person typing meant to type "very". "weary" is much more likely). So just sorting by edit distance and then suggesting the most common word often doesn't result in the best candidate being shown first.
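
One rough way to fold the keyboard into the ranking is to make substitutions between neighbouring keys cheaper than substitutions between distant ones. Everything specific below is an assumption of mine: the staggered key coordinates, the 1.5 key-width neighbour threshold, and the 0.5 discount are illustrative values, not tuned ones.

```python
# Letter rows of a QWERTY keyboard with an assumed horizontal stagger per row.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]

# Approximate (x, y) position of each letter key, in key widths.
KEY_POS = {
    ch: (col + offset, row)
    for row, (keys, offset) in enumerate(ROWS)
    for col, ch in enumerate(keys)
}

def adjacent(a, b):
    """True if two different keys sit next to each other under the assumed geometry."""
    if a == b or a not in KEY_POS or b not in KEY_POS:
        return False
    (ax, ay), (bx, by) = KEY_POS[a], KEY_POS[b]
    return (ax - bx) ** 2 + (ay - by) ** 2 < 1.5 ** 2

def sub_cost(a, b):
    """Substitution cost: free for a match, discounted for neighbouring keys."""
    if a == b:
        return 0.0
    return 0.5 if adjacent(a, b) else 1.0

def typo_distance(typed, intended):
    """Edit distance where hitting a neighbouring key counts as half an edit."""
    prev = [float(j) for j in range(len(intended) + 1)]
    for i, ct in enumerate(typed, start=1):
        curr = [float(i)]
        for j, ci in enumerate(intended, start=1):
            curr.append(min(
                prev[j] + 1.0,                   # extra key typed
                curr[j - 1] + 1.0,               # key missed
                prev[j - 1] + sub_cost(ct, ci),  # wrong key (or a match)
            ))
        prev = curr
    return prev[-1]
```

With these numbers "vety" is half an edit from "very" because t and r are neighbours, while "wery" stays a full edit away because w and v are not. A fuller model would also weight inserted and dropped letters.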

And then context. Ah, context. I type this and my browser isn't aware I'm talking about algorithms and not only says "metaphone" is incorrect but suggests "megaphone".

So... The matter still isn't settled if you ask me.


Soundex is an earlier method.
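
A minimal sketch of the common "American Soundex" variant, for anyone curious. It keeps the first letter, maps the remaining consonants to digits, collapses repeated codes, and pads to four characters; the function name is just my choice.

```python
# Consonant groups that share a Soundex digit (1 through 6).
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(name: str) -> str:
    """Four-character phonetic key, e.g. a letter followed by three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    last_code = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != last_code:
            out.append(code)
        # Vowels reset the "previous code", so a repeat across a vowel is kept;
        # 'h' and 'w' do not reset it, so a repeat across them is dropped.
        if ch not in "hw":
            last_code = code
    return ("".join(out) + "000")[:4]
```

Names that sound alike tend to share a key, so a spell checker can use the key to pull in candidates that plain edit distance would rank too far away.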


That is a word I read once and then forgot but kept looking for! Thanks.


This kind of feat is the exact opposite of what the article is about.

This code is very simple, very short, very unoptimized and done in a few hours, free solo. The kind of code the article could have linked as an example of how simple it is now to write a spellchecker, in fact it explicitly talks about how you can do it in a few lines of Python, i.e. this.

What the article is about is complex, highly optimized code that took months to do, because computers at the time didn't even have enough space to store the complete word list uncompressed, either in storage or in memory, and even with compression, you had to be clever.


so a question that could be asked today is: what could be achieved, if only the same cleverness was applied to the utilization of the massive resources available today?

One potential answer is indeed the recent developments in AI (generative stuff like GPT & stable diffusion etc).


You're in for a bad time if you're going to be comparing yourself to Peter Norvig.


The prowess here is greatly overstated. His implementation would be pretty obvious for anyone that had read an Information Retrieval textbook any time in the couple decades between when these methods were devised and when he wrote that post.

That's kind of like watching someone use Red-Black Trees and assuming that they intuited them out of thin air. It's just remembering a basic algorithm that would be familiar to anyone in his particular field.


These ideas are extremely powerful. I built a spell-checker largely based on this article, by parsing English Wikipedia. At scale it needs a trie and a distance metric that scales better, but it works really well. These go together: your error metric is a priority-based multi-headed traversal of your trie -- this emulates comparing your word to every word in your dictionary; you can compare against a few million words very quickly.
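A rough Python sketch of the trie-plus-distance combination, for illustration only. This is not the commenter's priority-based traversal: it swaps in a simpler depth-first walk that carries one row of the Levenshtein table per trie node and prunes any branch whose row is already over the limit. `TrieNode`, `build_trie`, and `search` are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.word = None    # set to the full word on terminal nodes


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def search(root, query, max_dist):
    """Return sorted (distance, word) pairs within max_dist edits of query."""
    results = []

    def walk(node, ch, prev_row):
        # Extend the Levenshtein table by one row for the trie character ch.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(
                row[j - 1] + 1,                       # insertion
                prev_row[j] + 1,                      # deletion
                prev_row[j - 1] + (query[j - 1] != ch),  # substitution or match
            ))
        if node.word is not None and row[-1] <= max_dist:
            results.append((row[-1], node.word))
        if min(row) <= max_dist:  # otherwise no descendant can qualify
            for c, child in node.children.items():
                walk(child, c, row)

    first_row = list(range(len(query) + 1))
    for c, child in root.children.items():
        walk(child, c, first_row)
    return sorted(results)
```

Because words sharing a prefix share table rows, the work is far less than running a separate distance computation against every dictionary entry. A priority queue ordered by the row minimum would turn this into a best-first search closer to what the comment describes.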

Because it's based on Wikipedia, it can correct proper names too. It's very extensible: by changing the error model, you can get fuzzy auto-complete that can look at "Shwarz" and suggest "Schwarzenegger" (even though you missed the 'c'). You can extend it to looking at multiple words instead of just letters, for correcting common word-use issues as well.


I built a spell checker in the D compiler, where the dictionary consists of the symbols in scope. It is very satisfactory.
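The "dictionary is whatever is in scope" idea is easy to try out. This is a Python sketch using the standard library's `difflib.get_close_matches`, not the D compiler's actual implementation; the `suggest` helper and the example symbol names are hypothetical.

```python
import difflib


def suggest(unknown_name, symbols_in_scope, cutoff=0.6):
    """Return the closest in-scope symbol to unknown_name, or None."""
    matches = difflib.get_close_matches(unknown_name, symbols_in_scope, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Example: an undefined identifier checked against a small, made-up scope.
scope = ["length", "width", "count"]
hint = suggest("lenght", scope)
if hint is not None:
    print(f"undefined identifier 'lenght', did you mean '{hint}'?")
```

With such a small candidate set there is no need for a trie or a frequency model; a similarity cutoff is enough to avoid silly suggestions.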


Is the code or expanded explanation available?


Unfortunately I did this for work so I can't open-source the code without some awkward conversations (which may be possible and worth it, but I haven't yet). I'm sure I could write a blog post on it, but I don't have a blog, so ... sorry. Peter Norvig's post plus my point about tries and an efficient comparison algorithm at least give you a head start.


Norvig wrote a similar expansion in his chapter for the book Beautiful Data (along with several other small programs to do fun things with natural language corpus data).

You can find it with a web search.

(His version didn't use a trie, because Python's built-in dicts are much more efficient than a trie in Python, even with the extra redundancy that a trie eliminates.)


Do you allow yourself to be bored at all?

I recently realised I've been missing my "something" for a while.. ya I've not had a dopamine free moment since doom scrolling became a thing.

Not comparing somethings, I probably couldn't write a spell checker much less a beautiful one, but my creative juices are much less clogged up since allowing myself to get bored.

I believe it's being able to combine a learned skillset with raw creativity (boredom) that gives a person that something you refer to.


For me it's largely "the joy of finding things out". If I know there's a standard solution to a problem, I'll avoid the "spoilers" because I want to see how far I can get on my own. (And yeah, sometimes I just get bored of doomscrolling... Building things is a much more satisfying way to spend my time!)

I often avoid things because I think they'll be hard, but then discover that they turn out to be easy. "Why didn't anyone tell me how easy this was! I could have done this ages ago!" was a running theme for me for a good while.

Of course, sometimes things are actually harder than you thought. But they're still doable! Whereas I have this cognitive bias where "difficult" is (emotionally) equivalent to "impossible", which is just really unfortunate. I'm sort of starting to see through that, that everything's made of atoms, and once you break a task down far enough, the individual steps are trivial.


At least in my career so far all of the tasks I've been told are "impossible" just need another 5 minutes of thought - to the point I'm wondering if people do it on purpose so that I take over the task haha


Worth noting he did not perform the "major feat of software engineering" like the OP says (given 256k ram or lower). He just put all the words into a hash. The main key is knowing about edit distance, if you know that, you can probably write the solution.

So yes he's very talented, but if you study enough computer science, you can also do it. Or don't. It's totally fine to be interested in other parts of engineering instead.


Don't get me wrong, Peter Norvig is awesome, but if you're already familiar with Levenshtein or even Hamming distances, it's pretty straightforward to hack together a spellchecker.
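For reference, both distances fit in a few lines of Python. This is a plain textbook sketch (function names are my own): Hamming counts differing positions in equal-length strings, and Levenshtein is the usual dynamic program over insertions, deletions, and substitutions.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = cur
    return prev[-1]
```

A naive spellchecker is then just "return the dictionary word with the smallest `levenshtein` to the input", which is fine for small word lists and slow for large ones.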


Reminds me of a dumb project I did that takes a person's name and returns a name that is close, but not quite right: https://github.com/rzimmerman/shitty-autoreply

The idea was to set up an auto-reply to my email that looked automated, but surely couldn't be because it messed up the sender's name. Not nearly as clever as Peter Norvig's program and even less useful.

I'd say the bias toward common American names is a bug, but given that the goal of the project is to be obnoxious it's probably a feature.


To be fair even Norvig’s implementation wasn’t great for a mobile device. It can be improved with a more efficient trie structure…


Do you have any links?


The hard part in the early spellcheckers wasn't figuring out that you want to find a set of words with a minimum edit distance from the input.

The hard part was fitting the dictionary in memory, and then searching all possible strings 1 and 2 edits away from the input to find the candidates.

The progress is that we don't have to care about storing 2.5MBytes (they ran on machines with 640k!) in RAM, or searching it 100,000 times (e.g. edits2('something')).
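To get a feel for how fast the candidate set grows, here is a Python sketch modeled on the approach in the linked post (the post's exact code may differ; this version keeps duplicates in `edits1` so the count is easy to reason about). The lowercase ASCII alphabet is an assumption.

```python
import string

ALPHABET = string.ascii_lowercase


def edits1(word):
    """All strings one delete, transpose, replace, or insert away (with duplicates)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return deletes + transposes + replaces + inserts


def edits2(word):
    """Set of all strings reachable by applying edits1 twice."""
    return {e2 for e1 in set(edits1(word)) for e2 in edits1(e1)}
```

For a word of length n, `edits1` produces n deletes, n - 1 transposes, 26n replaces, and 26(n + 1) inserts, so 54n + 25 strings before deduplication. Applying that twice is what pushes the lookup count into the range the comment mentions, and each of those strings has to be checked against the dictionary.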


You do care about the 2.5 MB if memory is a concern and you are limited to a process with a very concrete memory threshold :)

Try fitting GIFs, Videos, images + spellchecker & autocorrection into that process


Somewhere in my email if I look back long enough haha… try searching for tightly packed tries on Google


It is a bit strange to me that it would be surprising that there are people way better at programming than you. The chance that you would be at the far right of the bell curve for any skill is pretty small, by definition. Almost everyone is going to not be the best at your thing. I don’t know why it would be depressing to not be the best.


> any skill is pretty small, by definition

I challenge this, especially the "by definition" part. It depends on how many skills you recognize.

The chance of being the best artist (to the extent that such a statement is reasonable) might be one in eight billion.

The chance of being the best automotive sharpie artist is, similarly, one in eight billion.

The chance of being the best artist /of some particular skill set/ is, then, much higher -- if you recognize only a few thousand distinct skills here, you're already at one in a million.

Then realize that those who are likely to be surprised that there are better programmers than them are /not/ likely to be surprised that there are better artists than them. So we're assessing across all skills.

The chance of being on the far right of the bell curve of a specific skill is pretty small. The chance of being on the far right of the bell curve for /any/ skill is quite high, with only eight billion people around.

Find what you do better than anyone else, embrace it, and enjoy it.


I used it to apply to a job at Twitch, as they tasked me with creating a spellchecker. Thanks, Peter! Also, they didn't give me an offer.


I've come to accept the fact that there are millions of people better than me at something. It's depressing and hopeless trying to be the best, there isn't even an objective measure. My goal is instead to be useful.


That "something" is basically passion and motivation, a child-like fascination with computers and the powers they can give you. People like this are so valuable and great to work with not because they have some innate power to understand algorithms, but because they sell a brand of optimism reminiscent of the "old" tech industry, the American postwar optimism of the 90s, where new things actually were revolutionary.

Let's be real, in 5-6 hours he designed a very nice, simple algorithm to produce likely words from a dictionary based on frequency analysis, and produced 30 lines of Python code implementing it. A lot of engineers today could have done this.

The "specialness" was that he did it on his own time and enjoyed it.

There was an xkcd relevant here about enjoying Python: https://xkcd.com/353/


Many developers would come up with that once they know the answer. Not as many will find the answer.


True genius is coming up with something so obvious everyone thinks "that's nothing, I could have done that."

Complexity is the product of lesser stars in the firmament.


The optimism of the 90s. That's a strange one I hadn't heard before.


I’d bet you have a “something” that isn’t obvious to you, or that people don’t compliment you on. And even if you don’t, comparing one’s self to anyone is sure to be depressing.


Peter Norvig is at god-tier, comparable perhaps to Torvalds, van Rossum, Wozniak, Stallman, ESR, Kernighan, Ritchie, Gates, Cormen, and many others I've left out.


Gates is definitely not 'god-tier' (I won't argue about any of the others, although I don't know who Cormen is)


Agreed


Yeah and that “something” is a deep interest.

You can do similar things in domains where you’re so interested you can’t think of doing anything else.


Left the punchline for the link, huh? Peter Norvig was the "a software engineer". 'Nuff said, mic drop and all that.


If you realize that no matter how hard you try you won’t be able to do certain things, do people keep trying or accept the limitations?


The guy making revenue is a better developer than the guy making elegant algorithms

Very few of us actually want to be the guy accumulating accolades for their engineering, we want to take flights to things we actually want to be doing, not opening our laptop once or even bringing it

Everything in between is a coping mechanism


You are a very big fish when measured against countless ponds.



