Imo, this post did not organize its data and findings into a coherent presentation.
For example...
>The fizzbuzz-style coding problems, however, did not perform as well. While the confidence intervals are large, the current data shows less correlation with interview results. [...] The coding problems were also harder for people to finish. We saw twice the drop off rate on the coding problems as we saw on the quiz.
I read that paragraph several times and I don't understand what he's actually saying. If those candidates "dropped off" on the fizzbuzz, were they still kept for further evaluation in the subsequent extended coding session? A later paragraph says...
>So we started following up with interviews where we asked people to write code. Suddenly, a significant percentage of the people who had spoken well about impressive-sounding projects failed, in some cases spectacularly, when given relatively simple programming tasks. Conversely, people who spoke about very trivial sounding projects (or communicated so poorly we had little idea what they had worked on) were among the best at actual programming.
For the fizzbuzz failures to be uncorrelated with outcomes, and for that to be counterintuitive, he must not have rejected them for failing fizzbuzz, and they must have later done spectacularly well in the larger coding sessions. If that's what happened, then yes, that is a very counterintuitive result. What were the topics of the larger coding sessions?
Hey! Author here. Yes, we did not know whether the screening steps were meaningful, so for the first 300 applicants we interviewed everyone, even people who performed badly. We then looked for correlation between screening step scores and programming interview results. Doing well on the fizzbuzz problems was not very correlated.
By dropoff I mean people who left during a step, and never logged back into our site.
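To make the kind of check concrete, here's a minimal sketch with made-up numbers (not our actual data):

```python
import numpy as np
from scipy import stats

# Made-up scores for applicants who completed both a screening step and a
# programming interview (illustrative only, not real data).
quiz_scores      = np.array([55, 80, 70, 90, 60, 85, 75, 65, 50, 95])
interview_scores = np.array([2, 4, 3, 5, 3, 4, 3, 3, 2, 5])

r, p = stats.pearsonr(quiz_scores, interview_scores)
print(f"r = {r:.2f}, p = {p:.3f}")
# With only a few hundred interviews the confidence interval on r is wide,
# which is why the post hedges about the coding-problem result.
```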
I appreciate that you guys might not be statisticians, but if you're going to try and analyze data like this, you simply must address survival bias. As it stands, these data are meaningless unless you assume dropouts are completely unrelated to your screening.
You claim doing well on the Fizzbuzz wasn't correlated with interview performance, but you also said "We saw twice the drop off rate on the coding problems as we saw on the quiz."
An alternate explanation for your finding, then, is that more low-quality candidates drop out of the process when given FizzBuzz, leaving a relatively homogeneous pool of higher-quality candidates for the later interview. This effectively reduces the ratio of meaningful interindividual differences to noise, which in turn reduces the correlation.
In all likelihood, both of your correlations are low, but the idea that coding is less predictive than a quiz could be purely a statistical fluke due to survivor bias.
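Here's a toy simulation (my own made-up model, not your data) showing how selective dropout alone can shrink the observed correlation even when the screen is genuinely predictive:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: a latent "ability" drives both the coding screen and the later
# interview, each with independent noise.
n = 10_000
ability = rng.normal(size=n)
screen = ability + rng.normal(size=n)
interview = ability + rng.normal(size=n)

# Correlation across the whole applicant pool.
r_full = np.corrcoef(screen, interview)[0, 1]

# Now suppose weaker candidates disproportionately quit at the screen, so only
# the top half of the screen distribution ever reaches the interview.
survivors = screen > np.median(screen)
r_survivors = np.corrcoef(screen[survivors], interview[survivors])[0, 1]

print(f"full pool:       r = {r_full:.2f}")
print(f"survivors only:  r = {r_survivors:.2f}")
# The survivor-only correlation comes out noticeably lower even though the
# screen is genuinely predictive -- range restriction from selective dropout.
```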
All candidates did both screens (quiz and fizzbuzz). The correlations were calculated against the same population. Now, I agree that survivor bias could affect the quality of these results (we know nothing about the significant % of people who dropped out). But it's not really possible to solve that problem outside of a lab. I don't think it's an argument to not do analysis. For now we're simply trying to minimize the dropoff rate, and maximize correlation. The quiz was better at both.
Well, having every candidate do both screens is better, but it doesn't totally solve your problems.
It doesn't address the survival bias issue, and when you say a significant percentage dropped out, that's not reassuring. But it's not the case that you need a lab to solve the problem. Even a basic self-assessment questionnaire of programming ability might tell you whether there are meaningful differences in the population that quits your process. At the very least, you should understand and discuss survival bias in your article to indicate you're aware of the issue.
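Even something as crude as this would be informative (invented ratings; a Mann-Whitney test is just one reasonable choice here):

```python
from scipy import stats

# Invented self-ratings (1-10) collected at signup, split by whether the
# applicant finished the coding screen or quit partway through.
finished = [7, 8, 6, 9, 7, 8, 7, 6, 9, 8]
dropped  = [5, 4, 7, 3, 6, 5, 4, 6, 5, 3]

# Nonparametric test, since small ordinal self-ratings aren't normal.
stat, p = stats.mannwhitneyu(finished, dropped, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")
# A clear difference would mean the dropouts are not a random slice of the
# pool, i.e. survival bias is plausibly distorting the later correlations.
```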
Even if you still want to claim a difference between the quiz and the coding exercise, you're not yet in the clear. For example, did you counterbalance the order in which you gave them to people? E.g., if everybody did the quiz first and the fizzbuzz second, they were mentally fresher for the quiz and slightly more fatigued for the fizzbuzz, which could again create a spurious result. And this definitely doesn't require a lab to test.
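Counterbalancing is a one-liner in whatever code assigns the tasks; a hypothetical sketch:

```python
import random

# Hypothetical assignment code: give half the applicants quiz-first and half
# fizzbuzz-first, so fatigue/practice effects can't masquerade as a difference
# between the two screens.
def assign_order(applicant_id: int) -> list[str]:
    orders = (["quiz", "fizzbuzz"], ["fizzbuzz", "quiz"])
    return list(orders[applicant_id % 2])

# Or randomize outright:
def assign_order_random() -> list[str]:
    order = ["quiz", "fizzbuzz"]
    random.shuffle(order)
    return order
```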
Don't misunderstand me: I appreciate your attempts to quantify all this, and I actually think you guys have roughly the correct result (given the limited nature of fizzbuzz-style coding), but when you step into the experimental psych arena, you need to learn how to analyze your data properly. Given that your business is predicated on analyzing how your hires do in the real world, you really need to up your analytical game.
I have to agree with kingmob. It very much sounds like survivor bias. My first reaction is that anyone who drops out during a test has a high likelihood of dropping out because they can't do the test, which would leave you with a test that correlates weakly among the survivors, but very strongly (as an anti-correlation) across the total population.
I read a blog post a couple of years ago by a game programmer/designer who outsources a lot of work through places like odesk/elance. Basically, his approach to weeding out the fakers was to offer anyone ~5hrs at their bidding rate to finish a predefined programming task expected to take ~5hrs. He says this usually drops his pool to fewer than 10 out of the hundreds who may apply, and he can usually use at least one of the people who complete the task. It's hard to say how many of these people go away because the task looks too big, or because there's a risk of not getting paid, but it's clearly a good filter for him.
As far as measuring this survivor bias goes, you might gain some insight by randomly altering the order of the testing. You could measure when people tend to drop off. You might even find that people all tend to drop off around the same point in time, or maybe after a certain amount of effort. It might even be worth paying people to see if that would improve completion rates (while introducing its own biases).
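A rough sketch of what that measurement could look like, with invented log records:

```python
from collections import Counter

# Invented log records: which randomized ordering each applicant got, and the
# last step they completed before disappearing ("done" = finished everything).
events = [
    {"order": "quiz_first",     "last_step": "quiz"},
    {"order": "quiz_first",     "last_step": "done"},
    {"order": "fizzbuzz_first", "last_step": "fizzbuzz"},
    {"order": "fizzbuzz_first", "last_step": "done"},
    # ... one record per applicant
]

dropoff = Counter(
    (e["order"], e["last_step"]) for e in events if e["last_step"] != "done"
)
for (order, step), count in sorted(dropoff.items()):
    print(f"{order:15s} dropped at {step:10s}: {count}")
# If drop-off tracks the *position* of a task (whatever comes second) rather
# than its identity (fizzbuzz), that points to effort/fatigue, not the coding
# problems themselves.
```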
This is not the kind of comment HN needs more of. A better version would (a) drop the snarky putdown, and (b) actually say what Google's conclusions are. Then readers could decide for themselves to what degree those findings contradict these, instead of being told what to think.
>Doing well on the fizzbuzz problems was not very correlated.
If you mean "correlation" to only refer to the population that passed fizzbuzz, then it is to be expected that the final positive/accepted interview evaluations don't correlate. Fizzbuzz was never statistically designed for that. It was designed for early rejection and not for predicting ultimate success at the end of a multi-step interview cycle.
>By dropoff I mean people who left during a step, and never logged back into our site.
And the population mentioned in this sentence is what I first interpreted to be included in your "non-correlation". It looks like you don't include this population: the quitters who never logged back in were not further tested by you in later-stage evaluations. That's where my confusion was, and it's now resolved.
Agreed; it would be more useful to know the false negative rate (i.e. those incorrectly 'rejected' by fizzbuzz). The small/negative correlation could just be caused by a high number of false positives.
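Since the first 300 applicants were interviewed regardless of screen results, both rates should be computable directly; a sketch with invented records:

```python
# Invented records for applicants who were interviewed regardless of how they
# did on the screen (as with the first 300).
candidates = [
    {"passed_fizzbuzz": True,  "passed_interview": True},
    {"passed_fizzbuzz": False, "passed_interview": True},   # false negative
    {"passed_fizzbuzz": True,  "passed_interview": False},  # false positive
    {"passed_fizzbuzz": False, "passed_interview": False},
    # ... one record per interviewed applicant
]

good = [c for c in candidates if c["passed_interview"]]
bad  = [c for c in candidates if not c["passed_interview"]]

false_negative_rate = sum(not c["passed_fizzbuzz"] for c in good) / len(good)
false_positive_rate = sum(c["passed_fizzbuzz"] for c in bad) / len(bad)

print(f"false negatives (good candidates the screen rejects): {false_negative_rate:.0%}")
print(f"false positives (weak candidates the screen passes):  {false_positive_rate:.0%}")
```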
Are you tracking longer-term hiring outcomes too? They'll probably take some time to become meaningful, but they're far more important. The data you've compiled is useful since it helps to filter earlier in the process, but it still presumes that your in-person interviewing process makes the correct decision. If the final filter is letting bad candidates through or screening out good candidates, all the correlations you've found could be reflecting only the ability to pass the interview, not the ability to do the job successfully.
Hopefully you're continuing to follow hires 1, 2, and 5 years after they're hired, to tie those outcomes back to the data you collect about the interview process. It would be awesome if you could find predictors of candidates who are likely to quit less than a year after being hired, or candidates who will receive less-than-stellar ratings from their managers. By doing this, you'd help hiring managers deal with the blind spots in their hiring, not just streamline the existing process.
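If that data ever exists, even a crude model would be informative; a hypothetical sketch (placeholder data, scikit-learn assumed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data standing in for real records: one row per hire, columns are
# interview-stage signals (e.g. quiz score, coding score, interviewer rating),
# and the label is whether the hire stayed past one year.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
stayed_one_year = rng.integers(0, 2, size=200)

model = LogisticRegression().fit(X, stayed_one_year)
print(model.coef_)  # which interview-stage signals track retention, if any
```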
>Doing well on the fizzbuzz problems was not very correlated.
I think you're misusing fizzbuzz. You cannot really 'do well' on it. You can basically pass it or fail it. Failing means you're probably no good, but passing it doesn't prove anything.
That's the same thing though. If it's the same level of difficulty as fizzbuzz, it serves the same purpose: you can fail at it, but you can't really do well at it. All you can do is not fail.
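For reference, the canonical fizzbuzz is only a handful of lines, which is why there isn't much room to "do well" at it:

```python
# The classic problem: print 1..100, substituting "Fizz" for multiples of 3,
# "Buzz" for multiples of 5, and "FizzBuzz" for multiples of both.
for i in range(1, 101):
    if i % 15 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
```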