Imo, this post did not organize its data and findings into a coherent presentation.
For example...
>The fizzbuzz-style coding problems, however, did not perform as well. While the confidence intervals are large, the current data shows less correlation with interview results. [...] The coding problems were also harder for people to finish. We saw twice the drop off rate on the coding problems as we saw on the quiz.
I read that paragraph several times and I don't understand what he's actually saying. If those candidates "dropped off" on the fizzbuzz, were they still kept for further evaluation in the subsequent extended coding session? A later paragraph says...
>So we started following up with interviews where we asked people to write code. Suddenly, a significant percentage of the people who had spoken well about impressive-sounding projects failed, in some cases spectacularly, when given relatively simple programming tasks. Conversely, people who spoke about very trivial sounding projects (or communicated so poorly we had little idea what they had worked on) were among the best at actual programming.
For the fizzbuzz failures to be uncorrelated with outcomes, and for that to be counterintuitive, he must not have rejected them for failing fizzbuzz, and they must have later done spectacularly well in the larger coding sessions. If that's what happened, then yes, that is a very counterintuitive result. What were the topics of the larger coding sessions?
Hey! Author here. Yes, we did not know whether the screening steps were meaningful, so for the first 300 applicants we interviewed everyone, even people who performed badly. We then looked for correlation between screening step scores and programming interview results. Doing well on the fizzbuzz problems was not very correlated.
By dropoff I mean people who left during a step, and never logged back into our site.
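To make the kind of check concrete, here's a minimal sketch with made-up numbers (not our actual data):

```python
import numpy as np
from scipy import stats

# Made-up scores for applicants who completed both a screening step and a
# programming interview (illustrative only, not real data).
quiz_scores      = np.array([55, 80, 70, 90, 60, 85, 75, 65, 50, 95])
interview_scores = np.array([2, 4, 3, 5, 3, 4, 3, 3, 2, 5])

r, p = stats.pearsonr(quiz_scores, interview_scores)
print(f"r = {r:.2f}, p = {p:.3f}")
# With only a few hundred interviews the confidence interval on r is wide,
# which is why the post hedges about the coding-problem result.
```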
I appreciate that you guys might not be statisticians, but if you're going to try and analyze data like this, you simply must address survival bias. As it stands, these data are meaningless unless you assume dropouts are completely unrelated to your screening.
You claim doing well on the Fizzbuzz wasn't correlated with interview performance, but you also said "We saw twice the drop off rate on the coding problems as we saw on the quiz."
An alternate explanation for your finding, then, is that more low-quality candidates drop out of the process when given FizzBuzz, leaving a relatively homogeneous pool of higher-quality candidates for the later interview. This effectively reduces the ratio of meaningful interindividual differences to noise, which in turn reduces the correlation.
In all likelihood, both of your correlations are low, but the idea that coding is less predictive than a quiz could be purely a statistical fluke due to survivor bias.
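Here's a toy simulation (my own made-up model, not your data) showing how selective dropout alone can shrink the observed correlation even when the screen is genuinely predictive:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: a latent "ability" drives both the coding screen and the later
# interview, each with independent noise.
n = 10_000
ability = rng.normal(size=n)
screen = ability + rng.normal(size=n)
interview = ability + rng.normal(size=n)

# Correlation across the whole applicant pool.
r_full = np.corrcoef(screen, interview)[0, 1]

# Now suppose weaker candidates disproportionately quit at the screen, so only
# the top half of the screen distribution ever reaches the interview.
survivors = screen > np.median(screen)
r_survivors = np.corrcoef(screen[survivors], interview[survivors])[0, 1]

print(f"full pool:       r = {r_full:.2f}")
print(f"survivors only:  r = {r_survivors:.2f}")
# The survivor-only correlation comes out noticeably lower even though the
# screen is genuinely predictive -- range restriction from selective dropout.
```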
All candidates did both screens (quiz and fizzbuzz). The correlations were calculated against the same population. Now, I agree that survivor bias could affect the quality of these results (we know nothing about the significant % of people who dropped out). But it's not really possible to solve that problem outside of a lab. I don't think it's an argument to not do analysis. For now we're simply trying to minimize the dropoff rate, and maximize correlation. The quiz was better at both.
Well, having every candidate do both screens is better, but it doesn't totally solve your problems.
It doesn't address the survival bias issue, and when you say a significant percentage dropped out, that's not reassuring. But it's not the case that you need a lab to solve the problem. Even a basic self-assessment questionnaire of programming ability might tell you whether there are meaningful differences in the population that quits your process. At the very least, you should understand and discuss survival bias in your article to indicate you're aware of the issue.
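Even something as crude as this would be informative (invented ratings; a Mann-Whitney test is just one reasonable choice here):

```python
from scipy import stats

# Invented self-ratings (1-10) collected at signup, split by whether the
# applicant finished the coding screen or quit partway through.
finished = [7, 8, 6, 9, 7, 8, 7, 6, 9, 8]
dropped  = [5, 4, 7, 3, 6, 5, 4, 6, 5, 3]

# Nonparametric test, since small ordinal self-ratings aren't normal.
stat, p = stats.mannwhitneyu(finished, dropped, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")
# A clear difference would mean the dropouts are not a random slice of the
# pool, i.e. survival bias is plausibly distorting the later correlations.
```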
Even if you still want to claim a difference between the quiz and the coding exercise, you're not yet in the clear. For example, did you counterbalance the order in which you gave them to people? E.g., if everybody did the quiz first and the fizzbuzz second, they were mentally fresher for the quiz and slightly more fatigued for the fizzbuzz, which could again create a spurious result. And this definitely doesn't require a lab to test.
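Counterbalancing is a one-liner in whatever code assigns the tasks; a hypothetical sketch:

```python
import random

# Hypothetical assignment code: give half the applicants quiz-first and half
# fizzbuzz-first, so fatigue/practice effects can't masquerade as a difference
# between the two screens.
def assign_order(applicant_id: int) -> list[str]:
    orders = (["quiz", "fizzbuzz"], ["fizzbuzz", "quiz"])
    return list(orders[applicant_id % 2])

# Or randomize outright:
def assign_order_random() -> list[str]:
    order = ["quiz", "fizzbuzz"]
    random.shuffle(order)
    return order
```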
Don't misunderstand me: I appreciate your attempts to quantify all this, and I actually think you guys have roughly the correct result (given the limited nature of fizzbuzz-style coding), but when you step into the experimental psych arena, you need to learn how to analyze your data properly. Given that your business is predicated on analyzing how your hires do in the real world, you really need to up your analytical game.
I have to agree with kingmob. It very much sounds like survivor bias. My first reaction is that anyone who drops out during a test has a high likelihood of dropping out because they can't do the test, which would leave you with a test that correlates weakly among the survivors, but very strongly (as an anti-correlation) across the total population.
I read a blog post a couple of years ago by a game programmer/designer who outsources a lot of work through places like odesk/elance. Basically, his approach to weeding out the fakers was to offer anyone ~5hrs at their bidding rate to finish a predefined programming task expected to take ~5hrs. He says this usually drops his pool to fewer than 10 out of the hundreds who may apply, and he can usually use at least one of the people who complete the task. It's hard to say how many of these people go away because the task looks too big, or because there's a risk of not getting paid, but it's clearly a good filter for him.
As far as measuring this survivor bias goes, you might gain some insight by randomly altering the order of the testing. You could measure when people tend to drop off. You might even find that people all tend to drop off around the same point in time, or maybe after a certain amount of effort. It might even be worth paying people to see if that would improve completion rates (while introducing its own biases).
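A rough sketch of what that measurement could look like, with invented log records:

```python
from collections import Counter

# Invented log records: which randomized ordering each applicant got, and the
# last step they completed before disappearing ("done" = finished everything).
events = [
    {"order": "quiz_first",     "last_step": "quiz"},
    {"order": "quiz_first",     "last_step": "done"},
    {"order": "fizzbuzz_first", "last_step": "fizzbuzz"},
    {"order": "fizzbuzz_first", "last_step": "done"},
    # ... one record per applicant
]

dropoff = Counter(
    (e["order"], e["last_step"]) for e in events if e["last_step"] != "done"
)
for (order, step), count in sorted(dropoff.items()):
    print(f"{order:15s} dropped at {step:10s}: {count}")
# If drop-off tracks the *position* of a task (whatever comes second) rather
# than its identity (fizzbuzz), that points to effort/fatigue, not the coding
# problems themselves.
```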
This is not the kind of comment HN needs more of. A better version would (a) drop the snarky putdown, and (b) actually say what Google's conclusions are. Then readers could decide for themselves to what degree those findings contradict these, instead of being told what to think.
>Doing well on the fizzbuzz problems was not very correlated.
If you mean "correlation" to only refer to the population that passed fizzbuzz, then it is to be expected that the final positive/accepted interview evaluations don't correlate. Fizzbuzz was never statistically designed for that. It was designed for early rejection and not for predicting ultimate success at the end of a multi-step interview cycle.
>By dropoff I mean people who left during a step, and never logged back into our site.
And the population mentioned in this sentence is what I first interpreted to be included in your "non-correlation". It looks like you don't include this population: the quitters who never logged back in were not further tested by you in later-stage evaluations. That's where my confusion was, and it's now resolved.
Agreed; it would be more useful to know the false negative rate (i.e. those incorrectly 'rejected' by fizzbuzz). The small/negative correlation could just be caused by a high number of false positives.
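Since the first 300 applicants were interviewed regardless of screen results, both rates should be computable directly; a sketch with invented records:

```python
# Invented records for applicants who were interviewed regardless of how they
# did on the screen (as with the first 300).
candidates = [
    {"passed_fizzbuzz": True,  "passed_interview": True},
    {"passed_fizzbuzz": False, "passed_interview": True},   # false negative
    {"passed_fizzbuzz": True,  "passed_interview": False},  # false positive
    {"passed_fizzbuzz": False, "passed_interview": False},
    # ... one record per interviewed applicant
]

good = [c for c in candidates if c["passed_interview"]]
bad  = [c for c in candidates if not c["passed_interview"]]

false_negative_rate = sum(not c["passed_fizzbuzz"] for c in good) / len(good)
false_positive_rate = sum(c["passed_fizzbuzz"] for c in bad) / len(bad)

print(f"false negatives (good candidates the screen rejects): {false_negative_rate:.0%}")
print(f"false positives (weak candidates the screen passes):  {false_positive_rate:.0%}")
```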
Are you tracking longer-term hiring outcomes too? They'll probably take some time to become meaningful, but they're far more important. The data you've compiled is useful since it helps to filter earlier in the process, but it still presumes that your in-person interviewing process makes the correct decision. If the final filter is letting bad candidates through or screening out good candidates, all the correlations you've found could be reflecting only the ability to pass the interview, not the ability to do the job successfully.
Hopefully you're continuing to follow hires 1, 2, and 5 years after they're hired, to tie those outcomes back to the data you collect about the interview process. It would be awesome if you could find predictors of candidates who are likely to quit less than a year after being hired, or candidates who will receive less-than-stellar ratings from their managers. By doing this, you'd help hiring managers deal with the blind spots in their hiring, not just streamline the existing process.
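If that data ever exists, even a crude model would be informative; a hypothetical sketch (placeholder data, scikit-learn assumed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data standing in for real records: one row per hire, columns are
# interview-stage signals (e.g. quiz score, coding score, interviewer rating),
# and the label is whether the hire stayed past one year.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
stayed_one_year = rng.integers(0, 2, size=200)

model = LogisticRegression().fit(X, stayed_one_year)
print(model.coef_)  # which interview-stage signals track retention, if any
```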
>Doing well on the fizzbuzz problems was not very correlated.
I think you're misusing fizzbuzz. You cannot really 'do well' on it. You can basically pass it or fail it. Failing means you're probably no good, but passing it doesn't prove anything.
That's the same thing though. If it's the same level of difficulty as fizzbuzz, it serves the same purpose: you can fail at it, but you can't really do well at it. All you can do is not fail.
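For reference, the canonical fizzbuzz is only a handful of lines, which is why there isn't much room to "do well" at it:

```python
# The classic problem: print 1..100, substituting "Fizz" for multiples of 3,
# "Buzz" for multiples of 5, and "FizzBuzz" for multiples of both.
for i in range(1, 101):
    if i % 15 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
```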