I’m not sure if I should be horrified or not. Both by the fact this happens, and...

scottlocklin · on Feb 15, 2020

Feel free to be horrified: a data scientist who doesn't understand where and why to use unix command line tools for data preparation and ETL is about as useful to me as one who doesn't understand the conditions where a t-test breaks down or what a ROC curve is.

Generally speaking, people like this have never actually dealt with large data sets, have never dealt with the issues involved in installing "unapproved software" on a machine (ridiculously common in The Real World), have probably never cleaned a dirty data set (what do you do when your giant csv is formatted in a way that Wes McKinney didn't think of?), and will, in a senior role, be a long-term liability for a data science team that works on serious problems. Sure, at one point I didn't know about these tools either: I wasn't a senior data scientist then. I submit that if you don't know about them and haven't actually used them, you aren't either.

blub · on Feb 15, 2020

I think that the people not being impressed by cut and sort are approaching this from the Linux end of things, where those tools are nothing special at all. I guess we kind of expected that the data science wizards would be using fancier tools.

scottlocklin · on Feb 16, 2020

Yeah, well, people who have enough self-regard to think of themselves as "wizards" are super unlikely to be able to actually do the day-to-day grind of getting, cleaning, and preparing data for feature generation, which is about 95% of the job.

Another good weeder for a person claiming to be senior: discuss how you would fix the performance of the default R naive Bayes implementation in e1071. It's numerically more or less correct, but written by deranged ape-men who don't understand how computers work (a problem in a lot of the R ecosystem; in the Python ecosystem, the problem is nobody has yet written algorithms for X, which ends up being a very similar problem: aka it's your job to code up sane algorithms).

whalabi · on Feb 15, 2020

So I think this is a perfect example of what happens in tech interviews.

OP is using knowledge of a specific technology as a heuristic for "has experience in role x".

But this always makes me wonder: couldn't you see that experience from a resume? If the candidate filled a data science role somewhere reputable for 3 years, and you verify that they successfully filled that role, why rely on that heuristic?

As you say, testing for the specific technology, when it can be learnt in 10 minutes, does not seem logical.

VRay · on Feb 17, 2020

Don't worry, he responded with this very data-driven explanation:

> Generally speaking, people like this have never actually dealt with large data sets