What? A lot of the hard part isn't the model, and especially in a world where be...

manca · on Aug 17, 2022

I agree with this. In my experience, most of the data scientists I have worked with never left the world of Jupyter notebooks. For them, code management, CI/CD, dev/stage/prod separation, etc. are a world of their own that they are not very comfortable with. Heck, they even used SageMaker to create a git repo for their Jupyter notebooks.

That doesn't mean there aren't data scientists with some engineering experience, but they seem to be rare. For that reason, getting the ML models they painstakingly build to the point where they generate real value is super hard. They just don't know where to start. Working across multiple teams and multiple functions is very challenging and often creates friction. Therefore, creating tools and systems that let those data scientists see the actual value of their labor is paramount.

That's why we're seeing a huge resurgence of so-called MLOps tools and platforms that aim to solve some or all of the problems across the entire stack. We are very, very early in this journey, but I believe the 2020s will be for ML and AI what the 2010s were for the cloud and data, i.e., new Snowflakes and Databricks but for the actual ML apps. It's exciting.

i_love_music · on Aug 16, 2022

Definitely agree with your first two paragraphs, but am confused by the pay paths. Can you expand on what the paths mean?

lmeyerov · on Aug 17, 2022

It's useful to work backwards from the knowledge a DS needs to be worth their weight. Imagine a small team of $400K/yr DS + $400K/yr DE + ... and whatever hw/sw. So say a $2-3M/yr project driving $3M+ of new growing revenue or $6-12M of annual savings. At bigger companies, the magnitudes & pressure are even bigger :)
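A rough back-of-the-envelope check of those numbers, as a sketch: it uses only the ranges quoted above, treats "$3M+" as a floor of exactly $3M, and compares top-line value to project cost (revenue is not profit, so the real margin is thinner). The helper name `multiple_range` is hypothetical, just for illustration.

```python
# Illustrative value-to-cost multiples using only the ranges quoted in the comment.
# All figures are in $M per year and are rough guesses, not data.

def multiple_range(value_lo, value_hi, cost_lo, cost_hi):
    """Return (worst, best) value/cost multiple for ranged inputs (hypothetical helper)."""
    worst = value_lo / cost_hi  # least value against the most cost
    best = value_hi / cost_lo   # most value against the least cost
    return worst, best

project_cost = (2.0, 3.0)   # "$2-3M/yr project"
new_revenue = (3.0, 3.0)    # "$3M+" treated as a floor of 3
savings = (6.0, 12.0)       # "$6-12M of annual savings"

rev_worst, rev_best = multiple_range(*new_revenue, *project_cost)
sav_worst, sav_best = multiple_range(*savings, *project_cost)

print(f"revenue case: {rev_worst:.1f}x to {rev_best:.1f}x (at the $3M floor)")
print(f"savings case: {sav_worst:.1f}x to {sav_best:.1f}x")
```

At the floor, the revenue case only returns about 1.0x to 1.5x the team's cost, while the savings case lands around 2x to 6x, which is why the DS has to be close to where the money actually is.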

The DS will likely:

- be close to the business case & business stakeholders to ask questions a normal lead can't

- know the relevant math + ML algorithms, and build up specializations pairing DS niches ("time series forecasting") with industry niches ("supply chains in manufacturing")

- have enough engineering & performance understanding to work with a DE on going from small data sets to big ones

- have an intuitive feel for all of the above - how data/usecases/etc. go right/wrong

That's a lot!!

One path is jumping in as a low-paid intern or new grad and doing your time. But a pivot is different, esp. to get paid along the way. Most CS grads had little math ("intros to stats, combinatorics, & algs; dropped linear algebra"), weak ML ("did algs; intro to ML only covered kmeans & bayes; tried running a BERT model on some data"), and little intuition for how ML typically goes wrong ("what's class imbalance?"). So if they do get hired directly as a mid-level DS, it's probably on a team of the blind-leading-the-blind. Oops.

BUT SQL/Spark/K8S/pandas/regex are real skills. Doing the data engineering, ML operations, etc. that turn an ML pipeline into more than a fanciful notebook that wouldn't last a minute in production is real work. That stuff does pay well, and by working with the ML folks, you'd naturally get pulled into the ML tasks as well. DS write all sorts of bugs that surface as production evolves and that the full team works on together, plus new features that need a team to make real. So taking a job that mixes engineering specialties with ML specialties is a smoother pivot path for the typical CS backgrounds I've seen. Over time, drift toward the more ML-y aspects of the projects until you can make the full hop. (Nit: That won't teach the math & deeper intuition, so I'd still do courses + projects on the side.)

vasili111 · on Aug 17, 2022

In general, do DEs get a higher salary than DSs?

Did I understand correctly that there is much more demand for DEs than for DSs?

lmeyerov · on Aug 18, 2022

I wish I had real numbers. So, my instinct from what I've seen:

- a data analyst role rebranded as a DS role will be lower paid than a DE role, maybe 50% diff

- an actual DS role is probably higher paid than a DE role, but really depends on the job+co

- a great DS role and a great DE role are both super well compensated. Though maybe, again, DS is higher than DE in most cases, just b/c of the ability to more directly drive $. Unless it's something like an infra company, the DS will be inherently closer to the business & outcomes. ("I did this clever thing that netted a 2% revenue spike that adds up to $40M/yr in new revenue, what did you do?")

data_maan · on Aug 16, 2022

NeurIPS paper, not neuroips paper

lmeyerov · on Aug 17, 2022

still not used to the new name ;-)