Here are some bullet points (Source: Glassdoor) to sum up the graph above:
Master’s or Ph.D. in statistics, mathematics, or computer science
Experience using statistical computer languages such as R, Python, SQL, etc.
Experience in statistical and data mining techniques, including generalized linear model/regression, random forest, boosting, trees, text mining, social network analysis
Experience working with and creating data architectures
Knowledge of machine learning techniques such as clustering, decision tree learning, and artificial neural networks
Knowledge of advanced statistical techniques and concepts, including regression, properties of distributions, and statistical tests
5-7 years of experience manipulating data sets and building statistical models
Experience using web services: Redshift, S3, Spark, DigitalOcean, etc.
Experience analyzing data from third-party providers, including Google Analytics, Site Catalyst, Coremetrics, AdWords, Crimson Hexagon, Facebook Insights, etc.
Experience with distributed data/computing tools: Map/Reduce, Hadoop, Hive, Spark, Gurobi, MySQL, etc.
Experience visualizing/presenting data for stakeholders using: Periscope, Business Objects, D3, ggplot, etc.
Take note of the number of times “experience” is mentioned. Now let’s take a look at a reasonable Data Analyst description (Springboard Blog):
Degree in mathematics, statistics, or business, with an analytics focus
Experience working with languages such as SQL/CQL, R, Python
A strong combination of analytical skills, intellectual curiosity, and reporting acumen
A solid understanding of data mining techniques, emerging technologies (MapReduce, Spark, large-scale data frameworks, machine learning, neural networks) and a proactive approach, with an ability to manage multiple priorities simultaneously
Familiarity with agile development methodology
Exceptional facility with Excel and Office
Strong written and verbal communication skills
In these job descriptions, the word “experience” appears eight times for a data scientist and only once for a data analyst. So how is it possible that people who have only recently started down the path of data science become data scientists within five years? Or, even better, right after graduation (undergraduate or graduate)? I do see a Ph.D. as a stepping stone toward becoming a data scientist because of the time, research, and application that go into earning one; however, I cannot say the same for those with an undergraduate degree in CS or a one-year master’s degree in a similar field (unless previous work experience says otherwise).
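The count is easy to check. As a minimal sketch, the Python snippet below tallies the word “experience” in the two lists quoted above. The bullet text is shortened to the opening of each line (none of the dropped text contains the word), and the names `count_word`, `data_scientist`, and `data_analyst` are my own, made up for illustration.

```python
import re

# Shortened copies of the Data Scientist bullets quoted above (Source: Glassdoor).
data_scientist = [
    "Master's or Ph.D. in statistics, mathematics, or computer science",
    "Experience using statistical computer languages such as R, Python, SQL, etc.",
    "Experience in statistical and data mining techniques",
    "Experience working with and creating data architectures",
    "Knowledge of machine learning techniques",
    "Knowledge of advanced statistical techniques and concepts",
    "5-7 years of experience manipulating data sets and building statistical models",
    "Experience using web services: Redshift, S3, Spark, DigitalOcean, etc.",
    "Experience analyzing data from third-party providers",
    "Experience with distributed data/computing tools",
    "Experience visualizing/presenting data for stakeholders",
]

# Shortened copies of the Data Analyst bullets quoted above (Springboard Blog).
data_analyst = [
    "Degree in mathematics, statistics, or business, with an analytics focus",
    "Experience working with languages such as SQL/CQL, R, Python",
    "A strong combination of analytical skills, intellectual curiosity, and reporting acumen",
    "A solid understanding of data mining techniques, emerging technologies",
    "Familiarity with agile development methodology",
    "Exceptional facility with Excel and Office",
    "Strong written and verbal communication skills",
]


def count_word(lines, word):
    """Count whole-word, case-insensitive occurrences of `word` across `lines`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(line)) for line in lines)


scientist_count = count_word(data_scientist, "experience")
analyst_count = count_word(data_analyst, "experience")
print(scientist_count, analyst_count)  # prints: 8 1
```

Two job descriptions are a tiny sample, of course, so treat the ratio as an illustration rather than a measurement of the field.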
Ryan Thorpe @ Towards Data Science wrote a great post on the topic and here is a snippet:
Data Scientist, the true unicorn.
For those that haven’t heard this, it’s a common description for the role used in the field. A true data scientist possess these skills:
STRONG — Business acumen
STRONG — Math/Statistics
STRONG — Computer Science/Ability to sling code
The unicorn is someone who’s perfect at all three. This is seldom the case. The most likely scenario is someone who lacks, or is weaker in one of the three.
Data Analyst, show me my business.
Data Analytics is very similar, they possess these skills:
STRONG — Business acumen
MODERATE — Math/Statistics
MODERATE — Computer Science/Ability to sling code
As you can see, a Data Science requires a strong skill-set in all three categories. However, both roles require the same skills. The biggest difference is how they apply these skills. Let's clear up the misconception.
What exactly does “acumen” mean in “business acumen”? The definition according to English Oxford Living Dictionaries: “The ability to make good judgements and take quick decisions.” Business experience is not required to have this ability, but the two are certainly strongly correlated among those who do have strong business acumen. It almost seems to me that there is an additional stepping stone before even becoming a Data Analyst: Business Analyst. This is me thinking on the fly, but I think it is worth pondering.
This is purely my opinion, but when I come across a “Data Scientist” on LinkedIn, I am more than likely looking at a Data Analyst (maybe even a Business Analyst) who aspires to become a Data Scientist someday. There is nothing wrong with this; as a matter of fact, these are my current aspirations. But businesses should become increasingly aware of the differences between the titles and match their expectations accordingly. After all, you do not want to hire several Data Scientists at six figures each only to find out that the value they deliver is really that of a Data Analyst (or Business Intelligence Analyst). That would set everyone involved up for failure in the long term.
My Simple Data Scientist Path
Details are always changing, but here is a high-level idea. Using the descriptions above, here is my current plan (a living document, of course!) for becoming a Data Scientist:
Business Analyst (1+ years)
Data Analyst (5-10 years + candidate in either MS Data Science or PhD Data Science)
Data Scientist (MS Data Science or PhD Data Science)
Of course, these high-level variables depend on the specific tasks given within each role, but they do reflect a reasonable ballpark range for the development of a UNICORN (Data Scientist).
Additional Reads on the topic:
Data Analyst vs Data Scientist — What’s the difference?
Career Comparison: Data Analyst vs Data Scientist—who does what?
Difference Between Data Scientist and Data Analyst
Blurred Lines: Data Analyst vs Data Science