In recent years (or shall I say, in the past decade), data has been identified as the new oil. The world is collecting tons of it, thanks to the advancement of technology. The term "Big Data" was coined and marketed to tell businesses the importance and urgency of making sense of their data. The current mantra is "The World is Data Rich but Insights Poor". Continuing with the "Data is the New Oil" theme, we now need to refine and process our data, but strangely we are not paying much attention to the storage and extraction of the "refined oil", the insights (i.e. less attention goes to Data Management & Data Governance, which is a topic for another blog post).
During meetups, when I catch up with friends who have just started in the field, one of the "loudest" laments I hear is the misalignment between three things: job title, job scope and job expectations.
Job Title: Everyone is hiring a "Data Scientist".
Job Scope: Under the title of "Data Scientist", one ends up doing only analysis reports or building data pipelines, and mentions of Machine Learning are few and far between. Moreover, many of the defined scopes look the same, especially the tools required.
Job Expectations: There are tons of articles telling me that "Data Scientist is the sexiest job in the 21st Century" and "Data Scientist is the profession with the highest demand". The hype is pushed up further because Industry 4.0 is coming, and with it comes digitization. More companies are getting onto the data "gravy train" because of digitization. I just graduated from a bootcamp, paid a lot of money for it, and learnt only Machine Learning (the sexy part). I expect to do Machine Learning and nothing else!
Look! Data Science or Artificial Intelligence is a team sport! Team members play different roles and work on different scopes. The team has to work together for the business to win! Ever seen a soccer team that is ALL strikers or ALL defenders and manages to win tournaments consistently?
So if you want to build up capabilities, you need to hire the right people. If you are hiring, ask yourselves a few questions:
1) Do you need someone to design the dashboard and the performance metrics that go onto it? Do you need someone to analyze what is going on in your business and give you insights into your past business performance?
2) Do you need someone to build IT infrastructure for you? Do you need someone who can set up and manage the storage of data? Do you need someone to build a data pipeline for you? Do you need someone to implement machine learning models into your decision systems?
3) Do you need someone to make sense of the data using complex mathematical models (beyond summary statistics)? Do you need someone to apply machine learning algorithms in your process so that you can make better decisions? Are you interested in predictive analytics or prescriptive analytics? Do you need smarter automation?
If you answer "yes" to most of the questions in the first set, you do not need to hire a Data Scientist. You need a "Data Analyst". The job scope of a data analyst consists mostly of Descriptive and Diagnostic Analytics. An analyst, a traditional role in my opinion, focuses on looking at past data to discover trends. It is a job scope that taps heavily into analysis skills.
By the way, most bootcamps do not teach their trainees how to analyze data: for example, what counts as a peculiarity in a trend, and how to find out the reasons for it. Why is that so? Because analysis is not too complex and thus NOT sexy! What is so sexy about knowing the average sales for the past 6 months? It's not "complex" enough.
In my opinion, the analyst role is essential. Analysts are the first line of defense in detecting changes, both internal and external. To be a good analyst, one must have good domain knowledge, so that one can come up with relevant hypotheses for the trends and dig further to establish the actual reasons for the change or trend.
A data analyst is essential in an organization because the past (both the internal and the external environment) needs to be figured out before we can talk about how to move the business forward. Hire a data analyst if you are looking to understand your business's performance.
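To make the analyst's descriptive and diagnostic work a little more concrete, here is a minimal Python sketch. The monthly sales figures are made up for illustration, and the helper name `flag_peculiar_months` and its 20% threshold are my own assumptions, not a standard method.

```python
from statistics import mean

# Hypothetical monthly sales for the past 6 months (made-up numbers).
monthly_sales = {
    "Jan": 100.0,
    "Feb": 110.0,
    "Mar": 90.0,
    "Apr": 105.0,
    "May": 160.0,
    "Jun": 95.0,
}


def flag_peculiar_months(sales, threshold=0.20):
    """Return months whose sales deviate from the average by more than `threshold`.

    The 20% default is an arbitrary, illustrative choice.
    """
    average = mean(sales.values())
    return [
        month
        for month, value in sales.items()
        if abs(value - average) / average > threshold
    ]


# Descriptive: what happened on average?
average_sales = mean(monthly_sales.values())
# Starting point for diagnosis: which months look unusual?
peculiar = flag_peculiar_months(monthly_sales)

print(f"Average sales over 6 months: {average_sales:.1f}")
print(f"Months worth investigating: {peculiar}")
```

The code can only point at the month that looks odd. Working out why it is odd (a promotion, a competitor's stock-out, a data entry error) is where the analyst's domain knowledge comes in.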
If you answer "yes" to most of the questions in the second set...congratulations! You actually want to hire a Data Engineer! Data Engineers generally focus on the data IT stack or, in layman's terms, the data IT infrastructure.
For instance, they build the data pipelines, from collection and storage through to implementing models in the decision systems. For anything related to IT infrastructure, they are usually the first person to turn to. Data Scientists and Data Analysts often work together with the Data Engineer to ensure that the data collected is of the highest quality and that machine learning models are implemented correctly and stay in working condition (i.e. continuously calculating the score correctly for decision making).
So what if my organization is very new to this?
Organizations that have just started do not have to hire a very experienced Data Engineer, because they do not need sophisticated data tools yet. To manage the cost, either go onto cloud computing (Google Cloud Platform, Microsoft Azure or Amazon Web Services) and hire someone who can manage cloud computing platforms, or, to keep it in-house, hire someone with these skills: familiarity with a scripting language, Relational Database Management Systems and IT server management. I strongly advocate that this person be groomed into a Data Engineer as the organization matures.
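If you answer "yes" to most of the questions in the third set, you are looking for a Data Scientist. A Data Scientist's work is mostly in Predictive & Prescriptive Analytics. They extract insights using more sophisticated mathematical models like Machine Learning, Linear Programming etc.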
For more information on the difference between Data Scientist and Data Analyst, have a read here.
In general, a Data Scientist takes a more mathematically sophisticated approach to solving business challenges with data. There is a career pathway from Data Analyst to Data Scientist, which means that if your organization has just started out, you can hire a team of data analysts first, and good talent can be groomed further into Data Scientists.
What about Data Cleaning?
I have heard of "Data Scientists" who do not know how to clean data. Another misconception is that "the Data Analyst cleans data for the Data Scientist."
Let me set this straight: both job roles HAVE TO clean data. Cleaning data benefits both analysis and the training of Machine Learning models. It helps the Data Scientist and the Data Analyst understand the nuances of the business process and of the consumer markets. Cleaning data is a way to learn more about the domain. If you ever encounter a Data Analyst or Data Scientist who does not know how to clean data, STAY FAR AWAY from them!
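For readers who have never cleaned data, here is a minimal Python sketch of what it can look like. The records, the field names and the helper `clean_records` are all hypothetical and made up for illustration. Real cleaning rules depend entirely on your own data and domain.

```python
# Hypothetical raw customer records (made up for illustration).
raw_records = [
    {"customer": " Alice ", "country": "sg", "spend": "120.50"},
    {"customer": "Bob", "country": "SG", "spend": ""},
    {"customer": "alice", "country": "SG", "spend": "120.50"},
    {"customer": "Carol", "country": "my", "spend": "n/a"},
]


def clean_records(records):
    """Normalise text fields, parse spend, and drop duplicates."""
    cleaned = []
    seen = set()
    for record in records:
        # Trim stray whitespace and standardise casing.
        customer = record["customer"].strip().title()
        country = record["country"].strip().upper()
        try:
            spend = float(record["spend"])
        except ValueError:
            spend = None  # keep the row, but mark spend as missing
        key = (customer, country, spend)
        if key in seen:
            continue  # same record once the formatting is normalised
        seen.add(key)
        cleaned.append({"customer": customer, "country": country, "spend": spend})
    return cleaned


cleaned_records = clean_records(raw_records)
for row in cleaned_records:
    print(row)
```

Notice that every rule in the sketch is really a domain question. Is "alice" the same customer as " Alice "? Does an empty spend mean zero or unknown? Answering those questions is how cleaning data teaches you about the business.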
To kick-start the re-alignment, so that companies hire the right talent and can start their Data Science journey with confidence, plan the possible projects first. You can hire either an experienced data scientist or an experienced consultant to scope potential projects (for more details please refer to my other blog post here). Use these projects to convince and onboard more stakeholders. Once there is enough buy-in inside the company and the company wants to build data science capabilities, it is time to plan out the manpower requirements, for instance how many experienced data engineers, data analysts etc. are needed. At the end of the day, do not execute your hiring process without a plan or roadmap. It will be a waste of time and resources for both the company and the talent hired.
I sincerely hope that all companies embarking on the Data Science and Artificial Intelligence journey will have a fruitful one, so that we as a global population can build better and more meaningful jobs in the Data ecosystem. :)
For more of my blog posts, please click here. :)