Happy 2021 everyone!

I figure most of you are setting your New Year aspirations. If one of them is a successful career change into Data Science or Artificial Intelligence, I hope this post helps!

Here is a rough plan if you want to be a data scientist, from scratch.

Level 0 - Calculus, Linear Algebra, Statistics

Level 1 - Coding (SQL at least), Data Analysis, Data Visualization

Level 2 - Coding (include R & Python), Data Munging

Level 3 - Machine Learning (Training, Validation, Selection)

Level 4 - Lots of Hands-On

Level 5 - Business Implications of Machine Learning

Level 0 - Calculus, Linear Algebra, Statistics

If you have read my previous posts, you will know that I believe strongly that a data scientist should have a strong mathematics foundation. But how to start? What are the essential branches of mathematics? Having self-studied for the past decade and a half, I have found that one cannot miss out on Calculus, Linear Algebra, and Statistics.

For Calculus, a good understanding of first- and second-order derivatives, what they represent on a plane, and the chain rule is a good start. For Linear Algebra, a good understanding of matrix operations such as addition, subtraction, inversion, and dot products, along with eigenvalues and eigenvectors, helps build a strong foundation. As for Statistics, the different probability distributions, summary statistics, and hypothesis testing will help.
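To make these topics a little more concrete, here is a minimal Python sketch that touches one idea from each branch using only the standard library. The function names, the example matrix, and the sample numbers are all made up for illustration; in practice you would likely reach for a numerical library instead of hand-rolled helpers.

```python
import math
import statistics


def derivative(f, x, h=1e-5):
    """Central-difference estimate of the first derivative f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of the second derivative f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


# Calculus: the chain rule says d/dx sin(x^2) = cos(x^2) * 2x.
def composite(x):
    return math.sin(x ** 2)


x0 = 1.5
numeric_slope = derivative(composite, x0)
chain_rule_slope = math.cos(x0 ** 2) * 2 * x0

# Second derivative of x^3 is 6x, so it should be close to 12 at x = 2.
curvature = second_derivative(lambda x: x ** 3, 2.0)


# Linear algebra: multiply a 2x2 matrix by its inverse to get the identity.
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse_2x2(m):
    """Invert a 2x2 matrix using the closed-form formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2.0, 1.0], [1.0, 3.0]]  # illustrative matrix
identity = matmul(A, inverse_2x2(A))

# Statistics: summary statistics of a small illustrative sample.
sample = [4.1, 3.9, 4.3, 4.0, 4.2]
sample_mean = statistics.mean(sample)
sample_stdev = statistics.stdev(sample)  # sample (n - 1) standard deviation

print(numeric_slope, chain_rule_slope)
print(curvature)
print(identity)
print(sample_mean, sample_stdev)
```

If the numerical slope and the chain-rule slope agree to several decimal places, that is a good sign you have applied the chain rule correctly.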

"How much is enough?" you may ask. Remember that you will have to come back to these topics regardless, so pick up what you can in one or two run-throughs first. :)

Level 1 - Coding (SQL at least), Data Analysis, Data Visualization

Yes, you cannot run away from coding if you want to be a data scientist who adds value to your organization. The first language that you should go for is SQL (short for Structured Query Language).

Companies that take data seriously are likely to have built a database, and that database is likely to use SQL as the language for interacting with it. To extract the relevant data for your analysis, SQL is a must-learn. Contrary to what is commonly claimed, the first language a data scientist should learn is not Python. Having a good grasp of SQL will allow you to extract the relevant data from different tables in the database.
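As a rough illustration of pulling data from different tables, here is a small sketch that runs SQL against an in-memory SQLite database through Python's built-in sqlite3 module. The customers and orders tables, their columns, and the rows are hypothetical; a real company database will have its own schema and may use a different SQL dialect.

```python
import sqlite3

# In-memory database with two hypothetical tables, just for practice.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        country TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ana', 'SG'), (2, 'Ben', 'MY'), (3, 'Chen', 'SG');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0);
    """
)

# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL total into 0.
query = """
    SELECT c.name,
           c.country,
           COUNT(o.order_id) AS n_orders,
           COALESCE(SUM(o.amount), 0.0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, c.country
    ORDER BY total_spent DESC, c.name;
"""
rows = conn.execute(query).fetchall()
conn.close()

for name, country, n_orders, total_spent in rows:
    print(name, country, n_orders, total_spent)
```

The SELECT, JOIN, GROUP BY, and ORDER BY clauses used here cover a large share of everyday data extraction, so they are a reasonable place to start.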

In most cases, data analysis and data visualization help a lot when you need to extract information and insights from data quickly!

If you look at any data science project, a data scientist spends a lot of time analyzing the data. Why? To extract raw insights in preparation for machine learning model training, and to look out for possible data quality issues, any breach of model assumptions, and any possible challenges with the "target" chosen. There are many, many, many analyses to be done. Thus having a keen eye for detail and doing good data analysis helps!
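Here is a minimal sketch of a few such checks in plain Python: missing values, duplicated records, implausible values, and how balanced the "target" is. The records, the column names, and the "churned" target are invented for illustration, and real projects would normally run checks like these with a data analysis library on far more data.

```python
from collections import Counter

# Hypothetical customer records; "churned" plays the role of the target.
records = [
    {"id": 1, "age": 34, "income": 5200.0, "churned": 0},
    {"id": 2, "age": None, "income": 4100.0, "churned": 0},
    {"id": 3, "age": 29, "income": -50.0, "churned": 1},
    {"id": 3, "age": 29, "income": -50.0, "churned": 1},
    {"id": 4, "age": 51, "income": 7300.0, "churned": 0},
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def duplicate_ids(rows, key="id"):
    """Return the key values that appear more than once."""
    seen = Counter(row[key] for row in rows)
    return sorted(k for k, n in seen.items() if n > 1)


def target_balance(rows, target="churned"):
    """Share of rows belonging to each target label."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


missing = missing_counts(records)
dupes = duplicate_ids(records)
# A negative income is implausible here, so flag it for follow-up.
negative_income_ids = sorted(
    {r["id"] for r in records if r["income"] is not None and r["income"] < 0}
)
balance = target_balance(records)

print("missing values:", missing)
print("duplicated ids:", dupes)
print("negative income ids:", negative_income_ids)
print("target balance:", balance)
```

Each check answers one narrow question, and the answers tell you what to clean or investigate before any model training starts.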

To perform well in data analysis, a popular tool that we use is data visualization. Designing good visualizations is never easy! If it were, bad presentations would have been a thing of the past long ago, but that is not the case, at least for me. With good data visualization skills, you can quickly discover the insights in your data and also spot any possible data quality issues.

I will continue describing the other levels when I have more time, but for now the above should be sufficient for you to work on for the next 1-2 months. :)

I wish you all the best, and do look out for my next post on the next few levels.

Do check out my other blog posts. Keep in touch on LinkedIn or Twitter, or subscribe to my newsletter to find out what I am thinking, doing, or learning. :)