Often, businesses have this view: "Data science is good for my organization. My organization has collected data! We CAN do data science! Let's do it!" In my opinion, that view is overly simplistic.

To put it in better perspective, management needs to see Data Science as a manufacturing process in which raw data is the raw material and insights are the output. The insights must meet requirements, and those requirements are defined by the business question or challenge that needs to be answered or overcome. To start their Data Science journey, businesses have to go through the following stages:

1) Getting Data in Order

For this part, there are two tracks: data management and data collection.

Data management involves getting the data to sufficient quantity and quality for it to be meaningfully analyzed. Characteristics of the data to pay attention to include accuracy, timeliness and the handling of missing data. Beyond these, supporting processes such as data validation, backups, data update policies and the assignment of roles and responsibilities have to be worked on as well.
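To make the data-quality characteristics above a little more concrete, here is a minimal Python sketch of an automated check. It assumes, purely for illustration, that records arrive as dictionaries with hypothetical `sku`, `quantity` and `updated_at` fields, that a record not updated in the last seven days counts as stale, and that a negative quantity counts as inaccurate; the `profile_quality` function is also hypothetical. Real rules and thresholds would come from the business's own data policies.

```python
from datetime import date, timedelta


def profile_quality(records, required_fields, as_of, max_age_days=7):
    """Summarise basic data-quality issues in a list of dict records (illustrative rules)."""
    missing = {field: 0 for field in required_fields}
    stale = 0
    invalid = 0
    cutoff = as_of - timedelta(days=max_age_days)
    for rec in records:
        # Completeness: count required fields that are absent or None.
        for field in required_fields:
            if rec.get(field) is None:
                missing[field] += 1
        # Timeliness: records last updated before the cutoff are stale.
        updated = rec.get("updated_at")
        if updated is not None and updated < cutoff:
            stale += 1
        # Accuracy (assumed rule): quantities should never be negative.
        qty = rec.get("quantity")
        if qty is not None and qty < 0:
            invalid += 1
    return {"rows": len(records), "missing": missing, "stale": stale, "invalid": invalid}


records = [
    {"sku": "A1", "quantity": 5, "updated_at": date(2024, 1, 10)},
    {"sku": None, "quantity": 3, "updated_at": date(2024, 1, 2)},
    {"sku": "C3", "quantity": -2, "updated_at": None},
]
report = profile_quality(records, ["sku", "quantity", "updated_at"], as_of=date(2024, 1, 12))
print(report)
```

Running a check like this on a schedule, and routing its output to whoever owns the data, is one way to connect data validation with the roles and responsibilities mentioned above.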

The second track is collecting the 'right' data. Start by identifying the 'low-hanging fruit': find out what matters most to the business and which insights matter, then plan the data to collect to gain those insights. While planning, always look internally for existing data first before thinking about what other data to collect. After exploring existing data, move on to data that can be acquired easily. Throughout data collection, always ask yourself how much data to collect (e.g. one or two years' worth) and at what granularity (e.g. monthly or weekly).
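The granularity question matters because data collected at a finer grain can always be rolled up to a coarser one, but not the other way round. The sketch below, in Python with made-up sales figures and a hypothetical `aggregate` helper, rolls daily sales up to weekly (ISO week) or monthly totals.

```python
from collections import defaultdict
from datetime import date


def aggregate(sales, granularity):
    """Roll daily (date, amount) pairs up to 'weekly' or 'monthly' totals."""
    totals = defaultdict(float)
    for day, amount in sales:
        if granularity == "monthly":
            key = (day.year, day.month)
        elif granularity == "weekly":
            iso = day.isocalendar()
            key = (iso[0], iso[1])  # ISO year and ISO week number
        else:
            raise ValueError(f"unsupported granularity: {granularity}")
        totals[key] += amount
    return dict(sorted(totals.items()))


daily_sales = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 3), 50.0),
    (date(2024, 1, 9), 70.0),
    (date(2024, 2, 1), 30.0),
]
monthly = aggregate(daily_sales, "monthly")
weekly = aggregate(daily_sales, "weekly")
print(monthly)
print(weekly)
```

Had the same sales only been recorded as monthly totals, the weekly view could not be reconstructed, which is why the granularity decision is worth making deliberately up front.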

2) Getting the Reporting Process in Order

After figuring out the data, the next step is to plan the reporting process. These should be essential reports that answer common day-to-day operational questions, for instance: how much stock have I sold, how much inventory do I have left, and how many new customers have I acquired? A business that wants to start working on Data Science cannot skip setting up a detailed and useful reporting process to support it.

When planning the reporting process, the business also needs to plan the ETL (Extract, Transform and Load) process. It need not be complicated, since only a few reports are generated at this stage, but the ETL process can grow into a big ball of spaghetti over time, and good documentation is needed to prepare for that.
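As an illustration of how small a first ETL pipeline can be, here is a Python sketch that uses only the standard library (`csv` and an in-memory `sqlite3` database). The CSV contents, the `sales` table and the rule of skipping rows with a missing quantity are all assumptions made for the example. Keeping extract, transform and load as separate, documented functions is one way to keep the pipeline from turning into spaghetti as more reports are added.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the second data row has a missing quantity.
RAW_CSV = """date,sku,qty
2024-01-01,A1,5
2024-01-01,B2,
2024-01-02,A1,3
"""


def extract(text):
    """Extract: read raw rows from a CSV source (here, an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no quantity and cast qty to int."""
    clean = []
    for row in rows:
        if not row["qty"]:
            continue  # assumed rule: skip incomplete rows (a real policy might flag them)
        clean.append((row["date"], row["sku"], int(row["qty"])))
    return clean


def load(conn, rows):
    """Load: write cleaned rows into a reporting table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, sku TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# A day-to-day report: how much stock have I sold, per SKU?
units_by_sku = dict(conn.execute("SELECT sku, SUM(qty) FROM sales GROUP BY sku ORDER BY sku"))
print(units_by_sku)
```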

There may be a perception that a 'sophisticated' data warehouse is needed, but that need not be the case; it depends on the amount of data the reporting process requires. It does not make sense to spend thousands of dollars on a data warehouse only to generate a handful of reports. Still, the business does need to consider scalability when planning the ETL and reporting processes together.

3) Let's do Data Science

After the first two stages, we can move on to Data Science. Start by exploring current processes that could benefit from it. Processes that are capital- and/or labor-intensive and collect large amounts of data are prime candidates for testing whether Data Science, in particular Machine Learning, can work. Processes that need to scale up or to produce consistent results can also benefit.

Again, go for the lowest-hanging fruit first to gain experience before moving on to more complicated problems. Aim for quick wins so that the business builds momentum to move forward with Analytics.

I would like to point out that many managers have the impression that moving from stage 1 to stage 3 requires a huge investment. That need not be the case. What matters more is that investment goes hand-in-hand with the value actually delivered, so that momentum is preserved and Data Science/Analytics can continue to benefit the business. Just bear in mind the need to scale up in the future, once the business has learned more about the benefits and implementation of Data Science/Analytics.

For processes that can benefit from Data Science, the decision (statistical) models will likely be embedded into the IT systems, which requires proper planning.

Things do not stop here. With each model embedded into the business processes, businesses have to go back to their reporting process to add model validation reports. These ensure that embedded models are monitored on their 'predictive' power, so that it stays at an acceptable level, and they make clear what should be done when it does not. Proper policies should be drawn up to ensure the right steps are taken when models are not performing. Continuous monitoring and taking the necessary actions allow the business to keep benefiting from Data Science/Analytics.
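A model validation report can start very simply. The Python sketch below assumes a classification model whose predictions can later be compared with actual outcomes, uses accuracy as the measure of 'predictive' power, and flags any window of observations whose accuracy falls below a threshold. The metric, the window size, the threshold and the `validation_report` name are illustrative assumptions; in practice they would be set by the monitoring policy.

```python
def validation_report(y_true, y_pred, window, threshold):
    """Compute accuracy over consecutive windows and flag windows below threshold."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be the same length")
    report = []
    for start in range(0, len(y_true), window):
        actual = y_true[start:start + window]
        predicted = y_pred[start:start + window]
        correct = sum(a == p for a, p in zip(actual, predicted))
        accuracy = correct / len(actual)
        report.append({
            "window": start // window,
            "accuracy": accuracy,
            # Assumed policy trigger: accuracy below the agreed threshold.
            "action_needed": accuracy < threshold,
        })
    return report


actual_outcomes = [1, 0, 1, 1, 0, 1, 0, 1]
model_predictions = [1, 0, 1, 1, 1, 0, 0, 1]
monitoring = validation_report(actual_outcomes, model_predictions, window=4, threshold=0.75)
for row in monitoring:
    print(row)
```

A flagged window is where the policy would kick in, for example by triggering a review, retraining or rolling back the model.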

As the saying goes, "Rome wasn't built in a day": businesses that want to continuously reap the benefits of Analytics/Data Science need proper planning to ensure a strong foundation. Doing Analytics/Data Science in a business is like building a pyramid, where the base (Data in Order) and the middle section (Reporting Process) must be built up strongly before the pyramid can grow taller. The earlier you start climbing the learning curve, the higher your pyramid is likely to grow. So start building!

If you are interested in knowing how to start building Data Science capabilities, drop me a note on LinkedIn or Twitter! Otherwise, keep in touch through my newsletter. If you found this article useful, consider sharing it with your network. :)