My clients and training participants often ask me what the common challenges are in scoping data projects, so I thought I would put down some of my thoughts on it.

Business Challenge

I am a believer in value: a project needs to provide value greater than the cost of doing it in order to sustain an organization's momentum in using data to improve.

Here lies the first challenge, and one of the biggest, in scoping projects: the business question we are trying to tackle. The business question first needs to get buy-in from stakeholders, i.e. the stakeholders "badly" want this project to be done. So the project definitely needs to tackle the biggest pain that the business is facing right now.

After qualifying the buy-in from stakeholders, the other thing to look at is the potential value that can be gained from the project, i.e. is the project providing something that stakeholders truly want? You might think that this is the same as getting buy-in, but that may not be the case. For instance, getting customers from a certain market may be the biggest challenge, but the additional revenue from it may not be high.

However, value is just one side of the coin, with the cost of the project being the other. Here lies another challenge. Unfortunately, the value that comes from a project can vary widely, whereas cost can usually be estimated with much higher precision. Thus when scoping the project, we also need to ensure that the value, even in the worst-case scenario, has a good chance of being above the cost of the project.
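To make the worst-case check concrete, here is a minimal sketch of how I might write it down. The class name, field names and all the figures are hypothetical and only for illustration; the idea is simply to compare the pessimistic end of the value estimate against the estimated cost.

```python
from dataclasses import dataclass


@dataclass
class ProjectEstimate:
    """Rough scoping estimate for one candidate project (all fields are assumptions)."""
    name: str
    value_low: float   # pessimistic (worst-case) value estimate
    value_high: float  # optimistic value estimate
    cost: float        # estimated cost, usually known with more precision


def worst_case_margin(project: ProjectEstimate) -> float:
    """Value left over after cost if only the pessimistic estimate materialises."""
    return project.value_low - project.cost


def passes_worst_case(project: ProjectEstimate, min_margin: float = 0.0) -> bool:
    """True when the worst-case value still covers the cost plus a chosen buffer."""
    return worst_case_margin(project) >= min_margin


# Made-up numbers, purely for illustration.
churn = ProjectEstimate("churn model", value_low=80_000, value_high=300_000, cost=50_000)
leads = ProjectEstimate("new market leads", value_low=20_000, value_high=150_000, cost=50_000)

for project in (churn, leads):
    print(project.name, worst_case_margin(project), passes_worst_case(project))
```

In this made-up example the second project has an attractive upside, but its pessimistic value does not cover the cost, so it would need to be rescoped before going ahead.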

So the business challenge of scoping projects is to tackle buy-in, estimated value and estimated cost of the project.

Data Availability & Quality

After the business question is scoped, the next challenge is converting the question into a mathematical modeling question, so that we can use data to answer it.

After the conversion, the next challenge will be data availability and the quality of data.

Most companies collect data haphazardly, but who can blame them when survival is at the top of their minds, well before they think about using data to put themselves on a better footing. Thus for most companies, the relevant data that could answer the business question may not have been collected, or the data collected may not be of the highest quality. This is also true for more mature companies, though of course it happens less often.

Thus after fixing the business question, chances are that the data might not be available or its quality might not be suitable for the project. That can mean either modifying the business question further or scrapping the project for now, until data collection and quality improve.
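A quick way to find out where you stand is to profile the fields the modeling question needs before committing to the project. The sketch below is one possible approach, not a full data quality audit: the function names, the 20% threshold and the sample records are all assumptions made for illustration.

```python
def profile_fields(records, required_fields):
    """Return, for each required field, the share of records where it is missing or empty."""
    total = len(records)
    report = {}
    for field in required_fields:
        # A field counts as missing if the key is absent, None, or an empty string.
        missing = sum(1 for record in records if record.get(field) in (None, ""))
        report[field] = missing / total if total else 1.0
    return report


def unusable_fields(report, max_missing=0.2):
    """List the fields whose missing rate exceeds the (assumed) tolerance."""
    return [field for field, rate in report.items() if rate > max_missing]


# Made-up sample records, purely for illustration.
records = [
    {"customer_id": 1, "signup_date": "2023-01-05", "channel": "web"},
    {"customer_id": 2, "signup_date": "", "channel": None},
    {"customer_id": 3, "signup_date": None},
    {"customer_id": 4, "signup_date": "2023-02-11", "channel": "store"},
]

report = profile_fields(records, ["customer_id", "signup_date", "channel"])
print(report)
print(unusable_fields(report))
```

If the list of unusable fields includes something the business question depends on, that is the signal to modify the question or to fix data collection first.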

In conclusion, the project may need to be modified further depending on data availability and data quality.

Regulations and Compliance

With increasing consumer awareness of how personal data is being used, and the growing need to understand how machine learning models come to certain decisions, there will be increasing regulation on data projects. These regulations do have an impact on data projects, for instance through the documentation required, the need for explainability of models, etc. All of these contribute to the challenge of scoping worthy projects in organizations. Take explainability: the need to explain how models come to certain decisions or predictions can affect which final model is used, resulting in the need to accept a drop in final model performance, which in turn reduces the value that can be derived from the project. For instance, a decision tree with lower recall may need to be chosen and implemented over a support vector machine with higher recall, given that decision trees are much easier to explain.
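To see how such a recall drop feeds back into the value estimate, here is a back-of-the-envelope sketch. It assumes, simplistically, that project value scales linearly with the number of positive cases the model catches; the function names, recall figures and dollar amounts are all hypothetical.

```python
def expected_value(recall, positives, value_per_catch):
    """Value from the positive cases the model actually catches (simplified assumption)."""
    return recall * positives * value_per_catch


def explainability_cost(recall_explainable, recall_blackbox, positives, value_per_catch):
    """Value given up by choosing the more explainable model with lower recall."""
    return (expected_value(recall_blackbox, positives, value_per_catch)
            - expected_value(recall_explainable, positives, value_per_catch))


# Made-up figures: a decision tree at 0.70 recall versus an SVM at 0.80 recall,
# 1,000 positive cases a year, each worth 50 when caught.
given_up = explainability_cost(0.70, 0.80, positives=1_000, value_per_catch=50)
print(round(given_up))
```

With these illustrative numbers, the explainability requirement takes roughly 5,000 off the yearly value, and that reduction needs to be reflected in the value estimate when deciding whether the project is still worth its cost.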

Conclusion

Scoping projects is not easy as there are many factors to consider. The above are just the major factors that may impact a project's suitability and value. Given the myriad of factors involved, it is a task that might not be easily undertaken by a 'green' data scientist. A pair of experienced eyes might be better. So if your organization finds scoping projects overwhelming, my recommendation is to find experienced personnel who can help with the navigation, to avoid pitfalls and move up the learning curve faster.

What are your thoughts on this? Feel free to share them with me on LinkedIn or Twitter (@PSkoo). Do consider signing up for my newsletter too. Have a great week ahead!