There is one aspect of machine learning (specifically, supervised learning) that is commonly taken for granted, but it is now gaining attention thanks to Andrew Ng.
What is that? The target!
In a bootcamp project, the target is set for you, but in an industry project, chances are you have to define your own target/labels.
And there are many things to consider when setting the target/labels!
1) Do the business stakeholders find the target "definition" useful?
Is it something they would like to know in advance? For instance, is it more useful to know that a customer is going to churn in a week's time, or that the customer is going to churn in 2 months' time?
2) How many features can you generate to help with the model prediction? Can you capture enough training data for the model to be "good"?
3) Does your target definition give you a balanced dataset (roughly the same number of samples in each class)?
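To make points 1 and 3 concrete, here is a minimal Python sketch. It assumes a hypothetical setup where, for each customer, we know the number of days from a snapshot date until they churned (or `None` if they had not churned by the end of the observed data). The function names `churn_labels` and `class_balance` and the toy numbers are made up for illustration. The sketch shows how the same customers get different labels, and a different class balance, depending on whether the target is "churns within a week" or "churns within 2 months".

```python
from collections import Counter
from typing import Optional, Sequence


def churn_labels(days_to_churn: Sequence[Optional[int]], horizon_days: int) -> list[int]:
    """Hypothetical target definition: label 1 if the customer churns within
    `horizon_days` of the snapshot date, else 0.

    `days_to_churn[i]` is the number of days from the snapshot until customer i
    churned, or None if they had not churned by the end of the observed data.
    """
    return [1 if d is not None and d <= horizon_days else 0 for d in days_to_churn]


def class_balance(labels: Sequence[int]) -> dict[int, float]:
    """Fraction of samples in each class, keyed by class label."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: counts[cls] / total for cls in sorted(counts)}


# Hypothetical toy data: days until churn for 10 customers (None = did not churn).
days_to_churn = [3, 10, 25, 45, 58, None, None, None, 90, None]

for horizon in (7, 60):
    labels = churn_labels(days_to_churn, horizon)
    print(horizon, class_balance(labels))
```

With this toy data, the 7-day target gives 1 churner out of 10 customers, while the 60-day target gives a 50/50 split. A real churn definition also has to handle customers who may churn after the observation window ends, which this sketch ignores.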
Besides data quality, which all data scientists should focus on, having a well-defined and suitable target is also important, as it affects how good the final model can be. :)
TLDR: Besides data quality, also pay attention to the target definition. :)
What are your thoughts?
Please feel free to link up on LinkedIn or Twitter (@PSkoo). Do consider signing up for my newsletter too. I just started my YouTube channel; do consider subscribing to show your support! :)
Consider supporting my work by buying me a "coffee" here. :)