Given the COVID-19 pandemic, I have more time to work on my website these days, so I put out a call to my LinkedIn community (post) to find out what they would like me to write about. One interesting topic was, "Launching machine learning models in difficult times."
In many companies that are mature in data science, one very big assumption is generally made: "The future will look VERY similar to the past." That is why we use past labeled data to build our supervised machine learning models. And to gain confidence that the trained models can be used in the future, we do cross-validation (a.k.a. out-of-sample validation) and out-of-time validation.
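To make the distinction concrete, here is a minimal sketch of out-of-time validation: train only on earlier periods and evaluate on the most recent ones, never shuffling across time. The data, the 24-month window, and the 18/6 split are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Illustrative data: 24 "months" of observations, 100 rows per month.
n = 2400
month = np.repeat(np.arange(24), 100)
X = rng.normal(size=(n, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Out-of-time validation: fit on the first 18 months,
# evaluate on the most recent 6 months only.
train = month < 18
model = LogisticRegression().fit(X[train], y[train])
auc = roc_auc_score(y[~train], model.predict_proba(X[~train])[:, 1])
print(f"Out-of-time AUC: {auc:.3f}")
```

The point of the split is that the holdout period lies strictly after the training period, which is exactly the "future looks like the past" assumption being tested.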
Unfortunately, training and testing machine learning models take time, and if your chosen model needs to undergo an external audit, that takes even longer. Chances are you may be faced with a situation where the world has taken a rapid, huge downturn, for instance the current pandemic or the Financial Crisis of 2008. This means the machine learning model you have built, which was trained to understand behavior during normal circumstances, now needs to be used in circumstances that are completely different!
So much effort was spent to train and finalize the model! (Noooo!) Worse, next to no data has been collected on consumer behavior in a crisis, and training a new model will take time, time that is a scarce resource in a crisis.
Consider this: does consumer behavior change drastically in a time of crisis? For instance, suppose you are a supermarket owner with brick-and-mortar stores as your only distribution channel. In a pandemic, consumer behavior may not change drastically, since grocery shopping is still needed. If you expect consumer behavior to be only moderately affected rather than drastically changed, in my opinion the model can still be launched.
Manage the Cut-Offs
For models to keep supporting good decisions, you might now look at the cut-off points. For instance, banks have credit scorecards: machine learning models that determine credit risk. The higher the score, the lower the credit risk.
Let us say the model was built during an economic boom, so it learned about credit card consumers' behavior in boom times. In normal circumstances, the bank will take action on a consumer who scores below a cut-off of 78 (out of the highest attainable score of 156). The cut-off is a strategy parameter the bank uses to manage default risk in boom times.
When the model is launched in an economic bust, consumers have a higher chance of default across the entire group, so the bank may now consider increasing the cut-off from 78 to, say, 120. This means the bank will take risk-mitigation action on any consumer scoring below 120.
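The cut-off logic above is simple enough to sketch in a few lines; 78 and 120 are the boom- and bust-time cut-offs from the example, and the score of 100 is a made-up consumer.

```python
# Scorecard cut-offs from the example: higher score = lower credit risk.
BOOM_CUTOFF = 78
BUST_CUTOFF = 120

def take_action(score, cutoff):
    """The bank mitigates risk for any consumer scoring below the cut-off."""
    return score < cutoff

# A consumer scoring 100 is left alone in a boom but flagged in a bust.
print(take_action(100, BOOM_CUTOFF))  # → False
print(take_action(100, BUST_CUTOFF))  # → True
```

Note that the model itself is unchanged; only the strategy parameter sitting on top of its output moves.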
What I am proposing is: go back to your model and think about the decision cut-offs. If you are using a classification model, consider managing the threshold to mitigate the situation. If you previously found that a threshold of 54% was ideal, i.e. it gave the best precision or recall, it might be time to revisit and adjust that threshold.
When implementing models during a crisis, it is very important to learn how your selected model performs in a crisis. Constantly evaluate your model: see whether it is working as per normal or worse, and if worse, how much worse. If your organization is mature, you can always go back to your documentation to extract the baseline results, then constantly compare the actual results against that baseline. The difference will help you better manage the cut-offs mentioned above.
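That comparison can be as simple as a guard-rail check against the documented baseline; the baseline AUC of 0.82 and the 0.05 tolerance below are hypothetical numbers, not from any real model.

```python
# A minimal monitoring sketch: compare the live metric against the
# documented baseline and flag when degradation exceeds a tolerance.
BASELINE_AUC = 0.82   # from model documentation (hypothetical)
TOLERANCE = 0.05      # acceptable degradation before acting (hypothetical)

def needs_review(live_auc, baseline=BASELINE_AUC, tol=TOLERANCE):
    """Return True when the live metric has degraded beyond tolerance."""
    return (baseline - live_auc) > tol

print(needs_review(0.80))  # → False: within tolerance
print(needs_review(0.70))  # → True: degraded, revisit the cut-offs
```

In practice you would run this on every scoring batch, so that a drift beyond tolerance triggers a review of the cut-offs rather than a silent failure.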
Collecting More Data
Such "opportunities" are rare, so another suggestion is to plan your data collection strategy. See whether more data can be collected during such times to better understand consumer behavior, which can pay off later when refreshing (re-training or re-calibrating) the model.
It is quite unfortunate that, after all the hard work put into training a model and getting it ready for implementation, the tides have changed. But we can also see this as an opportunity to learn more about our business, our markets, our business model, our consumer behavior, and so on. So besides mitigating the risk that the model might be wrong (cut-offs and constant evaluation), we should also strategize to take advantage of the situation (collecting more variables and data points).
If you find this post useful, do consider subscribing to my newsletter (see below). If you have any feedback, feel free to link up with me on LinkedIn or Twitter (@PSkoo). I wish you all the best in your data science career! :)