During my training classes, when we discuss common machine learning models, I usually also cover a related sub-topic: how insights from these models, or the models themselves, are put to use in the business process.
For instance, we can build a very accurate model that is very good at ‘predicting’ which customers will respond to a marketing campaign, but if the wrong marketing channel or marketing message is used, the model will still not bring value to the business. That is why, in my previous posts, I recommended that data scientists understand strategic thinking and develop business acumen.
I found myself repeating this example in conversations, so I'm writing it up as a blog post, hoping more data scientists will realize how important it is to think through how a model is going to be used and to consider its impact on stakeholders.
Machine Learning in Education
In my previous role, I came across numerous projects proposed by data science students. One of the ‘hottest’ industries they choose to do data science on is education. I have a very strong interest in education: done well, it is a tool for social mobility and can create a significant and long-lasting impact on people’s lives. So project presentations that use machine learning in education always pique my interest.
One of the projects presented used machine learning models to classify students as “at risk of failure” vs “able to pass the module”. If you ask me, this is definitely a use case for machine learning in education. I was very interested in how the presenting team would use the model though, so I asked, “So what are you going to use the model for once it is built?”
The answer that came back was, “We will use the model to determine who is at risk and who is not. Once we have done that, we will focus our resources ONLY on those who are not at risk of failure, and ensure that they pass with significantly good results.”
*Jaw drop*
Since the team was made up of students who were new to data science, I decided it was a good chance to give them a lesson on the usage of models. So I continued, “So if a student is classified by the model as being at risk, it means he/she will be deprived of any resources to succeed and get good grades?”
“Yes!” came the answer. So I continued, “What happens if I am using the model and YOU are classified as at risk of failing? Do you think it is fair for me to deprive you of any teaching/learning resources?”
Silence…
“You have to know that the machine learning model only gives us a probability of an outcome, not a 100% guarantee. If someone is ‘predicted’ to be at risk, it is only because the current features inside the model tell us that. Life is more complicated, since there are other factors besides those captured in the data that can affect an outcome, in this case failing an exam.”
Continuing, “What you could have proposed, with the same model, is this. Firstly, we can investigate which factors significantly indicate that a student is at risk, and why that is the case. Secondly, since we can better identify the students at risk of failure, we should devise a good plan to help them. This will benefit society overall, producing more productive members of society and perhaps helping more students get out of the poverty trap.”
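To make that second proposal a little more concrete, here is a minimal Python sketch. It is not the students' project and not a definitive implementation. I am assuming the model outputs a probability of failing for each student. The `StudentRisk` class, the `plan_support` function, the tier names, and the 0.6 and 0.3 cut-offs are all hypothetical, made up purely for illustration. The point it tries to show is that a higher predicted risk leads to more support, and nobody is cut off from teaching.

```python
from dataclasses import dataclass


@dataclass
class StudentRisk:
    student_id: str
    p_fail: float  # model's estimated probability of failing (0 to 1), not a certainty


def plan_support(students, high=0.6, low=0.3):
    """Map each student to a support tier. The cut-offs are illustrative assumptions."""
    plan = {}
    # Go through the highest-risk students first so they appear at the top of the plan.
    for s in sorted(students, key=lambda s: s.p_fail, reverse=True):
        if s.p_fail >= high:
            plan[s.student_id] = "intensive support"
        elif s.p_fail >= low:
            plan[s.student_id] = "regular check-ins"
        else:
            # Low predicted risk still receives normal teaching; no one is excluded.
            plan[s.student_id] = "standard teaching"
    return plan


# Hypothetical model outputs for three students.
students = [
    StudentRisk("C", 0.10),
    StudentRisk("A", 0.82),
    StudentRisk("B", 0.45),
]
plan = plan_support(students)
for student_id, tier in plan.items():
    print(student_id, "->", tier)
```

In practice the cut-offs and the interventions behind each tier would be decided together with teachers and the school, and the predictions should be reviewed by people who know the students, since the model only sees what is captured in the data.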
Usage of Machine Learning Model
What you might notice is that the same model, depending on how it is used, can either help more people or be used to discriminate. I am a very strong believer that building the model is just a small part of a data science project; knowing how to use the model, especially ethically, is important as well. This is why, for projects where I have control over the grading criteria, I place a significant portion of the grade on strategy and implementation, meaning implementation into business processes rather than into IT infrastructure.
I hope that, after reading this, readers will put more thought into how they use their models and empathize with the impact on people or customers. Building the ‘best’ machine learning model is only a small part of the bigger picture in deriving value from our data.
I hope the blog has been useful to you. I wish all readers a FUN Data Science learning journey and do visit my other blog posts and LinkedIn profile. Do consider signing up for my newsletter too. :)