In one section of my training, I need to cover what Big Data is. Most of you may already know that there are four "V"s that describe Big Data:
- Volume
- Velocity
- Variety
- Veracity
For more details, refer to the image below:
However, I have seen too many people catch the "Big Data" bug and quote it to create urgency in their organizations to undertake data initiatives. There is nothing wrong with this, but I want to caution that we should not take Big Data at face value.
At the end of the day, to take advantage of data, we should be looking at RELEVANT DATA rather than Big Data. Big Data is just a description of the macro environment, but I am a strong believer that if your organization wants to start taking advantage of data, planning for and collecting Relevant Data is key.
Let me share a story to explain what I mean.
Imagine you are going to the doctor because you do not feel well. While sitting outside waiting for your turn, you observe the different symptoms of the other patients. When your turn comes, you go in and take a seat beside the doctor.
You: "Doctor, I am not feeling well. I saw that Patient A has a runny nose and has been coughing non-stop. Patient B seems to be feverish and lethargic. Patient C seems to have a bad rash, and he is scratching himself non-stop.
So can you tell me what my diagnosis is?"
After you say this, I am very sure the doctor will be scratching his or her head without the slightest clue how to give you a diagnosis. What is happening here? Well, you definitely collected a lot of data (Big Data), but that data is NOT RELEVANT at all. The doctor cannot use it to diagnose you.
This example demonstrates that you can have Big Data (data collected on many patients), but if the data collected has no information value for answering business questions (your diagnosis) or making a more informed decision (providing a medical prescription), it is of NO USE.
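To make the story a little more concrete, here is a minimal Python sketch of the same idea. Everything in it is made up for illustration: the `Observation` class, the `relevant_observations` helper and the sample records are my own assumptions, not part of any real system. It simply shows that the amount of data collected and the amount of data relevant to the question can be very different numbers.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    patient: str  # who the symptom was observed on
    symptom: str  # what was observed


def relevant_observations(observations, patient):
    """Keep only the observations that describe the patient being diagnosed."""
    return [obs for obs in observations if obs.patient == patient]


# Hypothetical data "collected" in the waiting room, taken from the story.
collected = [
    Observation("Patient A", "runny nose"),
    Observation("Patient A", "cough"),
    Observation("Patient B", "fever"),
    Observation("Patient B", "lethargy"),
    Observation("Patient C", "rash"),
]

# The question being asked is about "You", not about Patients A, B or C.
usable = relevant_observations(collected, patient="You")
print(f"collected: {len(collected)}, relevant to your diagnosis: {len(usable)}")
```

Five records were collected, yet none of them describe the person asking for a diagnosis, so the doctor has nothing to work with.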
I would also like to reiterate that merely having data does not mean one can do Data Science immediately. There needs to be a connection between the data collected and the business question, i.e. the data can be refined and used to answer the business question. To be able to see that connection, a pair of experienced eyes is usually needed, because it is never so straightforward.
If you are planning on building up your capabilities, check out these posts.
1. Data Science Maturity Stages for Business
2. Building Data Science Capabilities in Business
3. Data Science is a Journey, not an End
To stay in touch, feel free to reach out on my LinkedIn!