Datafication. What is it? Wikipedia describes it as “a modern technological trend turning many aspects of our life into computerized data and transforming this information into new forms of value.”
Datafication is happening all around us, yet just 10 years ago it wouldn’t even have been an idea. For example, consumers want data about their heart rate, daily steps, water consumption and calorie intake. Big businesses have embraced datafication because it allows them to track consumer habits, down to details such as available pantry space or the amount of milk in a refrigerator.
The benefits are clear: datafication gives us the insight to make better personal life choices. It helps businesses become leaner and less wasteful. It even has the ability to predict famine and the spread of diseases. This new technological trend is truly a wonderful thing!
Until it isn’t . . . There are hidden dangers in datafication as well.
In the data world, we have a phrase: “Garbage-In, Garbage-Out” (GIGO). The idea behind GIGO is that incorrect or low-quality input will result in incorrect or low-quality output. We can only trust the output if we trust the input.
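To make GIGO concrete, here is a minimal sketch in Python. The heart-rate readings, the function names, and the 30 to 220 bpm "plausible" range are all assumptions invented for illustration, not real sensor data or a medical standard. The point is only that the same averaging logic gives a sensible or a nonsensical answer depending on what goes in.

```python
from statistics import mean

# Hypothetical resting heart-rate readings in beats per minute.
clean_readings = [62, 64, 61, 63, 65]
# Similar readings, but two sensor glitches slipped in (0 and 640).
dirty_readings = [62, 64, 0, 63, 640]


def average_bpm(readings):
    """Blindly average whatever the sensor reported."""
    return mean(readings)


def plausible(readings, low=30, high=220):
    """Keep only readings inside an assumed plausible human range."""
    return [r for r in readings if low <= r <= high]


print(average_bpm(clean_readings))             # 63
print(average_bpm(dirty_readings))             # 165.8
print(average_bpm(plausible(dirty_readings)))  # 63
```

The averaging code is identical in all three calls. Only the quality of the input changed, and two bad readings out of five were enough to turn a calm 63 into an alarming 165.8.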
So how bad can GIGO be? Can it really cause major problems?
Let’s look at two examples: one mechanical and the other human.
I was hired as a data analysis contractor for a retail client that tracked store attendance with automated people counters.
Their data was collected each time a beam was crossed. During my initial consulting session, they spoke highly of the counters and called them “Random Number Generators”.
The company trusted the counters and based its entire financial planning on the numbers. This was a major flaw in the business, and no one took the time to reflect on the ramifications of the faulty data. The data was wildly inconsistent: the counters would do anything from duplicating to completely ignoring entries and exits.
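A simple sanity check can catch this kind of fault early. The sketch below is not what the client ran; the `DailyCount` record, the sample numbers, and the 5% tolerance are all hypothetical. It relies on one assumption: everyone who walks into a store eventually walks out, so a day's entries and exits should roughly match.

```python
from dataclasses import dataclass


@dataclass
class DailyCount:
    day: str
    entries: int
    exits: int


def flag_suspect_days(counts, tolerance=0.05):
    """Return the days whose entries and exits differ by more than
    `tolerance`, measured as a fraction of the larger of the two counts."""
    suspect = []
    for c in counts:
        larger = max(c.entries, c.exits)
        if larger == 0:
            # An open store that logged nobody at all is suspicious too.
            suspect.append(c.day)
            continue
        if abs(c.entries - c.exits) / larger > tolerance:
            suspect.append(c.day)
    return suspect


# Hypothetical sample data for four trading days.
counts = [
    DailyCount("Mon", 1200, 1185),  # small gap, within tolerance
    DailyCount("Tue", 1150, 610),   # many exits apparently ignored
    DailyCount("Wed", 2400, 1190),  # entries apparently duplicated
    DailyCount("Thu", 0, 0),        # counter logged nothing
]

print(flag_suspect_days(counts))  # ['Tue', 'Wed', 'Thu']
```

A check like this won't tell you the true attendance, but it does tell you which days not to build a financial plan on.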
The second example may be a little touchy, so I apologize in advance.
On November 1, 2016, one week before the United States would go to the polls to elect its 45th President, almost all major polls showed Hillary Clinton with a 3-6 point lead. The raw data alone indicated she had an insurmountable advantage. Clearly Clinton and her team trusted the numbers, and that was one of many mistakes. Not only did she lose, but she was also embarrassed by not having a concession speech.
So what happened? Datafication happened. People believed in the numbers without paying attention to how the data was collected. In this instance, the phenomenon was caused by Social Desirability Bias, which is outside the scope of this post (look it up though). As a result, the data source was flawed. Garbage-In, Garbage-Out.
The data damage didn’t end with embarrassment for Clinton. The impact had a trickle-down effect. The people who generated those reports (analogous to us as BI developers and report writers) took major flak for incorrect data. Companies saw their stock prices drop, resulting in layoffs and government policy changes. Additionally, the faulty numbers came as a shock to the American public. All of this happened because of reliance on data they assumed to be properly sourced.
You can have the most beautiful visualization, complete with great context, relevancy, etc. Yet if the source is tainted, it’s useless. So in this age of datafication, take care to always challenge and check the authenticity of the source. The few minutes up front will be worth it in the end.