Predictive analytics can be very helpful: algorithms can help retailers decide what quantities of items to order, detect instances of financial fraud, or allocate resources to control the spread of a disease. However, while these predictive tools are changing lives for the better and providing meaningful insights, we must remember not to let the algorithms do all the heavy lifting. Like the vision from a crystal ball, the images they provide are open to interpretation.
The largest and most recent example comes from our neighbours to the south. The recent U.S. election was preceded by numerous media reports on polls suggesting Hillary Clinton was the clear favourite to win the presidency, leaving many shocked when the results rolled in on the evening of November 8th.
Erik Brynjolfsson, Director of the MIT Initiative on the Digital Economy, soberly reminds us that data science is not always going to give you an answer, but a probability: a 70% chance that someone will win still means there is a 30% chance that they won't. The algorithms that produced this election's forecasts were based more on historical election data than on the emotional mood of the country, and in a 24-hour news cycle, raw data suggesting a Clinton win became constant news reports that may have lulled Democratic supporters into complacency.
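To make that point concrete, a minimal Monte Carlo sketch in Python (using an assumed 70% win probability, not any forecaster's actual model) shows how often the "unlikely" outcome still occurs:

```python
import random

# Hypothetical forecast: the favourite wins any single election with p = 0.70.
# These numbers are illustrative, not from any actual forecaster's model.
P_FAVOURITE_WINS = 0.70
TRIALS = 100_000

upsets = sum(random.random() >= P_FAVOURITE_WINS for _ in range(TRIALS))
print(f"Favourite lost {upsets / TRIALS:.1%} of {TRIALS:,} simulated elections")
# Roughly 30%: an outcome a forecast calls unlikely is far from impossible.
```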
After the election there was a lot of soul-searching among data scientists: was the data itself flawed? The polls? The algorithms? Some data scientists did correctly predict a Trump win, but when the major vote forecasters' predictive algorithms formed a general consensus that jibed with the story the left-leaning media wanted to tell, that consensus was not scrutinized nearly as closely as it should have been.
With several very public big-data algorithm stories in the news, including Google's overinflated flu-trends analysis back in 2013 and Facebook's promotion of fake news in its trending section, people are taking notice. Data and algorithm-based predictions are important, but there is a lot to learn from these stories: predictive algorithms often miss important contextual details that are necessary for interpreting and acting on the results. Data on its own might seem to tell one story, but unless a team takes the time to analyze it for potential outliers and bias and to present it within a larger context, the conclusions drawn from it can be misleading.
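As a toy illustration of that kind of sanity check, a few lines of Python can flag a suspect value before it quietly skews a model's output (the sales figures and the two-standard-deviation threshold here are invented for the example):

```python
import statistics

# Hypothetical daily sales figures; one data-entry error lurks in the list.
sales = [120, 132, 118, 127, 125, 1290, 122, 130]

mean = statistics.mean(sales)
stdev = statistics.stdev(sales)

# Flag any observation more than 2 standard deviations from the mean.
outliers = [x for x in sales if abs(x - mean) > 2 * stdev]
print(f"mean={mean:.1f}, stdev={stdev:.1f}, outliers={outliers}")
```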
For example, predictive policing algorithms are meant to identify areas where crimes are more likely to be committed so that officers can be sent there on patrol. However, when one such system in Oakland was audited by a third party, the software was shown to carry an inherent racial bias, reinforcing racial stereotypes already prevalent in law enforcement:
PredPol directed police to black neighbourhoods like West Oakland and International Boulevard instead of zeroing in on where drug crime actually occurred. Predominantly white neighbourhoods like Rockridge and Piedmont got a pass, even though white people use illicit drugs at higher rates than minorities.
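The mechanism behind findings like this is a feedback loop: crimes get recorded where officers already patrol, and those records then justify more patrols in the same places. A deliberately simplified simulation (every number here is invented) shows how a skewed historical record perpetuates itself even when the underlying crime rates are identical:

```python
import random

# Two neighbourhoods with the same true crime rate, but a historical
# record that over-represents neighbourhood A. (All numbers invented.)
TRUE_RATE = 0.05                 # identical underlying rate everywhere
recorded = {"A": 60, "B": 40}    # skewed historical arrest counts
PATROLS_PER_YEAR = 1000

for year in range(10):
    total = sum(recorded.values())
    for hood in recorded:
        # Patrols are allocated in proportion to past records...
        patrols = int(PATROLS_PER_YEAR * recorded[hood] / total)
        # ...so arrests happen where officers look, not where crime differs.
        recorded[hood] += sum(random.random() < TRUE_RATE for _ in range(patrols))

print(recorded)
# The initial 60/40 skew never corrects itself: the data "confirms"
# the very patrol pattern that produced it.
```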
Algorithms are programmed by people, and people aren't perfect. Machine learning's ability to tweak and improve algorithms is promising, but still in its infancy; consider Microsoft's recent attempt at a learning chatbot, which became worryingly racist after it was spammed by Twitter trolls. Yet for all of the negative stories, there are some amazing ones as well, such as a successful Canadian predictive analytics system that tracks the progress of HIV to determine where to best focus health resources and slow the spread of the disease. All it takes is for the right people to understand the nuances of the data.
As we hand more of our decision making over to algorithms and AI, we must remember that while these tools are valuable and continually being refined, they are far from perfect. While it may be tempting to take their findings at face value, making heads or tails of the data in relation to all the factors that influenced it still requires sensible analysis from human beings.