Recently we have read a lot about fake news, alternative facts, and lies in journalism. Companies like Facebook develop data science algorithms to detect such postings, based among other things on crowdsourcing (collective intelligence).
But can the data scientist, with her inquisitive mind and strong sense of numbers and probabilities, use her brain to assess how true a piece of information is? I am talking here about using fuzzy logic, and human rather than artificial intelligence, to estimate the probabilities.
Here is a recent, popular example: the Firefall in Yosemite National Park (California), pictured below.
It is supposed to be a rare natural event occurring only in February under certain conditions, according to National Geographic. But there was also a famous man-made firefall that took place a few miles away each year until 1968 (people pushing burning embers over the top of a cliff).
As a data scientist, my first reaction is to estimate the probability that the natural firefall is indeed genuine.
- Probability for two such unrelated events (man-made and natural) to occur so close to each other, assuming both occurred: Maybe 1/10,000?
- Probability for the two events, to look so similar: Maybe 1/10?
- Probability for any one event to occur anywhere in a waterfall: Maybe 1/1,000,000? (who has ever seen a natural firefall?)
- Could this phenomenon be replicated in a laboratory or simulated on a computer? Probability for the sun to create that kind of glow on water?
Of course, this is balanced by the fact that journalists would report such an event only if it is extremely rare – a one in a million chance. That puts the odds of being real, according to my very wild guess, at 1,000,000 / (10,000 x 10 x 1,000,000) = 1 / 100,000.
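The back-of-the-envelope arithmetic above can be reproduced in a few lines of Python. This is only a sketch of my wild guess, not a rigorous model: the variable names are made up for illustration, and multiplying the three guesses together assumes they are independent, which is itself debatable.

```python
from fractions import Fraction

# Rough guesses taken from the list above (illustrative names, not a real model).
p_close_by = Fraction(1, 10_000)          # two unrelated events occurring so close together
p_look_alike = Fraction(1, 10)            # the two events looking so similar
p_any_waterfall = Fraction(1, 1_000_000)  # such an event occurring at any waterfall

# Reporting bias: journalists only cover one-in-a-million events,
# so the estimate is scaled up by a factor of 1,000,000.
reporting_factor = 1_000_000

# Assumes the three guesses are independent, so they can simply be multiplied.
odds_real = reporting_factor * p_close_by * p_look_alike * p_any_waterfall

print(odds_real)         # 1/100000
print(float(odds_real))  # 1e-05
```

Using exact fractions rather than floats keeps the result readable as "1 in 100,000" and makes it easy to see how much the answer moves when any single guess is changed.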
My figures look more like an answer to a Microsoft job interview question. But they lead to an interesting question: What if the truth is somewhere in between? What if the picture truly features a genuine, natural event, but the colors were altered, or maybe it was once again a man-made event that the journalist was unaware of? How do you assign a probability to each of the following possibilities?
- The picture is real, unaltered
- The picture is real, colors are exaggerated
- The picture is real, unaltered, but the explanation is incorrect, or maybe totally wrong
- The picture is fake, maybe created using some software or some other artifact, inspired by the old man-made firefall, and made on purpose to go viral
Just food for thought. I don’t have an answer, as I haven’t spent enough time investigating this. But for about 30% of the news that I read in reputable outlets (and 95% of Facebook content), I ask myself the same question.