One important point overlooked in Arjan Haring's article, Data Science Should Monitor Big Brother, is that significant value, even from the point of view of the individual, is created when sellers have a better idea of what specific buyers want. Many, if not most, people don’t mind seeing recommendations for new books to read or movies to see, based on their own prior selections and on what similar buyers buy. Most people would also prefer relevant ads to irrelevant ones, and the creepy factor tied to remarketing ads that follow you around is diminishing simply because we see this practice everywhere. So the value does not all fall, asymmetrically, on the side of the seller.
A poor understanding of predictive models, and the lack of even an intuitive grasp of Bayes’ rule, do create real problems. Predictive modeling is used in many situations where the class of interest (tax cheats, fraudulent insurance claims, likely criminals) is rare. While the data science model may do much better than random guessing at finding the cases of interest, it is often true that the cases flagged as “of interest” are more likely than not to be innocuous. If the prevalence of fraud is 1%, and a data mining model quintuples the share of true fraud among the cases it flags to 5%, it’s a very successful model in terms of lift, but 95% of its flagged cases are still wrong. If authorities place too much stock in their models, real harm can be done.
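To make the arithmetic concrete, here is a minimal Python sketch of the Bayes' rule calculation behind the 1% and 5% figures. The case counts, the 50% detection rate, and the function and variable names are all hypothetical, chosen only so that the numbers reproduce the example above; they do not come from any real model.

```python
def flagged_precision(prevalence, sensitivity, false_positive_rate):
    """Return P(fraud | flagged) using Bayes' rule.

    prevalence          -- P(fraud) in the whole population
    sensitivity         -- P(flagged | fraud)
    false_positive_rate -- P(flagged | not fraud)
    """
    true_flags = sensitivity * prevalence
    false_flags = false_positive_rate * (1.0 - prevalence)
    return true_flags / (true_flags + false_flags)


# Hypothetical counts, picked to match the 1% / 5% example in the text.
total_cases = 100_000
fraud_cases = 1_000        # 1% prevalence
flagged_cases = 10_000     # cases the model marks as "of interest"
flagged_fraud = 500        # flagged cases that really are fraud

prevalence = fraud_cases / total_cases
sensitivity = flagged_fraud / fraud_cases
false_positive_rate = (flagged_cases - flagged_fraud) / (total_cases - fraud_cases)

precision = flagged_precision(prevalence, sensitivity, false_positive_rate)
lift = precision / prevalence      # improvement over random selection
wrong_share = 1.0 - precision      # flagged cases that are innocuous

print(f"share of flagged cases that are fraud: {precision:.1%}")
print(f"lift over random guessing: {lift:.1f}x")
print(f"share of flagged cases that are innocuous: {wrong_share:.1%}")
```

Under these assumed counts the model catches half of all fraud and still flags roughly nineteen innocent cases for every guilty one, which is the gap between a good lift and a trustworthy flag.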
Perhaps the greatest scope for restoring symmetry, or at least for rebalancing, lies in the growth of intermediary sites like Yelp and Expedia, which stand between consumer and producer, have nothing but data at their disposal, and ultimately have to rely on providing real value to both sides of the equation.