Most of us, unless we’re insurance actuaries or Wall Street quantitative analysts, have only a vague notion of what algorithms are and how they work. Yet they affect our daily lives considerably. An algorithm is a set of instructions a computer follows to solve a problem. The hidden algorithms of Big Data might connect you with a great music suggestion on Pandora, a job lead on LinkedIn or the love of your life on Match.com.
These mathematical models are supposed to be neutral. But former Wall Street quant Cathy O’Neil, who had an insider’s view of algorithms for years, believes that they are quite the opposite. In her book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, O’Neil argues that these WMDs are ticking time bombs: well-intentioned, but ultimately reinforcing harmful stereotypes, especially of the poor and minorities, and becoming “secret models wielding arbitrary punishments.”
Models and Hunches
Algorithms are not the exclusive focus of Weapons of Math Destruction. The focus is more broadly on mathematical models of the world — and on why some are healthy and useful while others grow toxic. Any model of the world, mathematical or otherwise, begins with a hunch, an instinct about a deeper logic beneath the surface of things. Here is where the human element, and our potential for bias and faulty assumptions, creeps in. To be sure, a hunch or working thesis is part of the scientific method. In this phase of inquiry, human intuition can be fruitful, provided there is a mechanism by which those initial hunches can be tested and, if necessary, corrected.
O’Neil cites the new generation of baseball metrics (a story told in Michael Lewis’s Moneyball) as a healthy example of this process. Moneyball began with Oakland A’s General Manager Billy Beane’s hunch that traditional performance metrics such as runs batted in (RBIs) were overrated, while more obscure measures (like on-base percentage) were better predictors of overall success. Drawing on the statistical approach pioneered by Bill James, the A’s front office crunched the numbers and put together models that Beane could use in deciding which players to acquire, which to hold onto, and which to let go.
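To make “on-base percentage” concrete, here is a minimal sketch of how it is computed, using the standard MLB definition OBP = (H + BB + HBP) / (AB + BB + HBP + SF). The function name and the season totals are hypothetical, chosen only for illustration; this is not the A’s actual model, just the single metric the hunch centered on. Unlike RBIs, which depend heavily on whether teammates happen to be on base, OBP measures how often the batter himself reaches base.

```python
def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int) -> float:
    """Standard MLB on-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    # Numerator: every way the batter reached base without an error or fielder's choice
    times_on_base = hits + walks + hit_by_pitch
    # Denominator: the plate appearances the official formula counts
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        raise ValueError("no qualifying plate appearances")
    return times_on_base / denominator


# Hypothetical season line, for illustration only
obp = on_base_percentage(hits=150, walks=70, hit_by_pitch=5,
                         at_bats=500, sacrifice_flies=5)
print(f"{obp:.3f}")  # 0.388
```

Because the inputs and the single output are plain counts and a ratio, anyone with basic math skills can check the result by hand, which is part of what makes this kind of model transparent.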
While sports enthusiasts love to debate the issue, this method of evaluating talent is now widely embraced across baseball, and gaining traction in other sports as well. The Moneyball model works, O’Neil says, for a few simple reasons. First, it is relatively transparent: Anyone with basic math skills can grasp the inputs and outputs. Second, its objectives (more wins) are clear, and appropriately quantifiable. Third, there is a self-correcting feedback mechanism: a constant stream of new inputs and outputs by which the model can be honed and refined.