“We tried to do XYZ. Did it make a difference?”
Whether you are in the for-profit world or the not-for-profit world, this is a basic question that many people try to answer.
You could be working at:
- a bank trying to figure out which offer is most appealing to customers,
- an online retailer figuring out which ad display gets the most clicks,
- the Department of Education testing the effect of smaller class sizes,
- a city government office trying to see if the new bike lane programs really are safer,
- an online media provider (Netflix, YouTube, Spotify, …) trying to find the best algorithm for making recommendations,
- a pharmaceutical company running a clinical trial comparing its drug’s effectiveness against a competitor’s, or
- a pharmaceutical company checking whether its newly released drug has a strong impact in the real world.
All of these examples have a few things in common.
- A change was made, whether it be a new algorithm, a different ad display, new bike lanes, smaller class sizes, or a new drug treatment. These are all examples of Programs.
- There is a desire to determine whether this change made a difference in an outcome of interest, whether that outcome is more sales, safer roads, better-educated students, or more patients being cured. These are all examples of Program Evaluation, where the goal is not only to find out whether there was a change in the outcome of interest but to be able to say that the program caused the difference. As part of this Program Evaluation, we want to know the direction of the change (did it make things better or worse?), the magnitude of the change, and, in some cases, whether the change was statistically significant.
Economists and Evaluation Specialists (those with degrees in Monitoring and Evaluation) study many techniques for program evaluation, including Randomized Experiments as well as Quasi-Experimental Methods such as Propensity Score Methods, Instrumental Variables, Interrupted Time Series, Regression Discontinuity, Heckman’s 2-Stage Model…
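To make one of these concrete, here is a minimal sketch of propensity score matching on synthetic data. Everything in it (the covariates, the true effect size, the one-to-one nearest-neighbor matching) is an illustrative assumption, not a recipe from any particular study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Synthetic observational data: treatment assignment depends on the
# covariates, so naively comparing treated vs. untreated outcomes is confounded.
n = 2000
x = rng.normal(size=(n, 2))                              # observed covariates
p_treat = 1 / (1 + np.exp(-(x[:, 0] + 0.5 * x[:, 1])))   # selection into treatment
treated = rng.random(n) < p_treat
outcome = 2.0 * treated + x[:, 0] + rng.normal(size=n)   # true effect is 2.0

# Step 1: model the propensity score, P(treated | covariates).
ps = LogisticRegression().fit(x, treated).predict_proba(x)[:, 1]

# Step 2: match each treated unit to the control with the closest score.
nn = NearestNeighbors(n_neighbors=1).fit(ps[~treated].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_outcomes = outcome[~treated][idx.ravel()]

# Step 3: average the treated-minus-matched-control differences to
# estimate the average treatment effect on the treated (ATT).
att = (outcome[treated] - matched_outcomes).mean()
print(f"Estimated ATT: {att:.2f}")
```

The point of the matching step is that, after it, treated and control units have similar covariate profiles, so the outcome difference is more plausibly attributable to the program rather than to who selected into it.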
Most data scientists can do A/B Testing like there is no tomorrow. It is a standard part of the Data Scientist’s toolkit. When successful, the A/B Test creates a random assignment so that the two groups, A and B, are, on average, very similar in all observable and unobservable characteristics. The program evaluation then simply consists of checking the quality of the randomization (yes, this step gets skipped by many people, but it should not be) and then comparing the outcomes in Group A to Group B. This is essentially how a clinical trial for a drug is designed and implemented.
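As an illustration of those two steps, here is a minimal sketch on synthetic data. The pre-treatment covariate (age), the conversion rates, and the choice of tests are all assumptions made for simplicity, not a prescription.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic experiment: 1,000 users per arm; arm B's true rate is higher.
a_age = rng.normal(35, 10, 1000)                    # a pre-treatment covariate
b_age = rng.normal(35, 10, 1000)
a_conv = (rng.random(1000) < 0.10).astype(float)    # 0/1 conversion outcomes
b_conv = (rng.random(1000) < 0.13).astype(float)

# Step 1: check the randomization. Pre-treatment covariates should look
# the same in both arms; a very small p-value here is a red flag.
_, p_balance = stats.ttest_ind(a_age, b_age)
print(f"Balance check on age: p = {p_balance:.3f}")

# Step 2: compare outcomes. With 0/1 conversions and samples this large,
# a two-sample t-test closely approximates the two-proportion z-test.
_, p_outcome = stats.ttest_ind(a_conv, b_conv)
print(f"A: {a_conv.mean():.3f}  B: {b_conv.mean():.3f}  p = {p_outcome:.3f}")
```

In practice you would check balance on every pre-treatment covariate you can observe, not just one; the unobservable ones are exactly what randomization is supposed to take care of.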
But what if the randomization failed? What if the groups are different? What if other experiments were going on at the same time that impacted the assignment?
What if randomization is not possible?
In these situations, the toolbox of Program Evaluation becomes critical to determining if the program made a difference in the outcome of interest, whether that be higher click-through rates, increased sales, safer roads, more effective drugs or better education.
The list of desired skills for a Data Scientist is already quite long. Knowing that we can’t add an infinite number of required skills to the Data Scientist Toolbox, what do you think about a basic course in Program Evaluation? Would some training in Program Evaluation be helpful to round out a Data Scientist’s education?
I’m interested in your insights on this topic. #datascience #programevaluation