Reviving from the dead an old but popular blog on Understanding Type I and Type II Errors
I recently got an inquiry that asked me to clarify the difference between type I and type II errors when doing statistical testing. Let me use this blog to clarify the difference as well as discuss the potential cost ramifications of type I and type II errors. I have also provided some examples at the end of the blog[1].
In statistical test theory, the notion of statistical error is an integral part of hypothesis testing. The statistical test requires an unambiguous statement of a null hypothesis (H0), for example, “this person is healthy”, “this accused person is not guilty” or “this product is not broken”. The result of the test may be negative, meaning the null hypothesis is not rejected (healthy, not guilty, not broken), or positive, meaning the null hypothesis is rejected (not healthy, guilty, broken).
If the result of the test corresponds with reality, then a correct decision has been made (e.g., person is healthy and is tested as healthy, or the person is not healthy and is tested as not healthy). However, if the result of the test does not correspond with reality, then two types of error are distinguished: type I error and type II error.
Type I Error (False Positive Error)
A type I error occurs when the null hypothesis is true, but is rejected. Let me say this again, a type I error occurs when the null hypothesis is actually true, but was rejected as false by the testing.
A type I error, or false positive, is asserting something as true when it is actually false. This false positive error is basically a “false alarm” – a result that indicates a given condition has been fulfilled when it actually has not been fulfilled (i.e., erroneously a positive result has been assumed).
Let’s use a shepherd and wolf example. Let’s say that our null hypothesis is that there is “no wolf present.” A type I error (or false positive) would be “crying wolf” when there is no wolf present. That is, the actual condition was that there was no wolf present; however, the shepherd wrongly indicated there was a wolf present by calling “Wolf! Wolf!” This is a type I error or false positive error.
Type II Error (False Negative)
A type II error occurs when the null hypothesis is false, but erroneously fails to be rejected. Let me say this again, a type II error occurs when the null hypothesis is actually false, but was not rejected (in effect, accepted as true) by the testing.
A type II error, or false negative, is where a test result indicates that a condition failed, while it actually was successful. A Type II error is committed when we fail to believe a true condition.
Continuing our shepherd and wolf example: again, our null hypothesis is that there is “no wolf present.” A type II error (or false negative) would be doing nothing (not “crying wolf”) when there is actually a wolf present. That is, the actual situation was that there was a wolf present; however, the shepherd wrongly indicated there was no wolf present and continued to play Candy Crush on his iPhone. This is a type II error or false negative error.
The relationship between the truth or falsity of the null hypothesis and the outcome of the test can be seen in the table below:
| | Null hypothesis is true | Null hypothesis is false |
| --- | --- | --- |
| Reject null hypothesis | Type I Error (False Positive) | Correct Outcome (True Positive) |
| Fail to reject null hypothesis | Correct Outcome (True Negative) | Type II Error (False Negative) |
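For readers who think better in code, here is a minimal sketch of the table above as a lookup. The function name `classify_outcome` and its two boolean parameters are my own illustrative choices, not part of any statistics library.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Map (reality, test decision) to one of the four cells of the table."""
    if null_rejected:
        # The test rejected H0: wrong if H0 was actually true.
        if null_is_true:
            return "Type I error (false positive)"
        return "Correct outcome (true positive)"
    # The test failed to reject H0: wrong if H0 was actually false.
    if null_is_true:
        return "Correct outcome (true negative)"
    return "Type II error (false negative)"


# Shepherd example, with H0 = "no wolf present".
# Shepherd cries wolf (rejects H0) although no wolf is present (H0 true):
print(classify_outcome(null_is_true=True, null_rejected=True))
# Shepherd does nothing (fails to reject H0) although a wolf is present (H0 false):
print(classify_outcome(null_is_true=False, null_rejected=False))
```

The first call lands in the type I cell and the second in the type II cell, matching the shepherd and wolf descriptions.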
Examples
Let’s walk through a few examples and use a simple form to help us to understand the potential cost ramifications of type I and type II errors. Let’s start with our shepherd / wolf example.
| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
| --- | --- | --- |
| Wolf is not present | Shepherd thinks wolf is present (shepherd cries wolf) when no wolf is actually present | Shepherd thinks wolf is NOT present (shepherd does nothing) when a wolf is actually present |
| Cost Assessment | Costs (actual costs plus shepherd credibility) associated with scrambling the townsfolk to kill the non-existing wolf | Replacement cost for the sheep eaten by the wolf, and replacement cost for hiring a new shepherd |
Note: I added a row called “Cost Assessment.” Since it cannot be universally stated that a type I or type II error is worse (as it is highly dependent upon the statement of the null hypothesis), I’ve added this cost assessment to help me understand which error is more “costly” and for which I might want to do more testing.
Let’s look at the classic criminal dilemma next. In colloquial usage, a type I error can be thought of as “convicting an innocent person” and type II error “letting a guilty person go free”.
| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
| --- | --- | --- |
| Person is not guilty of the crime | Person is judged as guilty when the person actually did not commit the crime (convicting an innocent person) | Person is judged not guilty when they actually did commit the crime (letting a guilty person go free) |
| Cost Assessment | Social costs of sending an innocent person to prison and denying them their personal freedoms (which in our society, is considered an almost unbearable cost) | Risks of letting a guilty criminal roam the streets and committing future crimes |
Let’s look at some business-related examples. In these examples I have reworded the null hypothesis, so be careful with the cost assessment.
| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
| --- | --- | --- |
| Medicine A cures Disease B | (H0 true, but rejected as false) Medicine A cures Disease B, but is rejected as false | (H0 false, but accepted as true) Medicine A does not cure Disease B, but is accepted as true |
| Cost Assessment | Lost opportunity cost for rejecting an effective drug that could cure Disease B | Unexpected side effects (maybe even death) from using a drug that is not effective |
Let’s try one more.
| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
| --- | --- | --- |
| Display Ad A is effective in driving conversions | (H0 true, but rejected as false) Display Ad A is effective in driving conversions, but is rejected as false | (H0 false, but accepted as true) Display Ad A is not effective in driving conversions, but is accepted as true |
| Cost Assessment | Lost opportunity cost for rejecting an effective Display Ad A | Lost sales for promoting an ineffective Display Ad A to your target visitors |
The cost ramifications in the Medicine example are quite substantial, so additional testing would likely be justified in order to minimize the impact of the type II error (using an ineffective drug) in our example. However, the cost ramifications in the Display Ad example are quite small, for both the type I and type II errors, so additional investment in addressing the type I and type II errors is probably not worthwhile.
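One way to make the cost assessment concrete is to weigh each error by how likely it is. The sketch below is only an illustration: the function names are hypothetical, and every probability and dollar figure is a made-up placeholder, not data from any real drug trial or ad campaign.

```python
def expected_error_costs(p_type1, cost_type1, p_type2, cost_type2):
    """Expected cost of each error = chance of the error * cost if it happens."""
    return {
        "type I": p_type1 * cost_type1,
        "type II": p_type2 * cost_type2,
    }


def costlier_error(expected_costs):
    """Return the name of the error type with the larger expected cost."""
    return max(expected_costs, key=expected_costs.get)


# Hypothetical, made-up inputs purely for illustration.
medicine = expected_error_costs(
    p_type1=0.05, cost_type1=1_000_000,    # rejecting an effective drug
    p_type2=0.10, cost_type2=50_000_000,   # using an ineffective drug
)
display_ad = expected_error_costs(
    p_type1=0.05, cost_type1=2_000,        # rejecting an effective ad
    p_type2=0.10, cost_type2=5_000,        # promoting an ineffective ad
)

print("Medicine:", medicine, "-> costlier:", costlier_error(medicine))
print("Display ad:", display_ad, "-> costlier:", costlier_error(display_ad))
```

With placeholder inputs like these, the expected costs in the Medicine case are orders of magnitude larger than in the Display Ad case, which is the kind of gap that would justify additional testing.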
Summary
Type I and type II errors are highly dependent upon the language or positioning of the null hypothesis. Changing the positioning of the null hypothesis can cause type I and type II errors to switch roles.
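To see the role switch in action, here is a small sketch using the Display Ad example. The function `error_type` and its parameter names are hypothetical, chosen just for this illustration. The same real-world mistake is labeled differently depending on how the null hypothesis is worded.

```python
def error_type(null_claims_effective: bool,
               truly_effective: bool,
               judged_effective: bool) -> str:
    """Label a decision as type I, type II, or correct for a given wording of H0.

    null_claims_effective: True if H0 is "the ad is effective",
                           False if H0 is "the ad is not effective".
    """
    null_is_true = (truly_effective == null_claims_effective)
    null_rejected = (judged_effective != null_claims_effective)
    if null_rejected and null_is_true:
        return "type I"
    if not null_rejected and not null_is_true:
        return "type II"
    return "correct"


# The same mistake in both cases: the ad really is effective,
# but we conclude that it is not.
print(error_type(null_claims_effective=True,
                 truly_effective=True, judged_effective=False))
print(error_type(null_claims_effective=False,
                 truly_effective=True, judged_effective=False))
```

With H0 worded as "the ad is effective" the mistake is a type I error; with H0 worded as "the ad is not effective" the identical mistake becomes a type II error.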
It’s hard to create a blanket statement that a type I error is worse than a type II error, or vice versa. The severity of the type I and type II errors can only be judged in context of the null hypothesis, which should be thoughtfully worded to ensure that we’re running the right test.
I highly recommend adding the “Cost Assessment” analysis like we did in the examples above. This will help identify which type of error is more “costly” and identify areas where additional testing might be justified.
[1] More information about type I and type II errors can be found at: http://en.wikipedia.org/wiki/Type_I_and_type_II_errors