Hello fellow Data Science-Centralists!
I wrote a post on my LinkedIn about why you should NEVER run a Logistic Regression (unless you really have to).
The main thrust is:
- There is no theoretical reason why a least squares estimator can’t work on a 0/1 outcome.
- There are very, very narrow theoretical reasons to run a logistic, and unless you fall into those categories it’s not worth the time.
- The run time of a logistic can be up to 100x longer than an OLS model, since a logistic is fit iteratively while OLS has a closed-form solution. If you are doing v-fold cross-validation, save yourself some time.
- The XB’s (the linear predictor you specify) are exactly the same whether you use a logistic or a linear regression. The model specification (features, feature engineering, feature selection, interaction terms) is identical, and this is what you should be focused on anyway.
- Myth: Linear regression can only run linear models.
- There is *one* practical reason to run a logistic: if the results are all very close to 0 or to 1, and you can’t hard-code your prediction to 0 or 1 when the linear model falls outside the normal probability range, then use the logistic. For example, if you are pricing an insurance policy based on risk, you can’t have a hard-coded 0.000% prediction because you can’t price that correctly.
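
To make the comparison concrete, here is a minimal sketch (not the code from the video or slides) that fits both models to the same XB specification on a synthetic 0/1 outcome. Everything in it is an assumption for illustration: the simulated data, the single feature, the helper names `fit_ols` and `fit_logistic`, and the choice of Newton-Raphson for the logistic fit. The linear predictions are hard-coded into the 0 to 1 range, as described in the last bullet.

```python
import math
import random


def fit_ols(xs, ys):
    """Closed-form least squares for y = a + b*x (one pass, no iteration)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson for P(y=1) = 1 / (1 + exp(-(a + b*x))).

    Each iteration is a full pass over the data, which is why this is
    slower than the closed-form OLS fit above.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
            h00 += w             # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (X'WX) * delta = gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Synthetic data (an assumption): true model is logistic with a=0.5, b=1.0.
random.seed(0)
xs = [random.uniform(-2.0, 2.0) for _ in range(2000)]
ys = [1 if random.random() < 1.0 / (1.0 + math.exp(-(0.5 + x))) else 0
      for x in xs]

ols_a, ols_b = fit_ols(xs, ys)
log_a, log_b = fit_logistic(xs, ys)

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Hard-code linear predictions into [0, 1] if they fall outside it.
ols_pred = [min(1.0, max(0.0, ols_a + ols_b * x)) for x in grid]
log_pred = [1.0 / (1.0 + math.exp(-(log_a + log_b * x))) for x in grid]

for x, p_ols, p_log in zip(grid, ols_pred, log_pred):
    print(f"x={x:+.1f}  OLS (clipped)={p_ols:.3f}  logistic={p_log:.3f}")
```

Both fits use the same feature and intercept; only the estimator changes. The logistic predictions can never be exactly 0 or 1, which is the property that matters in the insurance-pricing case.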
See video here and slides here.
I think it’d be nice to start a debate on this topic!