Models are simplification or approximation of reality and hence they will not capture all of reality. “All models are wrong, but some are useful” is a famous quote by George Edward Pelham Box (1919–2013). George Box was a British mathematician and professor of statistics at the University of Wisconsin. Statisticians develop theoretical models to predict the behaviour of certain process. The meaning of this quote is that every single model will be wrong and it never represents the exact behaviour. But, even if the model cannot describe exactly the reality, it could be useful if it is close enough. In this, article let us examine how the quote of George Box applies to Machine Learning models.
Model development in Machine Learning (ML) is an iterative process and it is totally different from statistical model development.The methods used in statistical model development is supported by theory and predictions made by the models will be independent of the model developer. In Machine Learning, modelling is an art and the results produced the ML models depends on the artist, the model developer. In machine learning, we have different category of professionals engaged in model development.
They are Software engineers, Machine learning engineers, and Data scientists. Each category of these professionals looks at model development from a different perspective. Data scientists never satisfy with the predictions made by the models and think that still there is scope for improving the accuracy. But, software engineers and machine learning engineers gets satisfied easily and tend to believe that the results generated from their models are having highest accuracy and there is no scope for further improvement.
The accuracy of Machine Learning models also depends on the algorithm used. In machine learning, we have to choose one from half a dozen algorithms /libraries. The tyical machine learning algorithms/ libraries are XGBoost, Random Forests, TensorFlow, Scikit-learn, and Pytorch. In machine learning, we have scope for applying human creativity, intuition, experience and domain knowledge to improve the accuracy of the models. Hence the accuracy of machine learning models also depends on the professional who developed the model. The model developers influence the model by tuning Hyperparameters. Hyperparameters are user-defined settings that dictate how an algorithm should behave during training. Selecting the right hyperparameter values for your machine learning model is important because it can have an enormous impact on final accuracy and performance.
In Machine Learning context, it will be more appropriate to modify the quote as “All models are accurate, but some are more accurate”. Wish you happy modelling. See you next time.