How to Maintain Model Effectiveness After Deployment
When we are ready to deploy a predictive model that gives a good accuracy score on the training and testing datasets, there is one more problem to solve: how long will this model keep solving the problem with the same high accuracy, and what is the strategy to maintain that accuracy? We also need to know what action to take when the model's effectiveness is on a declining trend. In this article, I am sharing my strategy for validating and maintaining the effectiveness of a predictive model. There are two things we can do before deploying a model into production.
First, if possible, add a small percentage of negative test data as part of the model's input. Negative testing is a method of testing an application that ensures the model handles unwanted input the way it should. For example, consider a recommendation model that selects potential customers from a large customer dataset to call for marketing purposes. For this model, ingesting customers who should receive a low recommendation score along with the high-probability customers helps us validate the model's effectiveness. A good model will keep high accuracy on the positive data while giving low scores to the negative data we ingested. A high success rate on the negative dataset is a good indication that the current model's training data should be re-evaluated. Likewise, good accuracy on the negative data, or low accuracy on the rest of the data (excluding the negative sample), indicates that the model has problems.
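Below is a minimal sketch of this check. It assumes a scikit-learn style classifier `model` where a prediction of 1 means "recommend", and that `X_positive` and `X_negative` are feature matrices for recent positive inputs and the injected negative sample; all of these names are placeholders, not part of any specific library.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def negative_test_report(model, X_positive, X_negative):
    """Compare accuracy on positive data with the score on injected negatives.

    A healthy model keeps positive_accuracy high while negative_accuracy
    (the share of unwanted inputs it still recommends) stays low.
    """
    positive_accuracy = accuracy_score(
        np.ones(len(X_positive)), model.predict(X_positive)
    )
    negative_accuracy = float(np.mean(model.predict(X_negative) == 1))
    return positive_accuracy, negative_accuracy
```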
Second, I recommend developing an autoencoder (AE) model with the same training data used for the deployed model. I highly recommend developing this AE model before we deploy the main model into production. Using anomaly-detection techniques, pass the recent model input data through the AE to get a reconstruction error value. A high reconstruction error indicates that the input deviates from the original training data.
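As an illustration, here is a minimal dense-autoencoder sketch using Keras. The layer sizes, epochs, and batch size are arbitrary placeholders, and `X_train` / `X` stand for the (scaled) training feature matrix and the recent production inputs.

```python
import numpy as np
import tensorflow as tf

def fit_autoencoder(X_train, bottleneck=8, epochs=50):
    """Fit a small dense autoencoder on the deployed model's training features."""
    n_features = X_train.shape[1]
    ae = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(2 * bottleneck, activation="relu"),
        tf.keras.layers.Dense(bottleneck, activation="relu"),   # compressed representation
        tf.keras.layers.Dense(2 * bottleneck, activation="relu"),
        tf.keras.layers.Dense(n_features, activation="linear"),
    ])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(X_train, X_train, epochs=epochs, batch_size=64, verbose=0)
    return ae

def reconstruction_error(ae, X):
    """Mean squared reconstruction error; compare the value for recent inputs
    against the error observed on the training data to spot drift."""
    return float(np.mean(np.square(X - ae.predict(X, verbose=0))))
```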
Based on the business objective, take the recent model results and compute the accuracy score on the positive data and on the negative data, and get the reconstruction error value from the AE model. With these three values, we can evaluate the effectiveness of the model. The table below summarizes the possible actions we can take related to model effectiveness.
| | High Reconstruction Error | Low Reconstruction Error |
| --- | --- | --- |
| High Model Accuracy, High Negative-Test Accuracy | Retrain with new data | Retrain with new data |
| High Model Accuracy, Low Negative-Test Accuracy | No action needed | No action needed |
| Low Model Accuracy, High Negative-Test Accuracy | Redevelop the model from scratch | Redevelop the model from scratch |
| Low Model Accuracy, Low Negative-Test Accuracy | Tune the model with new features | Tune the model with new features |
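As a rough sketch of how this table could be codified in a monitoring job: the thresholds below are illustrative placeholders to be set from your own baselines (for example, the AE's reconstruction error on the training data), and the function names are hypothetical.

```python
def next_action(model_accuracy, negative_test_accuracy, reconstruction_err,
                acc_threshold=0.80, neg_threshold=0.20, recon_threshold=0.05):
    """Map the three health signals onto the actions in the table above.

    Returns the recommended action plus a flag saying whether the recent
    inputs look drifted (high reconstruction error).
    """
    high_acc = model_accuracy >= acc_threshold
    high_neg = negative_test_accuracy >= neg_threshold
    data_drifted = reconstruction_err >= recon_threshold

    if high_acc and not high_neg:
        action = "No action needed"
    elif high_acc and high_neg:
        action = "Retrain the model with new data"        # Action 1
    elif high_neg:
        action = "Redevelop the model from scratch"       # Action 3
    else:
        action = "Tune the model with new features"       # Action 2
    return action, data_drifted
```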
Action 1: Retrain the model with new data
When the model produces a high accuracy score but the negative-test accuracy is also significantly high, get the reconstruction error value from the AE model on recent data. If the reconstruction error is low, the data has not changed much. The next best action is to retrain the model with recent data, as in the sketch below.
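A minimal retraining sketch, assuming a scikit-learn style estimator and that `X_train`/`y_train` are the original training data while `X_recent`/`y_recent` are recently collected, labelled production records (all names are placeholders):

```python
import numpy as np
from sklearn.base import clone

def retrain_with_recent_data(model, X_train, y_train, X_recent, y_recent):
    """Refit a fresh copy of the same estimator, with the same features,
    on the original training data plus the recent labelled data."""
    refreshed = clone(model)                  # same algorithm and hyperparameters
    X = np.vstack([X_train, X_recent])
    y = np.concatenate([y_train, y_recent])
    return refreshed.fit(X, y)
```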
Action 2: Retrain the model with additional features
When the model produces a low accuracy score on the positive data while the accuracy on the negative dataset also stays low (the negatives are still correctly given low scores), and the reconstruction error value is low, this indicates that the data has not changed much but the model needs to be tuned with new features. In this case, you already have the data you need, and retraining the same model on the same data is not going to help. The action is to develop new features through a feature-engineering step and retrain the model with those additional features. Remember to preserve the features from the original model.
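A small feature-engineering sketch, assuming the training data lives in a pandas DataFrame; the new columns (`calls_per_month`, `is_recent_customer`) and the source columns they are derived from are purely hypothetical, chosen only to fit the customer-recommendation example.

```python
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append new derived features while preserving every original column."""
    df = df.copy()
    # Hypothetical derived features for the customer-recommendation example.
    df["calls_per_month"] = df["total_calls"] / df["tenure_months"].clip(lower=1)
    df["is_recent_customer"] = (df["tenure_months"] < 6).astype(int)
    return df

# The enriched frame is then used to retrain the model as in Action 1, e.g.:
# train_df = add_engineered_features(train_df)
```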
Action 3: Develop a new model from scratch
When the model produces a low accuracy score on recent positive data and the negative-test accuracy is significantly high, and furthermore the reconstruction error value is also high for the recent input data, it is a clear indication that the recent input data is very different from, and carries new features compared with, the data the model was originally trained on. The next best action is to repeat the feature-extraction process, then build and train a new model from scratch.
Final thoughts
· How often we need to validate the model depends on how frequently the model is consumed and on how fast the underlying data changes over time. Understanding the business problem helps determine the validation frequency. When deploying the model, having a plan to validate it is good practice.
· Getting feedback directly from users and incorporating it into the model is great, but in practice timely feedback is hard to obtain and auto-tuning the model on it is challenging to implement. Try to derive the model's results from business outcomes instead, for instance the call-turned-to-customer success rate of the recommendation model computed from call-history data, rather than depending solely on user feedback (see the sketch after this list).
· In practice, ingesting negative test data into the model is not an option for many business problems. In those scenarios, the AE model and the average accuracy score of the model over a specific period can be used to validate its effectiveness.
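As a sketch of the business-outcome idea above, assuming a hypothetical call-history export with `call_date`, `was_recommended`, and `became_customer` columns (the file name and columns are assumptions, not a real schema):

```python
import pandas as pd

# Hypothetical call-history export: one row per call that followed a
# model recommendation, with the observed outcome.
calls = pd.read_csv("call_history.csv", parse_dates=["call_date"])

recommended = calls[calls["was_recommended"] == 1].copy()
recommended["month"] = recommended["call_date"].dt.to_period("M")

# Monthly call-turned-to-customer rate; a steady decline is an early
# warning that the model's effectiveness is slipping.
monthly_success_rate = recommended.groupby("month")["became_customer"].mean()
print(monthly_success_rate)
```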
Thanks