Keeping ML Models on Target by Managing Drift
In a world perfectly optimized for artificial intelligence and machine learning, there would be no change. Data would be static, models would be static, external conditions would be static, and ML model output would always be accurate. In our world, however, very little is static; change is the only constant. Conditions vary over time, even between model training and deployment: data values shift, technology evolves, unexpected events occur, and human behavior fluctuates. This dynamic, sometimes erratic environment gives rise to model drift, causing ML models to lose accuracy and functionality rather than improving over time.
What is model drift?
Model drift is the deterioration of an ML model’s predictive power over time. Drift can occur in the input data, in the relationships between variables, or in the external conditions in which the model operates. It can emerge unpredictably or follow a pattern. There are a number of ways to categorize drift, and the nomenclature is inconsistent, but it is broadly divided into the following categories:
- Data drift/covariate shift: A change in the distribution of the model’s input (independent) variables. The model’s correlations and output may still be technically correct, but changes in demographics or input data have rendered its predictions incomplete or unreflective of actual outcomes.
- Concept drift: Change in the relationship between the input and output variables. Here, although the input data may be largely unchanged, the model’s assumptions are now inaccurate due to shifts in human behavior or external conditions.
Changes in model performance may be gradual (e.g., tied to macroeconomic conditions or advances in technology), sudden (e.g., due to unpredictable external events such as the COVID-19 pandemic), or recurring (e.g., due to seasonal shifts or periodic events).
What causes model drift?
Data Quality and Integrity Issues
Model drift is caused by a number of factors. The most fundamental of these is data quality and integrity. If the dataset used to train a model is incomplete, incorrectly entered, or poorly representative of production data (due to differing image quality, inconsistent formats, data entry bugs, or labeling issues), the model’s performance will suffer. For example, if a model used to identify possible fractures in X-ray images is trained using clean, clear hospital X-ray films, the model may produce inaccurate results when presented with images from different X-ray machines, smartphone images, poorly lit photographs, or scratched or smudged films.
Demographic Shifts
Another factor that causes a discrepancy in model outcomes is demographic shifts. For example, a model trained to predict the viewing preferences of streaming movie users before the COVID-19 pandemic may still be excellent at providing suggestions for young and middle-aged adults, the primary pre-pandemic audience. However, now that viewership has expanded to include large proportions of young children and senior citizens, the predictions and suggestions may no longer be relevant to the audience as a whole.
Changes in Human Behavior
A third major cause of drift is changes in human behavior. This is typically a result of external shifts like the pandemic, changes in interest rates, salary increases and price drops, or even road construction. For example, if a model is used to predict traffic on a particular highway at different times of day, its calculations will be rendered inaccurate if that highway is blocked at certain times due to construction. In this case, drivers would change their route or schedule to avoid the blockages, and the patterns learned by the model would no longer hold true.
How can model drift be detected?
Monitoring ML models for drift on an ongoing basis is essential. This can be done via a number of algorithmic checks and quantitative methods that detect differences or shifts between training data and real-time data. Some examples include the following (a brief code sketch of two of these checks appears after the list):
- Kolmogorov-Smirnov (K-S) Test – a nonparametric test that compares two datasets and flags changes in data distribution.
- Population Stability Index – compares the distribution of the target variable in the training data and the deployment data.
- Adaptive Windowing (ADWIN) – a specialized drift-detection algorithm that maintains a variable-size window of recent data and compares its newer segments against older ones, flagging drift when the difference between them becomes statistically significant.
- Page-Hinkley Method – calculates the mean value of new data and compares it to earlier mean values. It raises a flag if the difference in values crosses a set threshold.
- Model-Based Approach – trains an ML model (typically a binary classifier) to discriminate between the original training data and incoming production data; the more easily the classifier separates the two, the further the distributions have diverged.
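To make two of these checks concrete, the sketch below compares a single numeric feature between training and production data using the two-sample Kolmogorov-Smirnov test from scipy.stats and a small, hand-rolled Population Stability Index helper. The psi() function, the bin count, and the 0.05/0.2 alert thresholds are illustrative assumptions rather than standard library defaults.

```python
# Minimal sketch: checking one numeric feature for data drift.
# Assumes numpy and scipy are installed; the psi() helper, bin count,
# and alert thresholds are illustrative choices, not library defaults.
import numpy as np
from scipy.stats import ks_2samp

def psi(reference, current, bins=10):
    """Population Stability Index between a reference sample and a current sample."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the training range
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Guard against empty bins before taking the log.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Simulated feature: training data vs. production data whose mean has shifted.
rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)

ks_result = ks_2samp(train_feature, prod_feature)
psi_value = psi(train_feature, prod_feature)
print(f"K-S statistic={ks_result.statistic:.3f}, p-value={ks_result.pvalue:.4f}")
print(f"PSI={psi_value:.3f}")

# Common rules of thumb: a small K-S p-value flags a distribution change,
# and a PSI above roughly 0.2 suggests a shift worth investigating.
if ks_result.pvalue < 0.05 or psi_value > 0.2:
    print("Drift warning: investigate this feature before trusting new predictions.")
```

In practice, checks like these would run on every monitored feature, and often on the model’s output scores as well, each time a new batch of production data arrives.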
What are some best practices for managing model drift?
With model drift, early detection and action are key; the longer the delay, the further the model’s predictions will deviate from actual values, making the model progressively less viable until it may need to be rebuilt entirely. If drift is detected between model training and deployment, the first assessment should be of data quality. The data used to train the model must be accurately labeled, current, and complete. After verifying data quality and integrity, one way to manage drift is to proactively retrain the model on new data according to a predetermined schedule (e.g., at one-month intervals), based on observed timelines of model degradation. Another is to perform reactive, targeted retraining, intervening only when drift detection mechanisms trigger a warning. Depending on the structure and function of a model, there are a number of ways to update it with new data, including the following:
- Online learning or incremental learning – continuously updates the model, one sample (or mini-batch) at a time, as new data enters the system. This works well in applications that use streaming data; see the sketch after this list.
- Periodic retraining – uses fresh data to retrain the model on a periodic basis. This can be resource-intensive because it may require labeling large sets of new data.
- Retraining based on a representative subsample – retrains the model using a small, representative subsample of new data. This is a scaled-down, cost-effective version of the above.
- Ensemble learning – uses multiple models to give a combined prediction based on individual model performance. These individual models will respond differently to changes in the input data, and the stronger results will be given more weight in the combined output.
- Weighted data – weights data differentially according to age, with newer data given more importance than older data. This allows previous data to be retained rather than discarded when the model is retrained, producing a more robust model.
- Feature dropping – uses a set of models, each addressing a single feature against a common target variable. Each model/feature pair is assessed separately for predictive accuracy, and if drift is detected, the drifting feature can be dropped.
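As a concrete illustration of the first option, the following minimal sketch uses scikit-learn’s SGDClassifier and its partial_fit method to update a model one mini-batch at a time as new labeled data streams in; the synthetic data stream, batch size, and drift simulation are illustrative assumptions.

```python
# Minimal sketch: online/incremental learning with scikit-learn's partial_fit.
# The synthetic data stream, batch size, and drift simulation are illustrative assumptions.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_features = 5
classes = np.array([0, 1])  # partial_fit needs the full set of labels up front

model = SGDClassifier(random_state=0)

def next_batch(step, batch_size=200):
    """Simulate one mini-batch of labeled production data; `step` slowly shifts the distribution."""
    shift = 0.05 * step
    X = rng.normal(loc=shift, scale=1.0, size=(batch_size, n_features))
    y = (X.sum(axis=1) > shift * n_features).astype(int)  # simple ground-truth rule
    return X, y

# Update the model continuously as new batches arrive instead of retraining from scratch.
for step in range(10):
    X_batch, y_batch = next_batch(step)
    model.partial_fit(X_batch, y_batch, classes=classes)
    print(f"batch {step}: accuracy on the latest batch = {model.score(X_batch, y_batch):.2f}")
```

Because partial_fit never retrains from scratch, the model keeps adapting to the incoming distribution at a fraction of the cost of full periodic retraining, which is why this pattern suits streaming applications.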
Conclusion
Since most ML models operate in a highly dynamic setting, model training and deployment are by no means ends in themselves. First and foremost, models must be trained and retrained using accurately labeled, carefully selected data. Next, in order to keep up with real-time changes in data distributions and environmental conditions, models must be continuously monitored to maintain their predictive power and counteract drift. They must be refitted with fresh data on an ongoing basis, either at predetermined intervals or in response to drift detection alerts. End-to-end AI solution providers can facilitate this process by supplying high-quality, up-to-date datasets for model training and retraining. They can also automate the monitoring and retraining process so that businesses can ensure reliable, seamless functioning of their ML models and generate the highest returns. In this way, ML models can avoid the losses and failures associated with drift and consistently perform at their peak.