What Is Overfitting?
Telltale Super-Accuracy on Training Data
When a machine learning model shows exceptional accuracy on its training data but performs poorly on new, unseen data, it is guilty of overfitting. Overfitting happens when a model "learns" the noise in the data instead of the true signal patterns.
How to Avoid Overfitting
Detecting overfitting is the first step. Comparing accuracy on the training data against accuracy on a portion of the data that was set aside for testing will reveal when a model is overfitting; a sketch of this check follows the list below. Techniques to minimize overfitting include:
- Tuning Hyperparameters – Hyperparameters are settings chosen before training that control how an algorithm learns, such as the maximum depth of a decision tree; they configure the learning process rather than being learned from the data. Tuning them for a given family of machine learning algorithms constrains model complexity so the model performs well and does not overfit (see the second sketch below).
- Cross-Validation – Cross-validation splits the training data into additional train-test folds so hyperparameters can be tuned iteratively without disturbing the data initially set aside for testing (see the third sketch below).
- Early Stopping – Training generally improves model performance with each additional iteration, but only up to a point. Checking performance on held-out data at each iteration and stopping once accuracy no longer improves prevents the model from going on to memorize noise (see the final sketch below).
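To make the detection step concrete, here is a minimal sketch of the train/test comparison. It assumes scikit-learn, a decision tree, and a synthetic data set; these are illustrative choices, not a description of Squark's internals.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out a portion of the data for testing before any training happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# An unconstrained decision tree is prone to memorizing noise.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
# A large gap (e.g., 1.000 on training vs. far lower on the test set)
# is the telltale sign of overfitting.
```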
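Hyperparameter tuning can be sketched the same way. Here max_depth, a decision-tree hyperparameter in scikit-learn (again an illustrative assumption), is chosen by comparing candidate values on a held-out validation split:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# max_depth is a hyperparameter: it caps tree complexity before training.
best_depth, best_acc = None, 0.0
for depth in [2, 4, 6, 8, None]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_depth, best_acc = depth, acc

print(f"best max_depth: {best_depth} (validation accuracy {best_acc:.3f})")
```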
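Cross-validation folds that same search into the training data itself. This sketch assumes scikit-learn's GridSearchCV with 5 folds; note that the initial test set stays untouched until the very end:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 5-fold cross-validation re-splits the training data into its own
# train/validation folds for every candidate hyperparameter value.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8, None]},
    cv=5,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
# Only now is the set-aside test data used, for a final honest estimate.
print(f"test accuracy: {search.best_estimator_.score(X_test, y_test):.3f}")
```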
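Finally, early stopping can be sketched with scikit-learn's gradient boosting, which monitors an internal validation split after each boosting round; the model choice is an assumption for illustration, not Squark's method:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# validation_fraction carves out data to score after each boosting round;
# n_iter_no_change halts training once that score stops improving.
model = GradientBoostingClassifier(
    n_estimators=500,
    validation_fraction=0.1,
    n_iter_no_change=10,
    random_state=0,
)
model.fit(X_train, y_train)

print(f"stopped after {model.n_estimators_} of 500 boosting rounds")
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```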
Squark Seer automatically employs these and other approaches to minimize overfitting. As always, get in touch if you have questions about overfitting or any other machine learning topic. We're happy to help.
Judah Phillips