Bullseye or Feature Leakage?

Do your models seem too accurate? They might be.

Feature leakage, also known as data leakage or target leakage, causes predictive models to appear more accurate than they really are; the distortion ranges from mild over-optimism to complete invalidity. The cause is training data that contains information about the value you are trying to predict, information that would not be available at the time of prediction.
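To make the effect concrete, here is a minimal sketch using scikit-learn on synthetic data (an illustration, not Squark's code). The "leaky" column is essentially a noisy copy of the target, standing in for a field that was only recorded after the outcome was known; including it inflates test accuracy to near-perfect.

```python
# Minimal illustration of feature leakage (synthetic data, scikit-learn).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000
X_honest = rng.normal(size=(n, 5))  # legitimate predictors
y = (X_honest[:, 0] + rng.normal(scale=2.0, size=n) > 0).astype(int)

# A leaky feature: almost exactly the target, e.g. a column that was
# only filled in after the outcome was already known.
leak = y + rng.normal(scale=0.01, size=n)
X_leaky = np.column_stack([X_honest, leak])

results = {}
for name, X in [("honest", X_honest), ("leaky", X_leaky)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, model.predict(X_te))
    print(name, round(results[name], 3))
```

The honest model scores modestly, as the noisy signal allows; the leaky model scores close to 100%, which is exactly the kind of too-good-to-be-true result the rest of this post is about.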

How to Minimize Feature Leakage:

  1. Remove data that could not be known at the time of prediction.
  2. Cross-validate your models.
  3. If you suspect a variable is leaky, remove it and run again.
  4. Hold back a validation data set.
  5. Consider near-perfect model accuracy a warning sign.
  6. Check variables of importance for overly predictive features.
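Several of the steps above can be sketched in a few lines of scikit-learn (again an illustration on synthetic data, not Squark's implementation): cross-validate, hold back a validation set, check which feature dominates the importances, then remove the suspect feature and run again.

```python
# Illustrative leakage checks: cross-validation, a held-back validation set,
# feature-importance inspection, and a drop-and-rerun test.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 4))
y = (X[:, 0] + rng.normal(scale=2.0, size=n) > 0).astype(int)
X = np.column_stack([X, y + rng.normal(scale=0.01, size=n)])  # last column leaks the target
names = ["f0", "f1", "f2", "f3", "leak"]

# Steps 2 and 4: cross-validate on training data, keep a held-back validation set.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
cv_acc = cross_val_score(model, X_tr, y_tr, cv=5).mean()

# Steps 5 and 6: near-perfect accuracy plus one dominant feature is a red flag.
top = names[int(np.argmax(model.feature_importances_))]
print(f"CV accuracy {cv_acc:.3f}; most important feature: {top}")

# Step 3: remove the suspect feature and run again; a large accuracy
# drop back to a plausible level confirms the feature was leaky.
keep = [i for i, nm in enumerate(names) if nm != top]
clean = RandomForestClassifier(random_state=0).fit(X_tr[:, keep], y_tr)
clean_acc = clean.score(X_val[:, keep], y_val)
print(f"Validation accuracy without {top}: {clean_acc:.3f}")
```

Here the leaky column tops the importance list and drives cross-validated accuracy toward 100%; dropping it brings validation accuracy back down to what the genuine signal supports.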

If you are a Squark user, you’ll be happy to know that our AutoML identifies and removes highly correlated data before building models. Squark uses cross-validation and holds back a validation data set as well. Squark always displays accuracy and variables of importance for each model.


