Tech Notes

My notes on Statistics, Big Data, Cloud Computing, Cyber Security

Overfitting

Today, your model, just like your jeans, seems to "hug" your sample data perfectly. But you want your jeans to still fit a year or so down the road.

Example of an overfit model:

[Figure: an overfit model hugging the sample data]

But if we collect new data and it looks like this, our model won't fit anymore.

[Figure: the overfit model against newly collected data]

To guard against this, we need to build a better model from the start.

[Figure: a better model]
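To make the jeans analogy concrete, here is a minimal Python sketch built on made-up assumptions: the true relationship is y = 2x + 1, the six sample points carry alternating noise of ±0.5, and the new observations sit exactly on the true line, halfway between the sample points. A degree-5 polynomial through all six points plays the overfit model, and an ordinary least-squares straight line plays the simpler one. The helper names (`fit_line`, `fit_interpolant`, `sse`) are hypothetical, chosen only for illustration.

```python
# Sketch: a polynomial that passes through every sample point vs. a straight line.
# Assumptions (made up): true relationship y = 2x + 1, alternating +/-0.5 noise.


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point: zero error on the sample."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict


def sse(model, xs, ys):
    """Sum of squared errors of a model on the points (xs, ys)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Sample data: the true line plus alternating noise
sample_x = [0, 1, 2, 3, 4, 5]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
sample_y = [2 * x + 1 + e for x, e in zip(sample_x, noise)]

# "New" data between the sample points, taken exactly on the true line
new_x = [0.5, 1.5, 2.5, 3.5, 4.5]
new_y = [2 * x + 1 for x in new_x]

line = fit_line(sample_x, sample_y)
wiggly = fit_interpolant(sample_x, sample_y)

for name, model in [("straight line", line), ("degree-5 polynomial", wiggly)]:
    print(f"{name}: sample SSE = {sse(model, sample_x, sample_y):.3f}, "
          f"new-data SSE = {sse(model, new_x, new_y):.3f}")
```

The polynomial's error on the sample is zero, yet its error on the new points is many times larger than the straight line's: it fitted the noise rather than the underlying relationship, which is exactly the overfitting pattern described above.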

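R-sq tells you how well your model fits your sample data. Predicted R-sq, aka R-sq (pred), tells you how well your model predicts responses for new observations. For both statistics, 100% represents a perfect fit. An overfit model typically shows a high R-sq but a much lower predicted R-sq, because it has fitted the noise in the sample rather than the underlying relationship.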
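Predicted R-sq can be computed by leaving each observation out in turn, refitting the model on the remaining points, and predicting the held-out one. The squared prediction errors sum to PRESS (the prediction error sum of squares), and predicted R-sq = 1 - PRESS / SS_total. The minimal Python sketch below assumes a toy data set (true line y = 2x + 1, six points with alternating ±0.5 noise) and two hypothetical fitting helpers, `fit_line` and `fit_interpolant`; none of the names come from the original notes or from Minitab.

```python
# Sketch: R-sq vs. predicted R-sq on a toy data set (all values made up).
# Predicted R-sq = 1 - PRESS / SS_total, where PRESS sums the squared errors
# of leave-one-out predictions.


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point (a deliberately overfit model)."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict


def total_sum_of_squares(ys):
    """SS_total: squared deviations of the responses from their mean."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)


def r_squared(fit, xs, ys):
    """R-sq: how well the model fitted to all points reproduces those points."""
    model = fit(xs, ys)
    ss_error = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_error / total_sum_of_squares(ys)


def predicted_r_squared(fit, xs, ys):
    """Predicted R-sq: refit without each point, then predict that point."""
    press = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - model(xs[i])) ** 2
    return 1 - press / total_sum_of_squares(ys)


xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x + 1 + e for x, e in zip(xs, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]

for name, fit in [("straight line", fit_line), ("degree-5 polynomial", fit_interpolant)]:
    print(f"{name}: R-sq = {r_squared(fit, xs, ys):.3f}, "
          f"predicted R-sq = {predicted_r_squared(fit, xs, ys):.3f}")
```

With this data the straight line's predicted R-sq stays close to its R-sq, while the polynomial scores a perfect R-sq of 1 but a negative predicted R-sq. For ordinary least-squares models, PRESS can also be computed without refitting, as the sum of (e_i / (1 - h_ii))^2 over each point's residual e_i and leverage h_ii; the explicit loop simply keeps the leave-one-out idea visible.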

Disclaimer: These are my study notes, kept online instead of on paper so that others can benefit. In the process I have used some pictures and content from other original authors. All sources and original content publishers are listed below, and they deserve credit for their work. No copyright violation intended.

References for these notes:

http://blog.minitab.com/blog/statistics-and-quality-data-analysis/overfit-those-skintight-jeans-fit-perfect-when-you-bought-them-but
