Tech Notes

My notes on Statistics, Big Data, Cloud Computing, Cyber Security

What is Statistical Learning

Example: If we determine that there is an association between advertising and sales, then we can adjust advertising budgets, thereby indirectly increasing sales. In other words, our goal is to develop an accurate model that can be used to predict sales on the basis of the three media budgets (TV, newspaper, radio).

So we try to model the relationship between Y (the output variable: sales) and X = (X1, X2, ..., Xp) (the predictor / input variables), which can be written in the very general form Y = f(X) + ε, where f is some fixed but unknown function of X1, ..., Xp and ε is a random error term that is independent of X and has mean zero.
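To make the form Y = f(X) + ε concrete, here is a small simulation sketch. The function `f`, its coefficients, the budget ranges and the noise level are all made up for illustration (in a real problem f is unknown); only the structure "systematic part plus mean-zero error" comes from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(tv, radio, newspaper):
    # Hypothetical "true" function, chosen only for illustration.
    return 3.0 + 0.05 * tv + 0.2 * radio + 0.0 * newspaper

n = 10_000
tv = rng.uniform(0, 300, n)         # assumed budget ranges
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 100, n)

# Error term: mean zero and generated independently of X.
eps = rng.normal(loc=0.0, scale=1.5, size=n)

# Observed response = systematic part + error.
sales = f(tv, radio, newspaper) + eps

print("mean of eps:", eps.mean())
print("mean of sales - f(X):", (sales - f(tv, radio, newspaper)).mean())
```

Both printed values should be close to zero, which is what "the error term averages to zero" means in practice.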

In general, the function f may involve more than one input variable. Statistical learning refers to a set of approaches for estimating f. Why estimate f? For two reasons: prediction and inference.


Prediction

In situations where the inputs X are readily available but the output Y is not, we can predict Y using
Y^ = f^(X)
where f^ represents our estimate for f and Y^ represents the resulting prediction for Y. This works because the error term averages to zero.
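The step from "the error term averages to zero" to the prediction rule can be written out. This is a short derivation, assuming ε has mean zero and is independent of X (the usual assumption for this setup):

```latex
\begin{aligned}
E[Y \mid X = x] &= E[f(X) + \varepsilon \mid X = x] \\
                &= f(x) + E[\varepsilon] \\
                &= f(x)
\end{aligned}
```

So on average Y at a given x equals f(x), and replacing the unknown f with its estimate gives Y^ = f^(x).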


Inference

Here the goal is to understand the way that Y is affected as X1, ..., Xp change. Typical questions:

  • Which predictors are associated with the response?
  • What is the relationship between the response and each predictor?
  • Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?

Depending on whether our ultimate goal is prediction, inference, or a combination of the two, different methods for estimating f may be appropriate. For example, a method "A" may be simple and easy to interpret but might not give the most accurate predictions.

The approaches are depicted below.


How to estimate f?
Most statistical learning methods can be characterised as either

  • Parametric
  • Non-parametric

Parametric Method (model-based method)
A two-step approach:

  1. Assume a functional form or shape for f. E.g. f is linear -> linear model.
  2. Use a procedure that fits (trains) the model on the training data, most commonly the least squares method.
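The two steps can be sketched with NumPy. This is a minimal illustration on simulated data, not the real Advertising data set: the sample size, the "true" coefficients in `true_beta` and the new budget in `x_new` are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated training data (hypothetical numbers).
n = 200
X = rng.uniform(0, 100, size=(n, 3))          # columns: TV, radio, newspaper
true_beta = np.array([2.0, 0.05, 0.2, 0.0])   # assumed intercept and slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 1.0, n)

# Step 1: assume f is linear, f(X) = b0 + b1*X1 + b2*X2 + b3*X3.
A = np.column_stack([np.ones(n), X])          # prepend an intercept column

# Step 2: fit by least squares (minimise the sum of squared residuals).
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Use the fitted model to predict for a new budget allocation.
x_new = np.array([1.0, 50.0, 20.0, 10.0])     # leading 1.0 is the intercept term
y_hat = x_new @ beta_hat

print("estimated coefficients:", beta_hat)
print("predicted sales:", y_hat)
```

Estimating f has been reduced to estimating four numbers, which is the whole point of the parametric approach.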

That is, it reduces the problem of estimating f to estimating a set of parameters. The disadvantage is that the model we choose will usually not match the true unknown form of f; if it is too far from the true f, the estimate will be poor. To address this we can try more flexible models (with more parameters), but this may lead to overfitting, i.e. following the noise in the training data too closely.
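A small experiment can show the flexibility trade-off. This sketch fits polynomials of increasing degree to simulated data and compares the error on the training set with the error on fresh test data. The true function, noise level, sample sizes and degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def true_f(x):
    # Hypothetical true function; unknown in a real problem.
    return np.sin(2.5 * x)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = true_f(x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(1000)

results = {}
for degree in (1, 3, 10):
    # Least squares polynomial fit; a higher degree means more parameters.
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The training error can only go down as the degree grows, because each model contains the previous one. The test error has no such guarantee, and a gap between the two is the usual sign of overfitting.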

Non-parametric Method

No functional form is assumed for f. Instead these methods seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly, so they can fit a wide range of possible shapes for f. The disadvantage is that, since the problem is not reduced to a small number of parameters, a very large number of observations is needed to get an accurate estimate of f. E.g. a thin-plate spline can be used to estimate f.
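As a simple illustration of a non-parametric estimate, the sketch below uses k-nearest-neighbour averaging rather than a thin-plate spline, because it fits in a few lines. The helper `knn_predict`, the single-predictor setup and the simulated data are all assumptions for the example.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=5):
    """Predict y at each query point as the mean y of the k nearest training points (1-D x)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Pairwise absolute distances, shape (n_query, n_train).
    dists = np.abs(x_query[:, None] - x_train[None, :])
    # Indices of the k closest training points for each query point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

# Simulated data from a hypothetical non-linear f.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 200)
y = np.sin(2.5 * x) + rng.normal(0, 0.3, 200)

x_grid = np.linspace(-1, 1, 5)
print(knn_predict(x, y, x_grid, k=10))
```

No shape for f is assumed anywhere; the estimate is driven entirely by nearby observations, which is also why it needs many observations to be accurate. The choice of k plays the role of the smoothness control: a small k follows the points closely, a large k gives a smoother estimate.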

Typically when doing inference, simple models are preferred because they are easier to interpret. When doing prediction, interpretation is less important, so it's OK to use flexible models.

Regression Vs Classification

Problems with quantitative responses are referred to as regression problems, and problems with qualitative responses are called classification problems.

Disclaimer: These are my study notes, kept online instead of on paper so that others can benefit. In the process I've used some pictures / content from other original authors. All sources / original content publishers are listed below and they deserve credit for their work. No copyright violation intended.

References for these notes :

The study material for the MOOC “Statistical Learning” at Stanford Online

The book “An Introduction to Statistical Learning”

