Tech Notes

My notes on Statistics, Big Data, Cloud Computing, Cyber Security

Digital Signing of objects

CA-CSR flow

How HTTPS works



  1. Browser connects to a web server secured with SSL
  2. Server sends a copy of its SSL Certificate, including the server’s public key
  3. The browser already has a list of trusted CA certificates (usually shipped with the browser or the operating system). It checks that the server certificate chains up to one of these trusted CAs. If the check passes, the browser trusts the certificate, then creates a symmetric session key, encrypts it with the server’s public key, and sends it back.
  4. Server decrypts the symmetric session key using its private key and sends back an acknowledgement encrypted with the session key to start the encrypted session.
  5. Server and Browser exchange data encrypted with the session key
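The certificate check in step 3 can be tried from Python’s standard `ssl` module. This is a minimal sketch, not how a browser is implemented: `ssl.create_default_context()` loads the platform’s trusted CA certificates and enables certificate and hostname verification, and `wrap_socket` performs the handshake. The helper names `make_client_context` and `describe_server` are hypothetical, and calling `describe_server` needs network access. One hedge: the steps above describe the classic RSA key exchange. With TLS 1.3 (and ECDHE cipher suites in TLS 1.2) the session key is instead agreed with ephemeral Diffie-Hellman, and the server’s private key signs the handshake rather than decrypting a session key, so `version()` will often report `TLSv1.3`.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # Loads the system's trusted CA certificates (the browser's "list of CAs")
    # and turns on certificate and hostname verification.
    return ssl.create_default_context()


def describe_server(host: str, port: int = 443) -> dict:
    """Open a TLS connection and report what was negotiated (needs network access)."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # The handshake happens inside wrap_socket; it raises
        # ssl.SSLCertVerificationError if the certificate does not chain to a trusted CA.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {
                "tls_version": tls.version(),
                "cipher": tls.cipher()[0],
                # subject/issuer are tuples of RDNs; keep the first attribute of each
                "subject": dict(rdn[0] for rdn in cert["subject"]),
                "issuer": dict(rdn[0] for rdn in cert["issuer"]),
            }
```

Usage: `describe_server("example.com")` returns the negotiated protocol version, cipher name, and the certificate’s subject and issuer; the issuer is the CA the browser had to trust in step 3.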


Disclaimer: These are my study notes, kept online instead of on paper so that others can benefit. In the process I have used some pictures and content from other original authors. All sources and original content publishers are listed below, and they deserve credit for their work. No copyright violation intended.

References for these notes:

Conditional Probability, Bayes’ Theorem, Naive Bayes Classifier

Both kNN and Naive Bayes are classification algorithms. Conceptually, kNN uses the idea of “nearness” to classify new entities, where nearness is measured with a distance such as Euclidean distance or cosine distance. Naive Bayes, by contrast, uses the concept of probability to classify new entities.
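A minimal pure-Python sketch of kNN with both distance measures mentioned above. The training points, the labels "A"/"B", and the helper `knn_classify` are made up for illustration and are not a library API; a new point gets the majority label of its k nearest training points.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def knn_classify(query, examples, k=3, distance=euclidean):
    """examples is a list of (feature_vector, label); majority vote of the k nearest."""
    nearest = sorted(examples, key=lambda ex: distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled training data: two well-separated groups
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_classify((1.1, 1.0), training))  # A
```

The choice of distance matters: `cosine_distance((1, 1), (5, 5))` is 0, because cosine distance compares only directions. On this toy data it cannot tell the two groups apart, while Euclidean distance can.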

Before someone can understand and appreciate the nuances of Naive Bayes, they need to know a couple of related concepts first, namely conditional probability and Bayes’ rule. (If you are already familiar with these concepts, skip to the section titled “Getting to Naive Bayes’”.)

Conditional probability in plain English: what is the probability that something will happen, given that something else has already happened?
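A worked example with two fair dice, using exact fractions: P(A | B) = P(A and B) / P(B), and Bayes’ rule P(B | A) = P(A | B) · P(B) / P(A). The events “sum is 8” and “first die is even” are chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) = favourable outcomes / total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)


def sum_is_8(o):
    return o[0] + o[1] == 8


def first_is_even(o):
    return o[0] % 2 == 0


p_a_given_b = cond_prob(sum_is_8, first_is_even)
# Bayes' rule turns the conditional around: P(B | A) = P(A | B) * P(B) / P(A)
p_b_given_a = p_a_given_b * prob(first_is_even) / prob(sum_is_8)
print(p_a_given_b, p_b_given_a)  # 1/6 3/5
```

Bayes’ rule gives P(first die even | sum is 8) = 3/5, which matches counting directly: 3 of the 5 outcomes that sum to 8 start with an even number.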

Model Accuracy

– For the regression setting

Measuring Quality of Fit

For regression, the mean squared error (MSE) is the most commonly used measure. The MSE will be small if the predicted responses are very close to the true responses. The MSE computed on the training data used to fit the model is the training MSE, but what matters is how well the method predicts previously unseen observations, which is measured by the test MSE. So the way to go is to evaluate the test MSE and select the learning method for which the test MSE is smallest.
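A small NumPy sketch of this procedure, with MSE = (1/n) Σ (yᵢ − f̂(xᵢ))². The data-generating process (f(x) = sin(x) plus Gaussian noise), the 150/50 train/test split, and using polynomial degree as the “learning method” are all assumptions made for illustration.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum of squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
# Hypothetical data-generating process: f(x) = sin(x) plus Gaussian noise
y = np.sin(x) + rng.normal(0, 0.3, size=200)

# Fit on the first 150 points; the last 50 are never seen during fitting
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

results = {}  # degree -> (training MSE, test MSE)
for degree in (1, 3, 5, 10):
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")

# Select the method by its test MSE, not its training MSE
best_degree = min(results, key=lambda d: results[d][1])
print("selected degree:", best_degree)
```

Training MSE can only go down as the degree grows (each model contains the previous one), which is exactly why it is the wrong criterion for choosing between methods.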


Today, your model, just like your jeans, seems to “hug” your sample data perfectly. But you want your jeans to still fit a year or so down the road.
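The “jeans” point is overfitting. A NumPy sketch under made-up assumptions (a straight-line truth y = 2x plus Gaussian noise, 10 evenly spaced training points): a degree-9 polynomial passes through all 10 training points, so its training MSE is essentially zero, but fresh data from the same process is where the fit actually gets judged. Printing both numbers lets you compare; typically the hugging fit does worse on the new data.

```python
import numpy as np

rng = np.random.default_rng(3)


def true_f(x):
    return 2 * x  # hypothetical "true" relationship


# 10 evenly spaced training points; 200 fresh points from the same process
x_train = np.linspace(-1, 1, 10)
y_train = true_f(x_train) + rng.normal(0, 0.3, size=x_train.size)
x_new = rng.uniform(-1, 1, size=200)
y_new = true_f(x_new) + rng.normal(0, 0.3, size=x_new.size)

results = {}  # degree -> (training MSE, MSE on new data)
for degree in (1, 9):
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((y_train - model(x_train)) ** 2))
    new_mse = float(np.mean((y_new - model(x_new)) ** 2))
    results[degree] = (train_mse, new_mse)
    print(f"degree {degree}: training MSE {train_mse:.4f}, MSE on new data {new_mse:.4f}")
```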

Supervised and Unsupervised Learning, Machine Learning

Machine learning is a class of data-driven algorithms: unlike “normal” algorithms, it is the data that “tells” what the “good answer” is. Example: a hypothetical non-machine-learning algorithm for face recognition in images would try to define what a face is (a round, skin-colored disk with dark areas where you expect the eyes, etc.). A machine learning algorithm would not have such a coded definition, but would “learn by example”: you show it several images of faces and non-faces, and a good algorithm will eventually learn to predict whether an unseen image is a face.
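To make “learn by example” concrete, here is a deliberately tiny supervised learner, a nearest-centroid classifier, in plain Python. The two numeric features per “image” and their values are hypothetical stand-ins for real image features; the point is that no rule for what a face is gets written down, only averages learned from labelled examples.

```python
from statistics import mean


def fit_centroids(examples):
    """'Training': average the feature vectors of each label."""
    vectors_by_label = {}
    for features, label in examples:
        vectors_by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in vectors_by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(label):
        return sum((x - c) ** 2 for x, c in zip(features, centroids[label]))
    return min(centroids, key=squared_distance)


# Hypothetical labelled examples: two made-up numeric features per image
examples = [
    ((0.80, 0.70), "face"), ((0.75, 0.80), "face"), ((0.85, 0.65), "face"),
    ((0.20, 0.10), "not face"), ((0.30, 0.20), "not face"), ((0.10, 0.30), "not face"),
]
centroids = fit_centroids(examples)
print(predict(centroids, (0.70, 0.75)))  # face
```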

What is Statistical Learning

Example: If we determine that there is an association between advertising and sales, then we can adjust advertising budgets, thereby indirectly increasing sales. In other words, our goal is to develop an accurate model that can be used to predict sales on the basis of the three media budgets (TV, newspaper, radio).

So we try to model the relationship between Y (the output variable, sales) and X = (X1, X2, ..., Xp) (the predictor or input variables), which can be written in the very general form Y = f(X) + ε, where ε is an error term and f is some fixed but unknown function of X1, ..., Xp.
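A simulation makes Y = f(X) + ε concrete. This NumPy sketch assumes a hypothetical linear “true” f (in real data f is unknown), draws ε as Gaussian noise, and then estimates f with ordinary least squares from the three budgets. The budgets, coefficients, and noise level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical advertising budgets (arbitrary units) for n markets
tv, radio, newspaper = rng.uniform(0, 100, size=(3, n))


def f(tv, radio, newspaper):
    """Hypothetical 'true' f, known here only because we simulate the data."""
    return 3.0 + 0.05 * tv + 0.2 * radio + 0.0 * newspaper


eps = rng.normal(0.0, 1.0, size=n)     # error term epsilon
sales = f(tv, radio, newspaper) + eps  # Y = f(X) + epsilon

# Estimate f with a linear model fitted by ordinary least squares
X = np.column_stack([np.ones(n), tv, radio, newspaper])
beta_hat, *_ = np.linalg.lstsq(X, sales, rcond=None)
print("estimated (intercept, TV, radio, newspaper):", np.round(beta_hat, 3))
```

Because ε is random, even a perfect estimate of f cannot predict sales exactly; the leftover error is the part no model can remove.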

Big Data and Hadoop Basics

1. How big is big data?

Confusion! Most agree on a working definition: data big enough that it can’t be processed on a single machine.

2. What are the 3 Vs?

Volume, Variety, Velocity

Singular Value Decomposition (Also explains PCA)

The following reproduces an answer from the Coursera discussion forum to a question complaining that SVD was too complicated to understand and that the material available on the web goes directly into the math instead of explaining what SVD and PCA really do.

I’ve reproduced it here because it is excellent and no part needs editing. Also, once the course is archived, it will no longer be available.

Full credit goes to Pete Kazmier:


After reading more and going back to the lectures, I think I finally understand the practical aspect of SVD/PCA when it comes to a data analysis. Most of the material I found online was focused on “how” these tools work and the math behind them, which is of little interest to me. I’m much more interested in the use of the tools. In short, I drive a car to work every day, but I don’t care how its engine is built, only that it gets me from point A to point B. The following is my attempt to help others move past these lectures with some understanding of the material and how it relates to data analysis.
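In the same practical spirit, a short NumPy sketch: `np.linalg.svd` factors a data matrix as X = U · diag(s) · Vᵀ. The squared singular values show how much of the data’s sum of squares each pattern accounts for, and keeping only the top k gives the best rank-k approximation of X in the least-squares sense. The matrix is synthetic, built from two hidden patterns plus a little noise, so two components should capture almost everything.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: 100 rows x 5 columns driven by 2 hidden patterns plus small noise
hidden = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 5))
X = hidden @ loadings + 0.01 * rng.normal(size=(100, 5))

# Thin SVD: X = U @ diag(s) @ Vt, singular values s sorted in decreasing order
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Share of the total sum of squares captured by each component
energy = s**2 / np.sum(s**2)
print("energy per component:", np.round(energy, 4))

# Keep only the top k components: the best rank-k approximation of X
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
print(f"relative error of rank-{k} approximation: {rel_err:.4f}")
```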


Principal Component Analysis

It is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences. Since patterns can be hard to find in high-dimensional data, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data. The other main advantage of PCA is that once you have found these patterns, you can compress the data, i.e. reduce the number of dimensions, without much loss of information. This technique is used in image compression.
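A from-scratch NumPy sketch of PCA via the eigen-decomposition of the covariance matrix (the same components can also be obtained from the SVD of the centred data). The `pca` helper and the synthetic 3-D data lying near a line are assumptions for illustration; projecting onto one component compresses three numbers per row into one with little loss.

```python
import numpy as np


def pca(X, n_components):
    """Return (scores, components, mean, explained_variance_ratio)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centre each column
    cov = np.cov(Xc, rowvar=False)                  # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending for symmetric matrices
    order = np.argsort(eigvals)[::-1]               # largest variance first
    components = eigvecs[:, order[:n_components]]   # p x k principal directions
    scores = Xc @ components                        # n x k compressed data
    explained = eigvals[order] / eigvals.sum()
    return scores, components, mean, explained


rng = np.random.default_rng(2)
t = rng.normal(size=200)
# Hypothetical 3-D data lying close to a line
X = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(200, 3))

scores, components, mean, explained = pca(X, n_components=1)
X_restored = scores @ components.T + mean           # decompress back to 3-D
print("explained variance ratio:", np.round(explained, 4))
print("reconstruction MSE:", np.mean((X - X_restored) ** 2))
```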