Tech Notes

My notes on Statistics, Big Data, Cloud Computing, Cyber Security

Big Data and Hadoop Basics

1. How big is big data ?

Confusion! Most agree on the definition that it is data too big to be processed on a single machine.

2. 3V ?

Volume, Variety, Velocity

3. Core Hadoop


There is also other software built on top of Hadoop (for example Hive and Pig) that makes it easier to talk to Hadoop.


How HDFS works

E.g. each big file is split into 64 MB “blocks”, and each block is stored on a “node” (a DataNode) in the cluster.
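A quick sketch of the splitting arithmetic, assuming the 64 MB block size from these notes (the block size is configurable on a real cluster, and newer Hadoop versions default to 128 MB). The function name `split_into_blocks` is made up for illustration and is not part of any Hadoop API.

```python
BLOCK_SIZE_MB = 64  # assumption: the block size used in these notes


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file of this size would occupy."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds the leftover data
    return sizes


blocks = split_into_blocks(150)
print(blocks)  # [64, 64, 22]
```

So a 150 MB file ends up as three blocks, and the last one is only partly full.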



To prevent failure due to missing data in one of the DataNodes (DNs), e.g. because of a disk failure, Hadoop replicates each block 3 times across different DNs, and the “Namenode” keeps track of this (the Namenode holds the metadata recording which block is on which node). If a node fails and its blocks become under-replicated, the Namenode has them re-replicated on another node.
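To make the bookkeeping concrete, here is a toy model of the Namenode's block map and of re-replication after a node dies. This is only a sketch under simplifying assumptions: the names `block_map` and `re_replicate` are hypothetical, and picking the first available node is a stand-in for the real placement policy, which is more involved.

```python
def re_replicate(block_map, live_nodes, replication=3):
    """Drop dead nodes from each block's replica list and top it back up.

    block_map: dict of block id -> list of node names holding a copy
               (a toy stand-in for the Namenode's metadata).
    live_nodes: list of node names that are still alive.
    """
    for block_id, holders in block_map.items():
        alive = [n for n in holders if n in live_nodes]
        # Assumption: simply take the first live nodes that lack a copy.
        candidates = [n for n in live_nodes if n not in alive]
        missing = max(0, replication - len(alive))
        block_map[block_id] = alive + candidates[:missing]
    return block_map


block_map = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}
live_nodes = ["dn1", "dn3", "dn4", "dn5"]  # dn2 has failed
re_replicate(block_map, live_nodes)
print(block_map)
# {'blk_1': ['dn1', 'dn3', 'dn4'], 'blk_2': ['dn3', 'dn4', 'dn1']}
```

Both blocks lost their copy on dn2 and each gets a new third copy on a node that did not have one yet.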


Since the Namenode is a single point of failure, it's also possible to have a standby Namenode.


Mapping and Reducing


Eg : Calculate Sales by city.

Mappers read the data and sort it into piles of index cards, one pile per city. Reducers then collect their piles of cards and do some operation on them, such as adding up the sales (each reducer is told which cities it is responsible for).
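The sales-by-city example can be mimicked in plain Python on one machine. This is not Hadoop code, just a sketch of the map, shuffle and reduce steps; the tab-separated record format and the function names are assumptions made for the illustration.

```python
from collections import defaultdict


def mapper(line):
    """Turn one sales record into a (city, amount) pair, i.e. one index card."""
    city, amount = line.split("\t")  # assumption: "city<TAB>amount" records
    return city, float(amount)


def shuffle(pairs):
    """Pile the cards up by city so each reducer sees all cards for its key."""
    piles = defaultdict(list)
    for city, amount in pairs:
        piles[city].append(amount)
    return piles


def reducer(city, amounts):
    """Do the per-city operation, here a simple sum."""
    return city, sum(amounts)


records = ["Miami\t12.50", "NYC\t99.00", "Miami\t7.50", "NYC\t1.00"]
pairs = [mapper(r) for r in records]
totals = dict(reducer(city, amounts) for city, amounts in shuffle(pairs).items())
print(totals)  # {'Miami': 20.0, 'NYC': 100.0}
```

On a real cluster the mappers and reducers run in parallel on different nodes, and the shuffle step is handled by the framework.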


When we run an MR job, we submit the job to the job tracker. The job tracker splits the work into mappers and reducers.

Actually running the map and reduce tasks is done by a daemon called the task tracker, which runs on each of the data nodes. Since a task runs on the node that already holds its data, there is very little network traffic between nodes.
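The idea of running the work where the data lives can be sketched as a small scheduling function. This is a toy model, not how the real job tracker is implemented: `assign_map_tasks`, the slot counts and the "first node that fits" rule are all assumptions for illustration.

```python
def assign_map_tasks(block_locations, free_slots):
    """Give each block's map task to a node holding that block when possible.

    block_locations: dict of block id -> list of nodes holding a copy.
    free_slots: dict of node -> number of tasks it can still accept.
    Returns a dict of block id -> (node, is_local), or None if nothing is free.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignments = {}
    for block_id, holders in block_locations.items():
        local = [n for n in holders if slots.get(n, 0) > 0]
        remote = [n for n, s in slots.items() if s > 0]
        if local:
            node = local[0]   # data is already on this node
        elif remote:
            node = remote[0]  # fallback: the block must travel over the network
        else:
            assignments[block_id] = None
            continue
        slots[node] -= 1
        assignments[block_id] = (node, node in holders)
    return assignments


block_locations = {
    "blk_1": ["dn1", "dn2"],
    "blk_2": ["dn1", "dn3"],
    "blk_3": ["dn1"],
}
free_slots = {"dn1": 1, "dn2": 1, "dn3": 1}
plan = assign_map_tasks(block_locations, free_slots)
print(plan)
# {'blk_1': ('dn1', True), 'blk_2': ('dn3', True), 'blk_3': ('dn2', False)}
```

Two of the three tasks read their block from local disk; only blk_3 has to be fetched from another node because dn1 is already busy.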


Disclaimer: These are my study notes, kept online instead of on paper so that others can benefit. In the process I’ve used some pictures / content from other original authors. All sources / original content publishers are listed below and they deserve credit for their work. No copyright violation intended.

References for these notes:

The study material for the MOOC “Introduction to Hadoop and MapReduce” at
