Intro to Hadoop and MapReduce
19 Jan 2014Notes I took during taking the Udacity course Introduction to Hadoop and MapReduce
##Lesson 1: Big Data
###3Vs of big data:
- Volume
- Variety
- Velocity
Notes I took during taking the Udacity course Introduction to Hadoop and MapReduce
##Lesson 1: Big Data
###3Vs of big data:
###Proof of induction
Try sometimes induction on portion of the whole problem.
Induction can be only a component of the whole proof, try simplify the problem then use induction as the last step to close the loop.
###Proof of DP
####3M: Meaning, Method, Meat
####Pattern(Must say in prooving DP)
###Competitive Analysis
OPT is not possible (eg. you cannot predict the future), but you want to be as close to OPT as possible. How do you benchmark it?- Competitve Analysis!
Try to performance within some constant range of OPT no matter how bad things were. (Try to perform as good as wizard no matter how evil the devil is.)
###Online Algorithm
You have to make decisions on the fly. And you have no idea what’s coming next.
###Algorithm
Ski Rental
Buy(500) or rent(50)?
The first 9 visits, rent; after 10th, buy.
Multiplicative weights update
Wrong experts get punished.
Try to be within a factor of the best single expert.
LRU
The least recently used gets thrown out when the cache is full.
###Components
Latency Numbers Every Programmer Should Know
###Example
####Binary search vs. Linear search
#####Binary search:
if a[i] < key:
go left
else:
go right
The if statement is likely to fail
Runtime:
is the cost of incorrect prediction
#####Linear search
for i = 1 : n
if a[i] = key:
return i;
The if statement will only fail 1 time when it actually find the element. (Assumption: if branch prediction act like we want)
Runtime:
###Features:
The time complexity for each part varies by different problems.
###Algorithms:
K-selection problem(similar to quicksort)
De-randomize: divide into group of 5, pivot is median of medians
FFT
Group even/odd columns together and reduce the computation.
K-th smallest in 2 sorted lists
Binary search on 2 lists, knock out 1/4 each time.
Find the closest pair in a plane