Hashing and Data Structures

### Problem 1

  1. Universal hash function: for any fixed pair of distinct keys, at most a 1/m fraction of parameter choices cause a collision
  2. Universal hash functions can have different “effectiveness”
    • 1 parameter: bits
    • 2 parameters: bits
  3. A linear hash function is easy to break
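
As a concrete illustration of the 1/m collision bound, here is a minimal sketch of the standard Carter–Wegman family h(x) = ((ax + b) mod p) mod m; the prime, bucket count, and key pair below are arbitrary choices for the demo:

```python
import random

# Carter–Wegman universal family: h_{a,b}(x) = ((a*x + b) mod p) mod m.
# For any fixed pair of distinct keys x != y, the fraction of (a, b)
# choices that collide is at most 1/m.
P = 2_147_483_647  # a prime larger than any key used below (2^31 - 1)
M = 100            # number of buckets

def make_hash():
    """Draw one random member of the family."""
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda x: ((a * x + b) % P) % M

# Empirically estimate the collision rate for one fixed pair of keys.
x, y = 12345, 67890
trials = 20_000
collisions = sum(1 for _ in range(trials) if (h := make_hash())(x) == h(y))
print(collisions / trials)  # close to (or below) 1/M = 0.01
```

Redrawing (a, b) picks a fresh function from the family, which is what makes the bound an average over parameters rather than over inputs.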

### Problem 2

  • Trade space for time
  • Maintain an invariant; the invariant then implies the correctness of the algorithm

### Problem 3

  • Use a data structure as a component of an online algorithm
  • Use additional information stored at each node to determine whether the answer can possibly be in the subtree
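
A minimal sketch of the “extra information in the subtree” idea, using a hypothetical BST whose nodes cache their subtree maximum so a query can prune whole subtrees (all names below are illustrative, not from the problem itself):

```python
# Hypothetical sketch: a plain BST where each node also stores the maximum
# key anywhere in its subtree. A query can skip an entire subtree when the
# cached maximum rules out a match.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.subtree_max = key  # extra info maintained on insert

def insert(root, key):
    if root is None:
        return Node(key)
    root.subtree_max = max(root.subtree_max, key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def contains_at_least(root, threshold):
    """Is there any key >= threshold? Prune using subtree_max."""
    if root is None or root.subtree_max < threshold:
        return False  # the answer cannot be in this subtree
    return (root.key >= threshold
            or contains_at_least(root.left, threshold)
            or contains_at_least(root.right, threshold))

root = None
for k in [5, 2, 8, 1, 9]:
    root = insert(root, k)
print(contains_at_least(root, 9))   # True
print(contains_at_least(root, 10))  # False
```

The same pattern (interval trees, order-statistic trees, segment trees) differs only in what summary each node caches.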

### Problem 4

  • Any fixed memory allocation scheme can be made to perform badly, because it cannot predict the future
  • We can only choose a scheme based on assumptions or observations about the user
  • We can aim for a competitive-ratio guarantee (stay within a bounded factor of the optimal offline scheme)

Greedy Algorithms

##### Entropy is the optimal rate (the best we can do) for information compression

#### About greedy algorithms

  • You don’t have to show that your greedy algorithm beats OPT; you just need to show that it’s at least as good as OPT
  • Techniques for proving a greedy algorithm correct:
    • Induction
    • Extremal/exchange argument

### Problem 1

To disprove a claim, just try to find counterexamples

### Problem 2

A greedy algorithm is not always the solution; when it fails, try other algorithms

### Problem 3

When swapping your choice into OPT (exchange argument), make sure the resulting solution is at least as good as OPT

### Problem 4

  • Shannon–Fano coding is not good enough on its own; you need to modify it to make it actually work
  • If encoding symbols individually is not good, try encoding them in blocks
  • Proof technique: use contradiction when the constraints are hard to quantify
  • Lempel–Ziv is a good compression algorithm

### Problem 5

  • Huffman coding is guaranteed to come within 1 bit of the entropy
  • For information with low entropy, combine Huffman coding with run-length encoding
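
The Huffman construction behind that guarantee can be sketched with Python’s heapq; this is an illustrative toy (repeatedly merge the two least frequent subtrees), not a production encoder:

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a Huffman code (symbol -> bitstring) for one string."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreak, {symbol: code-so-far}).
    # The tiebreak keeps dicts from ever being compared.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

code = huffman_code("aaaabbc")
print(sorted(code.items()))  # more frequent symbols get shorter codewords
```

Since merging always extends codewords by one bit, the resulting code is prefix-free by construction.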

ML: Week 2

Multivariate linear regression

#### Improving Gradient Descent

  • Feature scaling
    1. Make sure features are on a similar scale
    2. Get every feature into approximately the range −1 ≤ xᵢ ≤ 1
  • Mean normalization
    1. Make features have approximately zero mean (excluding x₀)
    2. xᵢ := (xᵢ − μᵢ) / sᵢ, where:
      • μᵢ is the average value of xᵢ
      • sᵢ is the range (max − min) of xᵢ, or its standard deviation
  • Playing around with the learning rate α

    1. Plot the cost J(θ) as a function of the number of iterations
    2. If α is too small: slow convergence
    3. If α is too large: J(θ) may not decrease on every iteration or may not converge at all (slow convergence is also possible)
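
The scaling steps above can be sketched in a few lines of NumPy (the housing-style numbers are made up for the demo):

```python
import numpy as np

# Mean-normalize each feature column: x := (x - mu) / s, where mu is the
# column mean and s its range (max - min). The bias column x0 of ones is
# assumed to be added afterwards and is not scaled.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [852.0,  2.0]])

mu = X.mean(axis=0)
s = X.max(axis=0) - X.min(axis=0)
X_norm = (X - mu) / s

print(X_norm.mean(axis=0))                      # approximately zero per column
print(X_norm.max(axis=0) - X_norm.min(axis=0))  # each column's range is 1
```

Dividing by the standard deviation instead of the range changes the numbers but serves the same purpose.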

#### Normal Equation

  • Solves for θ analytically: θ = (XᵀX)⁻¹Xᵀy
  • Doesn’t work well if n is too big (in the millions)

    Taking the inverse of a matrix costs roughly the cube of its dimension

  • Suitable for linear regression
  • Use pinv in Octave
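
A minimal NumPy sketch of the normal equation, with np.linalg.pinv playing the role of Octave’s pinv (the synthetic data and the redundant third column are illustrative):

```python
import numpy as np

# Normal equation: theta = pinv(X'X) X'y solves least squares in one shot.
# Synthetic data: y is approximately 3 + 2x plus small noise.
rng = np.random.default_rng(0)
m = 50
x = rng.uniform(0, 10, size=m)
y = 3.0 + 2.0 * x + rng.normal(0, 0.1, size=m)

X = np.column_stack([np.ones(m), x])       # bias column x0 = 1
theta = np.linalg.pinv(X.T @ X) @ X.T @ y  # normal equation
print(theta)  # close to [3.0, 2.0]

# A redundant (linearly dependent) extra column makes X'X singular;
# np.linalg.inv would fail or be numerically meaningless there, but pinv
# still yields a solution with the same predictions.
X_red = np.column_stack([X, 2.0 * x])
theta_red = np.linalg.pinv(X_red.T @ X_red) @ X_red.T @ y
print(np.allclose(X_red @ theta_red, X @ theta))  # True
```

The second half is exactly the non-invertibility situation discussed next: pinv returns the minimum-norm least-squares solution instead of failing.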

#### Invertibility

Why XᵀX can be non-invertible:

  • Redundant features (linearly dependent)

    x₁ = size in feet²

    x₂ = size in m²

  • Too many features

    e.g. fewer training examples than features (m ≤ n)

    Fixes:

    • Delete some features
    • Use regularization

DoCP: Lesson 1

##### Notes

  • Test-driven development
  • Check extreme values
  • Lexicographic ordering: use the comparator/key function, not the item itself
  • Computing v.s. doing
    • computing: returns a result; pure functions
    • doing: doesn’t return a result; impure functions or subroutines

    It's easy to test pure functions

##### Python features

  • The same function can return different types (dynamic typing)
  • The use of "--23456789TJQKA".index(r) to turn a card-rank character into a number
  • The use of list.count
  • list: iterating in reverse with reversed
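
A quick demo of these features, assuming the rank-string trick from the poker example (the two padding dashes make '2' land at index 2):

```python
# str.index as a lookup table: '2' -> 2, ..., 'T' -> 10, ..., 'A' -> 14.
def card_rank(r):
    return "--23456789TJQKA".index(r)

print(card_rank("2"), card_rank("T"), card_rank("A"))  # 2 10 14

# list.count: how many cards share a rank (useful for pairs, trips, ...).
ranks = [14, 14, 9, 9, 9]
print(ranks.count(9))  # 3

# reversed: iterate backwards without copying into a new sorted structure.
print(list(reversed(sorted(ranks))))  # [14, 14, 9, 9, 9]
```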

##### Refactoring

  • DRY: don’t repeat yourself

##### Lessons Learned

  • understand
  • define pieces
  • reuse
  • test
  • explore the design space
    • correctness
    • efficiency
    • elegance
    • features

Note: Case Study

### Leader Election in a Ring

#### Takeaways

  1. Using libraries

    These are standard notions, so you already understand them, and reusing them reduces the risk of mistakes.
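
For reference, ring leader election can be sketched as the Le Lann–Chang–Roberts (LCR) scheme, here simulated synchronously with distinct ids (a toy model, not necessarily the lesson’s exact formulation):

```python
# LCR leader election on a synchronous unidirectional ring: each process
# sends its id clockwise; on receipt it forwards ids larger than its own,
# discards smaller ones, and declares itself leader when its own id comes
# back around. Assumes ids are distinct and no messages are lost.
def lcr_leader(ids):
    n = len(ids)
    pending = list(ids)          # the message in flight from each position
    leader = None
    for _ in range(n):           # the max id returns home after n hops
        nxt = [None] * n
        for i in range(n):
            msg = pending[i]
            if msg is None:
                continue
            j = (i + 1) % n      # clockwise neighbor
            if msg == ids[j]:
                leader = msg     # own id returned: this process wins
            elif msg > ids[j]:
                nxt[j] = msg     # forward the larger id
            # smaller ids are silently dropped
        pending = nxt
    return leader

print(lcr_leader([3, 7, 2, 9, 5]))  # 9
```

The questions below about dropped messages and concurrent sends amount to relaxing the reliable, synchronous assumptions baked into this toy.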

#### Questions

  1. How can the model be modified to allow messages to be dropped?
  2. Does each step allow only one process to send its id? (concurrency)

    Dealt with via fair traces

  3. A step may send the process’s own id out again.
  4. The discussion in Section 4

### Memory

#### Takeaways

  1. Abstraction function

#### Questions

  1. Can different addresses hold the same data?
  2. Abstraction function (α) v.s. abstraction predicate