ML: Week 2
25 Mar 2014

#### Multivariate linear regression
#### Improve Gradient Descent
- Feature Scaling
  - Make sure features are on a similar scale
  - Get every feature into approximately the −1 ≤ xi ≤ 1 range
- Mean Normalization (see the Octave sketch below)
  - Make features have approximately zero mean (excluding x0 = 1)
  - x1 := (x1 − μ1) / s1, where:
    - μ1 is the average value of x1
    - s1 is the range (max − min) of x1, or the standard deviation
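A minimal Octave sketch of mean normalization; it assumes X is an m-by-n matrix of raw features without the x0 = 1 intercept column, and the variable names are illustrative:

```octave
% Mean normalization: rescale each feature (column of X) to
% roughly zero mean and a comparable range.
mu = mean(X);            % 1-by-n row vector of feature means
s  = std(X);             % 1-by-n row vector of standard deviations
X_norm = (X - mu) ./ s;  % broadcasting normalizes every column
```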
Playing around with the learning rate α:
- Plot J(θ) as a function of the number of iterations (see the sketch below)
- If α is too small: slow convergence
- If α is too large: J(θ) may not decrease on every iteration and may fail to converge (slow convergence is also possible)
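A sketch of gradient descent in Octave that records J(θ) at every iteration so it can be plotted against the iteration count; it assumes X already has the x0 = 1 column prepended, and the function name and parameters are illustrative:

```octave
function [theta, J_history] = gradient_descent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    % simultaneous update: theta := theta - (alpha/m) * X' * (X*theta - y)
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    % cost J(theta) = 1/(2m) * sum of squared errors
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);
  end
end
```

Plotting J_history, e.g. `plot(1:numel(J_history), J_history)`, should show J(θ) decreasing on every iteration when α is chosen well; if it grows or oscillates, try a smaller α.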
#### Normal Equation
- Solve for θ analytically: θ = (XᵀX)⁻¹ Xᵀ y
- Doesn't work well if n is too big (in the millions)
  - Taking the inverse of an n×n matrix is roughly on the order of n³ operations
- Suitable for linear regression
- Use pinv in Octave (see the one-liner below)
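The closed-form solution is a one-liner in Octave; this sketch assumes X includes the x0 = 1 column and y is the m-by-1 vector of targets:

```octave
% Normal equation: theta = (X'X)^(-1) X'y, written with pinv
% so it still works even when X'X is singular.
theta = pinv(X' * X) * X' * y;
```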
#### Invertibility
Why XᵀX can be non-invertible:
- Redundant features (linearly dependent)
  - e.g. x1 = size in feet², x2 = size in m², so x2 = (0.3048)² · x1 (a demo follows this list)
- Too many features, e.g. m ≤ n
  - Delete some features
  - Use regularization
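A small demo of the redundant-feature case with made-up numbers; XᵀX is singular here, yet pinv still returns a usable θ:

```octave
ft2 = [2104; 1600; 2400; 1416];  % size in feet^2
m2  = ft2 * 0.3048^2;            % same size in m^2: linearly dependent
X   = [ones(4, 1), ft2, m2];     % design matrix with x0 = 1 column
y   = [400; 330; 369; 232];      % hypothetical prices
rank(X' * X)                     % prints 2 (< 3): X'X is not invertible
theta = pinv(X' * X) * X' * y    % pinv handles the singular case
```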