ML: Week 2
25 Mar 2014

#### Multivariate linear regression
#### Improve Gradient Descent
- Feature Scaling
  - Make sure features are on a similar scale
  - Get every feature into approximately the −1 ≤ xi ≤ 1 range
- Mean Normalization (see the Octave sketch below)
  - Make features have approximately zero mean (excluding x0 = 1)
  - x1 := (x1 − μ1) / s1, where:
    - μ1 is the average value of x1
    - s1 is the range (max − min) of x1, or the standard deviation
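A minimal Octave sketch of mean normalization; it assumes X is an m-by-n matrix of raw features without the x0 = 1 intercept column, and the variable names are illustrative:

```octave
% Mean normalization: rescale each feature (column of X) to
% roughly zero mean and a comparable range.
mu = mean(X);            % 1-by-n row vector of feature means
s  = std(X);             % 1-by-n row vector of standard deviations
X_norm = (X - mu) ./ s;  % broadcasting normalizes every column
```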
Playing around with the learning rate α:
- Plot J(θ) as a function of the number of iterations (see the sketch below)
- If α is too small: slow convergence
- If α is too large: J(θ) may not decrease on every iteration and may fail to converge (slow convergence is also possible)
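A sketch of gradient descent in Octave that records J(θ) at every iteration so it can be plotted against the iteration count; it assumes X already has the x0 = 1 column prepended, and the function name and parameters are illustrative:

```octave
function [theta, J_history] = gradient_descent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    % simultaneous update: theta := theta - (alpha/m) * X' * (X*theta - y)
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    % cost J(theta) = 1/(2m) * sum of squared errors
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);
  end
end
```

Plotting J_history, e.g. `plot(1:numel(J_history), J_history)`, should show J(θ) decreasing on every iteration when α is chosen well; if it grows or oscillates, try a smaller α.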
#### Normal Equation
- Solve for θ analytically: θ = (XᵀX)⁻¹ Xᵀ y
- Doesn't work well if n is too big (in the millions)
  - Taking the inverse of an n×n matrix is roughly on the order of n³ operations
- Suitable for linear regression
- Use pinv in Octave (see the one-liner below)
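The closed-form solution is a one-liner in Octave; this sketch assumes X includes the x0 = 1 column and y is the m-by-1 vector of targets:

```octave
% Normal equation: theta = (X'X)^(-1) X'y, written with pinv
% so it still works even when X'X is singular.
theta = pinv(X' * X) * X' * y;
```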
#### Invertibility
Why XᵀX can be non-invertible:
- Redundant features (linearly dependent)
  - e.g. x1 = size in feet², x2 = size in m², so x2 = (0.3048)² · x1 (a demo follows this list)
- Too many features, e.g. m ≤ n
  - Delete some features
  - Use regularization
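A small demo of the redundant-feature case with made-up numbers; XᵀX is singular here, yet pinv still returns a usable θ:

```octave
ft2 = [2104; 1600; 2400; 1416];  % size in feet^2
m2  = ft2 * 0.3048^2;            % same size in m^2: linearly dependent
X   = [ones(4, 1), ft2, m2];     % design matrix with x0 = 1 column
y   = [400; 330; 369; 232];      % hypothetical prices
rank(X' * X)                     % prints 2 (< 3): X'X is not invertible
theta = pinv(X' * X) * X' * y    % pinv handles the singular case
```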