• Linear classification
    • if $x \cdot w > t$, classify as + (a small sketch of this rule follows this list)
    • else classify as -
    • $x$ and $w$ are both vectors
      • however $x$ is a vector representation of a point
      • and $w$ is a vector representation of a line
        • $w$ is the normal vector, orthogonal to the separating line
      • $t$ is the offset of the separating line from the origin (equal to its distance from the origin when $\|w\| = 1$)
    • In three-dimensional space, the classifier's decision boundary is a plane
      • higher dimensions = hyperplane
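A minimal sketch of the decision rule above (Python/NumPy; the function name `classify` and the example values of $w$ and $t$ are illustrative, not from the notes):

```python
import numpy as np

def classify(x, w, t):
    """Linear classification rule: '+' if x . w > t, else '-'.

    x is a feature vector (a point), w is the weight vector normal to the
    separating line/plane, and t is the threshold (offset from the origin).
    """
    return "+" if np.dot(x, w) > t else "-"

# Example in 2D: separating line x1 + x2 = 1, i.e. w = [1, 1], t = 1
w = np.array([1.0, 1.0])
t = 1.0
print(classify(np.array([2.0, 0.5]), w, t))  # "+" : lies on the positive side
print(classify(np.array([0.2, 0.3]), w, t))  # "-" : lies on the negative side
```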
  • Homogeneous coordinates
    • Instead of $x \cdot w > t$,
    • the decision rule becomes
      • $x' \cdot w' > 0$
      • $x' = [x, 1] = [x_1, x_2, 1]$
      • $w' = [w, -t] = [w_1, w_2, -t]$
    • Homogeneous coordinates embed an $n$-dimensional task in an $(n+1)$-dimensional representation (see the sketch after this list)
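A sketch of the same rule in homogeneous coordinates (the helper name `to_homogeneous` and the example values are illustrative). Appending 1 to $x$ and $-t$ to $w$ folds the threshold into the weight vector, so the two formulations agree:

```python
import numpy as np

def to_homogeneous(x, w, t):
    """Embed the n-dimensional rule x . w > t into n+1 dimensions:
    x' = [x, 1], w' = [w, -t], so the rule becomes x' . w' > 0."""
    x_h = np.append(x, 1.0)   # x' = [x1, x2, 1]
    w_h = np.append(w, -t)    # w' = [w1, w2, -t]
    return x_h, w_h

x = np.array([2.0, 0.5])
w = np.array([1.0, 1.0])
t = 1.0
x_h, w_h = to_homogeneous(x, w, t)

# Both formulations give the same decision
assert (np.dot(x, w) > t) == (np.dot(x_h, w_h) > 0)
```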
  • Generalization
    • Goal is to model the true regularities in the data and to ignore the noise in the data
    • Reducing model complexity
      • Ockham’s Razor
      • Prefer the simplest hypothesis that is consistent with the data
  • Problems with dimensionality
    • Machine learning often involves very high-dimensional data
      • Generally, the required amount of training data and computational resources increases exponentially with dimensionality
  • Distance measures
    • How similar are two datapoints?
    • General assumption in ML: Similarity is a function of distance
      • However there are different ways to measure distance
    • Constructing a distance measure
      • Compute N features, resulting in a feature vector of N elements
      • The feature vector is then the only information the system knows about the data sample
      • Define a distance measure between two feature vectors
    • Euclidean or $L_2$ distance
      • take the norm of the difference of the two feature vectors
    • Other distance measures (a small code sketch comparing them follows this list)
      • $L_1$ distance
        • take the sum of the absolute differences of the vectors
      • $L_p$ distance
        • $d(x,y) = \left(\sum_{i=1}^{d} |x_i - y_i|^p\right)^{\frac{1}{p}}$
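A small sketch of these distance measures (Python/NumPy; the helper name `lp_distance` and the example vectors are illustrative):

```python
import numpy as np

def lp_distance(x, y, p):
    """General L_p (Minkowski) distance between two feature vectors."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])

l1 = lp_distance(x, y, 1)   # L_1: sum of absolute differences -> 3.0
l2 = lp_distance(x, y, 2)   # L_2: Euclidean norm of the difference

# L_2 matches NumPy's built-in Euclidean norm
assert np.isclose(l2, np.linalg.norm(x - y))
print(l1, l2, lp_distance(x, y, 3))
```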