- Linear classification
- if $x \cdot w > t$, classify as $+$
- else classify as $-$
- $x$ and $w$ are both vectors
- however, $x$ is a vector representing a point
- while $w$ is a vector representing a line
- $w$ is orthogonal (normal) to the separating line
- $t$ determines how far the separating line lies from the origin
- In three-dimensional space, the classifier is represented by a plane
- higher dimensions = hyperplane
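A minimal sketch of the decision rule above; the weight vector, threshold, and test points are made-up example values, not anything prescribed by the notes.

```python
import numpy as np

def classify(x, w, t):
    """Classify a point x with the linear rule: '+' if x . w > t, else '-'."""
    return '+' if np.dot(x, w) > t else '-'

# hypothetical 2-D example: w is normal to the separating line, t sets its offset
w = np.array([1.0, 1.0])
t = 1.0
print(classify(np.array([2.0, 2.0]), w, t))  # '+'  (x . w = 4 > 1)
print(classify(np.array([0.0, 0.0]), w, t))  # '-'  (x . w = 0 <= 1)
```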
- Homogeneous coordinates
- Instead of $x \cdot w > t$
- decision rule becomes
- $x^\circ \cdot w^\circ > 0$
- $x^\circ = [x, 1] = [x_1, x_2, 1]$
- $w^\circ = [w, -t] = [w_1, w_2, -t]$
- Homogeneous coordinates embed an $n$-dimensional task in an $(n+1)$-dimensional representation
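A short sketch of the embedding, assuming the same 2-D setup as above; it checks that the original rule $x \cdot w > t$ and the homogeneous rule $x^\circ \cdot w^\circ > 0$ agree (the example vectors are arbitrary).

```python
import numpy as np

def to_homogeneous_x(x):
    """Append a constant 1 to the point: x -> [x, 1]."""
    return np.append(x, 1.0)

def to_homogeneous_w(w, t):
    """Fold the threshold into the weights: w -> [w, -t]."""
    return np.append(w, -t)

x, w, t = np.array([2.0, 2.0]), np.array([1.0, 1.0]), 1.0
x_h, w_h = to_homogeneous_x(x), to_homogeneous_w(w, t)
# both decision rules give the same answer
assert (np.dot(x, w) > t) == (np.dot(x_h, w_h) > 0)
```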
- Generalization
- The goal is to model the true regularities in the data and to ignore the noise
- Reducing model complexity
- Ockham’s Razor
- Prefer the simplest hypothesis that is consistent with the data
- Problems with dimensionality
- Machine learning often involves very high-dimensional data
- Generally, the required amount of training data and computational resources increase exponentially with dimensionality
- Distance measures
- How similar are two datapoints?
- General assumption in ML: Similarity is a function of distance
- However there are different ways to measure distance
- Distance measures
- Compute N features, resulting in a feature vector of N elements
- The feature vector is then the only information the system knows about the data sample
- Define a distance measure between two feature vectors
- Euclidean or $L_2$ distance
- the norm of the difference of the two feature vectors: $d(x, y) = \lVert x - y \rVert_2$
- Other distance methods
- $L_1$ distance
- take the sum of the absolute differences of the vectors
- $L_p$ distance
- $d(x, y) = \left(\sum_{i=1}^{d} |x_i - y_i|^p\right)^{\frac{1}{p}}$
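A minimal sketch of the three distance measures, assuming plain NumPy feature vectors; the example vectors are made up.

```python
import numpy as np

def lp_distance(x, y, p):
    """General L_p distance: (sum_i |x_i - y_i|^p)^(1/p)."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

print(lp_distance(x, y, 1))    # L_1: sum of absolute differences = 5.0
print(lp_distance(x, y, 2))    # L_2 (Euclidean) = sqrt(13)
print(np.linalg.norm(x - y))   # same L_2 result via the norm of the difference
```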