• Linear classification
    • if $x \cdot w > t$, classify as + (a small sketch of this rule follows this list)
    • else classify as -
    • $x$ and $w$ are both vectors
      • however $x$ is a vector representation of a point
      • and $w$ is a vector representation of a line
        • $w$ is the normal vector, orthogonal to the separating line
      • $t$ is the offset of the separating line from the origin (equal to its distance from the origin when $\|w\| = 1$)
    • In three-dimensional space, the classifier's decision boundary is a plane
      • higher dimensions = hyperplane
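A minimal sketch of the decision rule above (Python/NumPy; the function name `classify` and the example values of $w$ and $t$ are illustrative, not from the notes):

```python
import numpy as np

def classify(x, w, t):
    """Linear classification rule: '+' if x . w > t, else '-'.

    x is a feature vector (a point), w is the weight vector normal to the
    separating line/plane, and t is the threshold (offset from the origin).
    """
    return "+" if np.dot(x, w) > t else "-"

# Example in 2D: separating line x1 + x2 = 1, i.e. w = [1, 1], t = 1
w = np.array([1.0, 1.0])
t = 1.0
print(classify(np.array([2.0, 0.5]), w, t))  # "+" : lies on the positive side
print(classify(np.array([0.2, 0.3]), w, t))  # "-" : lies on the negative side
```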
  • Homogeneous coordinates
    • Instead of $x \cdot w > t$,
    • the decision rule becomes
      • $x' \cdot w' > 0$
      • $x' = [x, 1] = [x_1, x_2, 1]$
      • $w' = [w, -t] = [w_1, w_2, -t]$
    • Homogeneous coordinates embed an $n$-dimensional task in an $(n+1)$-dimensional representation (see the sketch after this list)
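A sketch of the same rule in homogeneous coordinates (the helper name `to_homogeneous` and the example values are illustrative). Appending 1 to $x$ and $-t$ to $w$ folds the threshold into the weight vector, so the two formulations agree:

```python
import numpy as np

def to_homogeneous(x, w, t):
    """Embed the n-dimensional rule x . w > t into n+1 dimensions:
    x' = [x, 1], w' = [w, -t], so the rule becomes x' . w' > 0."""
    x_h = np.append(x, 1.0)   # x' = [x1, x2, 1]
    w_h = np.append(w, -t)    # w' = [w1, w2, -t]
    return x_h, w_h

x = np.array([2.0, 0.5])
w = np.array([1.0, 1.0])
t = 1.0
x_h, w_h = to_homogeneous(x, w, t)

# Both formulations give the same decision
assert (np.dot(x, w) > t) == (np.dot(x_h, w_h) > 0)
```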
  • Generalization
    • Goal is to model the true regularities in the data and to ignore the noise in the data
    • Reducing model complexity
      • Ockham’s Razor
      • Prefer the simplest hypothesis that is consistent with the data
  • Problems with dimensionality
    • Machine learning often involves very high-dimensional data
      • Generally, the required amount of training data and computational resources increases exponentially with dimensionality
  • Distance measures
    • How similar are two datapoints?
    • General assumption in ML: Similarity is a function of distance
      • However there are different ways to measure distance
    • Constructing a distance measure
      • Compute N features, resulting in a feature vector of N elements
      • The feature vector is then the only information the system knows about the data sample
      • Define a distance measure between two feature vectors
    • Euclidean or $L_2$ distance
      • take the norm of the difference of the two feature vectors
    • Other distance measures (a small code sketch comparing them follows this list)
      • $L_1$ distance
        • take the sum of the absolute differences of the vectors
      • $L_p$ distance
        • $d(x,y) = \left(\sum_{i=1}^{d} |x_i - y_i|^p\right)^{\frac{1}{p}}$
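A small sketch of these distance measures (Python/NumPy; the helper name `lp_distance` and the example vectors are illustrative):

```python
import numpy as np

def lp_distance(x, y, p):
    """General L_p (Minkowski) distance between two feature vectors."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])

l1 = lp_distance(x, y, 1)   # L_1: sum of absolute differences -> 3.0
l2 = lp_distance(x, y, 2)   # L_2: Euclidean norm of the difference

# L_2 matches NumPy's built-in Euclidean norm
assert np.isclose(l2, np.linalg.norm(x - y))
print(l1, l2, lp_distance(x, y, 3))
```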