Methods of Metric Measuring

Why use new methods of metric measuring?
- Could measuring success with a different metric improve the scores of certain models?
- Previous methods
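  - Rule-based metrics (e.g. BLEU, which counts n-gram overlap with a reference)
  - Supervised metrics (trained to predict human quality judgments, e.g. COMET)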
- Are all errors created equal?
  - Errors differ in severity depending on their type
    - e.g. “cat person” vs. “people person”: a one-word difference that changes the meaning entirely
  - Error hierarchy
    - MQM (Multidimensional Quality Metrics) human annotations
      - Annotators label each error in a sentence with a severity; the severity weights are summed to give the sentence's final score
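The MQM scoring step amounts to a severity-weighted sum over the errors annotated in a sentence. Below is a minimal Python sketch of that sum. The weights (major = 5, minor = 1, minor punctuation = 0.1) are an assumption borrowed from one common MQM setup, not something these notes specify. The `ErrorAnnotation` class, `mqm_score` function, and category labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Assumed severity weights (hypothetical; one common MQM setup uses these values).
SEVERITY_WEIGHTS = {
    "major": 5.0,
    "minor": 1.0,
    "minor_punctuation": 0.1,
}


@dataclass
class ErrorAnnotation:
    span: str      # erroneous text span marked by the annotator
    category: str  # illustrative error type, e.g. "accuracy/mistranslation"
    severity: str  # key into SEVERITY_WEIGHTS


def mqm_score(errors: list[ErrorAnnotation]) -> float:
    """Sum the severity weights of all errors annotated in one sentence.

    In this sketch a higher total means a worse translation;
    0.0 means the annotator marked no errors.
    """
    return sum(SEVERITY_WEIGHTS[e.severity] for e in errors)


if __name__ == "__main__":
    errors = [
        # "people person" rendered as "cat person": meaning changed, so major
        ErrorAnnotation("cat person", "accuracy/mistranslation", "major"),
        # a misplaced comma: minor punctuation
        ErrorAnnotation(",", "fluency/punctuation", "minor_punctuation"),
    ]
    print(round(mqm_score(errors), 2))
```

Because each error's contribution depends on its severity, the one-word “cat person” mistake counts 50 times as much as the punctuation slip. A plain overlap count would treat the two as roughly comparable.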