• Why use new evaluation metrics?
    • Can measuring success through a different metric improve scores for certain models?
    • Previous methods
      • Rule-based metrics
      • Supervised metrics
    • Are all errors created equal?
      • Errors have different severity levels depending on their type
        • e.g. “cat person” or “people person”
      • Error hierarchy
        • MQM (Multidimensional Quality Metrics) human annotations
          • assign each error in a sentence a score based on its severity, then sum those scores to get the sentence's final score
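
The severity-weighted sum described for MQM annotations can be sketched roughly as follows. This is a minimal illustration, not the official MQM scoring scheme: the severity labels, the weight values, and the function name `mqm_style_score` are all assumptions chosen for the example.

```python
# Hypothetical severity weights (assumed for illustration only);
# real MQM setups define their own categories and values.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 10.0}


def mqm_style_score(error_severities):
    """Sum the severity penalties of all errors annotated in one sentence."""
    return sum((SEVERITY_WEIGHTS[s] for s in error_severities), 0.0)


# One sentence annotated with two minor errors and one major error.
annotated_errors = ["minor", "major", "minor"]
print(mqm_style_score(annotated_errors))  # 7.0
```

Under these assumed weights, a sentence with no annotated errors scores 0.0, and a higher total means a worse sentence. Some setups negate the sum so that higher is better.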