• Framework
    • Pre-training
      • trained on unlabeled data over different pre-training tasks
    • Fine-tuning
      • all parameters are fine-tuned using labeled data from the downstream task
        • for example, an NLI dataset poses a classification task, which requires supervised fine-tuning of the pre-trained BERT model
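The two-stage framework above can be sketched in code. This is a minimal, hypothetical illustration (a tiny stand-in encoder, not the real BERT architecture or sizes): the encoder's weights would come from pre-training on unlabeled text, and fine-tuning reuses them while adding a small task head, here a 3-way classifier as in NLI (entailment / neutral / contradiction).

```python
import torch
import torch.nn as nn

# Hypothetical tiny encoder standing in for a pre-trained BERT.
# All names and sizes here are illustrative, not the real BERT config.
class TinyEncoder(nn.Module):
    def __init__(self, vocab_size=100, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=4, batch_first=True
        )

    def forward(self, ids):
        # (batch, seq) -> (batch, seq, hidden)
        return self.layer(self.embed(ids))

# Pre-training stage (not shown): fit the encoder on unlabeled text,
# e.g. with a masked-language-modeling objective.
encoder = TinyEncoder()

# Fine-tuning stage: reuse the pre-trained weights and add a task head.
classifier = nn.Linear(32, 3)  # 3 NLI classes

ids = torch.randint(0, 100, (2, 8))        # a toy labeled batch
labels = torch.tensor([0, 2])
logits = classifier(encoder(ids)[:, 0, :])  # first token acts like [CLS]
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()  # gradients flow into BOTH the encoder and the new head
```

Note that `loss.backward()` touches every parameter: in BERT-style fine-tuning the whole pre-trained network is updated end to end, not just the new classification head.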