• Abstract
    • This paper is a comprehensive evaluation of current state-of-the-art dialogue system techniques and datasets
  • Introduction
    • Dialogue systems chat with humans or serve as assistants through conversation
    • There are two main types of dialogue system applications
      • Task oriented dialogue systems (TOD)
        • Solve specific problems in a certain domain such as movie ticket booking, restaurant table reserving, etc.
      • Open domain dialogue systems (OOD)
        • Aim to chat with users without task or domain restrictions
        • They are usually data-driven
    • Both types of dialogue system can be seen as a mapping $\phi$ from a user message $U = \{u^{(1)}, u^{(2)}, \ldots,u^{(i)}\}$ to an agent response $R = \{r^{(1)}, r^{(2)}, \ldots,r^{(i)}\}$, where $R = \phi(U)$
    • In many open-domain and task-oriented dialogue systems, this mapping also takes a source of external knowledge or a database $K$ as input
      • such that $R = \phi(U,K)$
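    • A minimal sketch of the mapping $R = \phi(U, K)$, assuming $U$ and $R$ are token lists and $K$ is a toy key-value store. The function `phi` and its lookup rule are hypothetical illustrations, not a method from the paper:

```python
from typing import Dict, List, Optional

Tokens = List[str]            # U or R: a sequence of tokens
Knowledge = Dict[str, str]    # K: assumed here to be a simple key-value store


def phi(U: Tokens, K: Optional[Knowledge] = None) -> Tokens:
    """Toy mapping R = phi(U, K); with K omitted it reduces to R = phi(U)."""
    if K is not None:
        for token in U:
            fact = K.get(token.lower())
            if fact is not None:
                # Ground the response in the external knowledge source.
                return fact.split()
    # Fallback response when no knowledge applies.
    return ["i", "do", "not", "know"]


K = {"weather": "it is sunny today"}
R = phi(["how", "is", "the", "Weather"], K)
```

      In a real system `phi` would be a trained model; the lookup only makes the input and output types of the mapping concrete.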
  • Neural Models in Dialogue Systems
    • Summary
      • Introduces neural models that are popular in state-of-the-art dialogue systems and related subtasks
    • Convolutional Neural Networks
      • Deep neural networks are considered one of the most powerful models
      • ‘Deep’
        • Refers to the networks being multilayer: they extract features by stacking feed-forward layers
        • Feed forward layers can be defined as
          • $y = \sigma(Wx + b)$
          • where $\sigma$ is an activation function that makes the otherwise linear operation non-linear
          • $W$ and $b$ are trainable parameters
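        • A minimal pure-Python sketch of the layer $y = \sigma(Wx + b)$ and of stacking two such layers. The sigmoid is an assumed choice of $\sigma$, and the weights below are arbitrary illustrative values rather than trained parameters:

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Assumed activation; any non-linear sigma could be used instead."""
    return 1.0 / (1.0 + math.exp(-z))


def feed_forward(W: List[List[float]], x: List[float], b: List[float]) -> List[float]:
    """Compute y = sigma(Wx + b) for a single feed-forward layer."""
    return [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]


# Stacking layers: the output of one layer is the input of the next.
x = [1.0, -2.0]
W1, b1 = [[0.5, 0.25], [-1.0, 0.0], [0.0, 1.0]], [0.0, 0.5, 2.0]
W2, b2 = [[1.0, -1.0, 0.5]], [0.0]

h = feed_forward(W1, x, b1)   # hidden features, length 3
y = feed_forward(W2, h, b2)   # network output, length 1
```

          Without `sigmoid`, the two layers would collapse into a single linear map, which is why the activation is needed.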
  • Task Oriented Dialogue Systems
    • Overview
      • TOD systems fall into two main types
        • Modular
          • Natural Language Understanding (NLU)
          • Dialogue State Tracking (DST)
          • Dialogue Policy Learning
          • Natural Language Generation
        • End to End
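      • A minimal sketch of how the four modules of a modular system could be chained for one turn. All function names, keyword rules, and templates are hypothetical stand-ins for trained components:

```python
from typing import Dict, Tuple

State = Dict[str, str]


def nlu(utterance: str) -> Dict[str, str]:
    """NLU: map text to a semantic frame (toy keyword rules, not a trained model)."""
    frame: Dict[str, str] = {}
    text = utterance.lower()
    if "movie" in text:
        frame["intent"] = "find_movie"
    if "tonight" in text:
        frame["date"] = "tonight"
    return frame


def dst(state: State, frame: Dict[str, str]) -> State:
    """DST: merge the new frame into the accumulated dialogue state."""
    new_state = dict(state)
    new_state.update(frame)
    return new_state


def policy(state: State) -> str:
    """Dialogue policy: pick the next system action from the current state."""
    if state.get("intent") != "find_movie":
        return "ask_intent"
    if "date" not in state:
        return "request_date"
    return "inform_movies"


def nlg(action: str) -> str:
    """NLG: turn the chosen action into text (fixed templates)."""
    templates = {
        "ask_intent": "How can I help you?",
        "request_date": "Which date would you like?",
        "inform_movies": "Here are the movies showing on that date.",
    }
    return templates[action]


def respond(state: State, utterance: str) -> Tuple[State, str]:
    """One turn of the pipeline: NLU, then DST, then policy, then NLG."""
    state = dst(state, nlu(utterance))
    return state, nlg(policy(state))


state1, reply1 = respond({}, "I want to see a movie")
state2, reply2 = respond(state1, "tonight please")
```

        An end-to-end system would replace `respond` with a single learned model instead of separately built modules.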
    • NLU
      • The NLU module significantly affects the response quality of the whole system
      • Converts the user's natural-language message into semantic slots and performs classification
      • Manages three tasks
        • Domain classification and intent detection
          • Both of these are classification problems
          • They use classifiers to predict a mapping from the input language sequence to a predefined label set
            • For example, the predicted domain can be “movie” and the intent is “find_movie”
        • Slot filling