Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey

Abstract
- This paper is a comprehensive survey of state-of-the-art dialogue system techniques and datasets
Introduction
- Dialogue systems either chit-chat with humans or serve as assistants through conversation
- There are two main types of dialogue system applications
  - Task oriented dialogue systems (TOD)
    - Solve specific problems in a certain domain such as movie ticket booking, restaurant table reserving, etc.
  - Open domain dialogue systems (OOD)
    - Aim to chat with users without task or domain restrictions
    - They are usually data-driven
- Both types of dialogue system can be seen as a mapping $\phi$ from user messages $U = \{u^{(1)}, u^{(2)}, \ldots,u^{(i)}\}$ to agent responses $R = \{r^{(1)}, r^{(2)}, \ldots,r^{(i)}\}$, where $R = \phi(U)$
- In many open-domain and task-oriented dialogue systems, this mapping also takes a source of external knowledge or a database $K$ as input
  - such that $R = \phi(U,K)$
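To make the notation concrete, the Python sketch below treats $\phi$ as a function from the sequence of user messages $U$ to a sequence of responses $R$ of the same length, with an optional knowledge source $K$. The function name `phi`, the dictionary format assumed for `K`, and the fallback reply are hypothetical choices for illustration only; a real system would replace the body with a learned model.

```python
import re
from typing import Mapping, Optional, Sequence


def phi(U: Sequence[str], K: Optional[Mapping[str, str]] = None) -> list[str]:
    """Toy mapping R = phi(U, K): one response r^(t) per user message u^(t).

    K is an optional external knowledge source, assumed here (hypothetically)
    to be a dict from a lowercase keyword to a fact string.
    """
    R = []
    for u in U:
        words = re.findall(r"[a-z]+", u.lower())
        # Use the first word the knowledge source knows about, if any.
        fact = None
        if K is not None:
            fact = next((K[w] for w in words if w in K), None)
        # Without K (R = phi(U)) or without a match, fall back to chit-chat.
        R.append(fact if fact is not None else "Tell me more!")
    return R


K = {"inception": "Inception is showing at 8 pm."}
print(phi(["Hello!", "Is Inception on tonight?"], K))
# ['Tell me more!', 'Inception is showing at 8 pm.']
print(phi(["Is Inception on tonight?"]))
# ['Tell me more!']
```

Calling `phi` without `K` corresponds to $R = \phi(U)$; passing `K` corresponds to $R = \phi(U, K)$, where the same message can receive a knowledge-grounded reply.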
Neural Models in Dialogue Systems
- Summary
  - Introduce neural models that are popular in state-of-the-art dialogue systems and related subtasks
- Convolutional Neural Networks
  - Deep neural networks are considered one of the most powerful families of models
  - ‘Deep’
    - refers to their multilayer structure, which extracts features by stacking feed-forward layers
    - A feed-forward layer can be defined as
      - $y = \sigma(Wx + b)$
      - where $\sigma$ is an activation function that makes the otherwise linear operation non-linear
      - $W$ and $b$ are trainable parameters
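The layer formula $y = \sigma(Wx + b)$ and the idea of stacking layers can be sketched in plain Python. This is a minimal fully connected stack that illustrates the formula, not a convolutional layer; the weights, the layer sizes, and the choice of sigmoid and ReLU as activations are hypothetical examples, and in practice $W$ and $b$ would be learned rather than hand-set.

```python
import math


def sigmoid(z):
    """Logistic activation: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear activation: max(0, z)."""
    return max(0.0, z)


def feed_forward(x, W, b, act=sigmoid):
    """One layer y = act(W x + b); W is a list of rows, x and b are lists."""
    return [act(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def deep_network(x, layers, act=sigmoid):
    """Stack layers: each layer's output becomes the next layer's input."""
    for W, b in layers:
        x = feed_forward(x, W, b, act)
    return x


# Hypothetical hand-set parameters (W, b) for a 2 -> 2 -> 1 network.
layers = [
    ([[1.0, -1.0], [2.0, 0.0]], [0.0, -1.0]),  # layer 1: 2 inputs -> 2 units
    ([[1.0, 1.0]], [0.5]),                      # layer 2: 2 inputs -> 1 unit
]
print(deep_network([1.0, 2.0], layers, act=relu))  # [1.5]
```

Passing `act=lambda z: z` removes the non-linearity, and the two stacked layers then collapse to a single affine map $W_2(W_1x + b_1) + b_2$; the activation $\sigma$ is what lets depth add expressive power.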
Task Oriented Dialogue Systems
- Overview
  - TOD systems come in two main types
    - Modular
      - Natural Language Understanding (NLU)
      - Dialogue State Tracking (DST)
      - Dialogue Policy Learning
      - Natural Language Generation (NLG)
    - End-to-end
- NLU
  - The NLU module significantly affects the response quality of the whole system
  - Converts the user's natural-language message into semantic slots and performs classification
  - Handles three tasks
    - Domain classification and intent detection
      - Both of these are classification problems
      - They use classifiers to predict a mapping from the input language sequence to a predefined label set
        
        For example, the predicted domain can be “movie” and the intent is “find_movie”
    - Slot filling
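The modular architecture and the NLU classification step described above can be illustrated with a toy Python pipeline (NLU, then DST, then policy, then NLG). It is a minimal sketch under stated assumptions: the keyword-weight tables stand in for trained (e.g. neural) classifiers that map a token sequence to a predefined label set, and every label, weight, action name, and response template is hypothetical. Slot filling is omitted, so the NLU output contains only the predicted domain and intent, and the end-to-end alternative is not shown.

```python
import re

# Hypothetical keyword weights standing in for trained classifiers; each
# outer key is one label from a predefined label set.
DOMAIN_WEIGHTS = {
    "movie": {"movie": 2.0, "film": 2.0, "watch": 1.0},
    "restaurant": {"restaurant": 2.0, "table": 1.5, "eat": 1.0},
}
INTENT_WEIGHTS = {
    "find_movie": {"find": 1.0, "movie": 1.0, "film": 1.0, "watch": 1.0},
    "book_table": {"book": 1.0, "reserve": 1.0, "table": 1.5},
}

# Hypothetical response templates, one per system action.
TEMPLATES = {
    "request_time": "What time would you like to watch the movie?",
    "request_party_size": "How many people is the table for?",
}


def classify(tokens, weights):
    """Return the label whose keyword weights give the highest total score."""
    scores = {label: sum(w.get(t, 0.0) for t in tokens)
              for label, w in weights.items()}
    return max(scores, key=scores.get)


def nlu(message):
    """NLU: domain classification and intent detection for one message."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return {"domain": classify(tokens, DOMAIN_WEIGHTS),
            "intent": classify(tokens, INTENT_WEIGHTS)}


def dst(state, frame):
    """DST: merge the latest NLU frame into the running dialogue state."""
    return {**state, **frame}


def policy(state):
    """Policy: choose the next system action from the tracked state."""
    if state.get("intent") == "find_movie":
        return "request_time"
    return "request_party_size"


def nlg(action):
    """NLG: realize the chosen action as a natural-language response."""
    return TEMPLATES[action]


def respond(message, state):
    """Run one turn through NLU -> DST -> policy -> NLG."""
    state = dst(state, nlu(message))
    return nlg(policy(state)), state


reply, state = respond("Can you find me a film to watch?", {})
print(reply)  # What time would you like to watch the movie?
print(state)  # {'domain': 'movie', 'intent': 'find_movie'}
```

For this message the pipeline predicts the domain "movie" and the intent "find_movie", matching the example given for domain classification and intent detection; each stage is a separate function, which is what distinguishes the modular design from an end-to-end model.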