- Abstract
- This paper is a comprehensive evaluation of current state-of-the-art dialogue system techniques and datasets
- Introduction
- Dialogue systems perform chit-chat with humans or serve as assistants via conversation
- There are two main types of dialogue system applications
- Task oriented dialogue systems (TOD)
- Solve specific problems in a certain domain, such as movie ticket booking or restaurant table reservation
- Open domain dialogue systems (OOD)
- Aim to chat with users without task or domain restrictions
- They are usually data-driven
- Both dialogue systems can be seen as a mapping $\phi$ from user message $U = \{u^{(1)}, u^{(2)}, \ldots,u^{(i)}\}$ to agent response $R = \{r^{(1)}, r^{(2)}, \ldots,r^{(i)}\}$ where $R = \phi(U)$
- In many open-domain and task-oriented dialogue systems, this mapping also takes a source of external knowledge/database $K$ as input
- such that $R = \phi(U,K)$
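A minimal sketch of the mapping $R = \phi(U, K)$, assuming the elements of $U$ and $R$ are tokens and that $K$ is a plain dictionary. The function `phi`, the substring lookup, and the sample data are hypothetical stand-ins for illustration; real systems learn this mapping from data.

```python
from typing import Dict, List

def phi(U: List[str], K: Dict[str, str]) -> List[str]:
    """Toy mapping R = phi(U, K) from user tokens U to response tokens R."""
    text = " ".join(U).lower()
    # Hypothetical lookup: answer with the first fact whose key appears in U.
    for key, fact in K.items():
        if key in text:
            return fact.split()
    # Fallback response when K holds nothing relevant.
    return "sorry i do not know".split()

# Illustrative knowledge source and user message (not from the paper).
K = {"opening hours": "we open at 9am"}
U = "what are your opening hours".split()
R = phi(U, K)
```

Passing an empty `K` reduces this to the knowledge-free case $R = \phi(U)$.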
- Neural Models in Dialogue Systems
- Summary
- Introduces neural models that are popular in state-of-the-art dialogue systems and related subtasks
- Convolutional Neural Networks
- Deep neural networks are considered one of the most powerful model families
- ‘Deep’
- refers to the fact that they are multilayer, extracting features by stacking feed-forward layers
- A feed-forward layer can be defined as
- $y = \sigma(Wx + b)$
- where $\sigma$ is an activation function that makes the otherwise linear operation non-linear
- $W$ and $b$ are trainable parameters
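The feed-forward layer $y = \sigma(Wx + b)$ and the idea of stacking layers can be sketched in plain Python. This is an illustrative sketch only: the sigmoid is one possible choice of $\sigma$, and the weights below are made-up values rather than trained parameters.

```python
import math

def sigmoid(z):
    """Logistic activation, one possible choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(W, b, x, activation=sigmoid):
    """Compute y = sigma(W x + b), with W given as a list of rows."""
    return [
        activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]

def stack(layers, x):
    """Apply several feed-forward layers in sequence (the 'deep' part)."""
    for W, b in layers:
        x = feed_forward(W, b, x)
    return x

# Made-up parameters: 2 inputs -> 2 hidden units -> 1 output.
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]
W2, b2 = [[1.0, 1.0]], [0.0]
x = [2.0, 2.0]

y = feed_forward(W1, b1, x)            # single layer
out = stack([(W1, b1), (W2, b2)], x)   # two stacked layers
```

Without the activation, stacking the two layers would collapse into a single linear map, which is why $\sigma$ is needed.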
- Task Oriented Dialogue Systems
- Overview
- TOD systems come in two main types
- Modular
- Natural Language Understanding (NLU)
- Dialogue State Tracking (DST)
- Dialogue Policy Learning
- Natural Language Generation (NLG)
- End-to-end
- NLU
- The NLU module significantly affects the response quality of the whole system
- Converts natural-language messages produced by the user into semantic slots and performs classification
- Manages three tasks
- Domain classification and intent detection
- Both of these are classification problems
- They use classifiers to predict a mapping from the input language sequence to a predefined label set
- For example, the predicted domain can be “movie” and the intent is “find_movie”
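To make the "sequence to predefined label set" idea concrete, here is a toy sketch that scores each label by keyword overlap. The keyword tables and the `classify` helper are hypothetical and stand in for the trained classifiers the notes describe; only the labels "movie" and "find_movie" come from the example above.

```python
# Hypothetical label sets with hand-picked keywords (illustration only).
DOMAIN_KEYWORDS = {
    "movie": {"movie", "film", "showing", "cinema"},
    "restaurant": {"table", "restaurant", "dinner", "reserve"},
}
INTENT_KEYWORDS = {
    "find_movie": {"find", "showing", "which", "what"},
    "book_ticket": {"book", "buy", "ticket", "tickets"},
}

def classify(tokens, label_keywords):
    """Return the label whose keyword set overlaps most with the tokens."""
    token_set = set(tokens)
    scores = {label: len(kw & token_set) for label, kw in label_keywords.items()}
    return max(scores, key=scores.get)

utterance = "find a movie showing tonight"
tokens = utterance.lower().split()

domain = classify(tokens, DOMAIN_KEYWORDS)   # predicted domain label
intent = classify(tokens, INTENT_KEYWORDS)   # predicted intent label
```

Both tasks share the same function because each is a classification over a fixed label set; only the label inventory differs.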
- Slot filling