Today, we are going to cover Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014).

Traditional DNNs have brought remarkable improvements to machine translation, speech recognition, and other fields. However, they can only handle inputs and outputs of fixed dimensionality, which is a critical limitation, especially for translation, where the lengths of the input and output sequences are not known in advance.

Architecture

Let’s look at the seq2seq architecture.

[Figure: the seq2seq encoder-decoder architecture from the paper]

It looks like one long RNN, but the model actually uses two LSTMs, one as the encoder and one as the decoder. In this example, the model takes the sentence “ABC” as input and translates it into “XYZ”. The encoder reads “A”, “B”, “C” sequentially and compresses the sentence into a large fixed-dimensional vector. When the “&lt;EOS&gt;” (end-of-sentence) token is read, the encoder knows it has reached the end of the input, and the decoder then generates the output sentence from that fixed vector, token by token, until it emits “&lt;EOS&gt;” itself.
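To make the encoder-decoder split concrete, here is a minimal PyTorch sketch. This is not the paper’s code: the vocabulary sizes, dimensions, and single-layer LSTMs are illustrative assumptions (the paper used 4-layer LSTMs with 1000 units), and it shows teacher-forced training rather than beam-search decoding.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=32, tgt_vocab=32, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)  # reads "A B C <EOS>"
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)  # emits "X Y Z <EOS>"
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # The encoder compresses the whole source sentence into its final
        # hidden/cell state -- the "large fixed-dimensional vector".
        _, state = self.encoder(self.src_emb(src_ids))
        # The decoder starts from that state and is unrolled over the target
        # sequence (teacher forcing during training).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # per-step scores over the target vocabulary

# Toy usage: a batch of 2 source sentences and 2 target sentences, length 4 each.
model = Seq2Seq()
src = torch.randint(0, 32, (2, 4))
tgt = torch.randint(0, 32, (2, 4))
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 4, 32])
```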

Why LSTM?

Compared to traditional RNNs, LSTMs are much better at capturing long-term dependencies, which matters when a word early in a long input sentence influences words near the end of the output.
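One common intuition (standard LSTM background, not something derived in this paper) is that the cell state is updated additively through gates, so information and gradients can flow across many time steps without being repeatedly squashed:

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
```

where $f_t$, $i_t$, $o_t$ are the forget, input, and output gates and $\tilde{c}_t$ is the candidate update.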

Three findings of this paper:

  • It uses two different LSTMs: one for the input sequence (encoder) and another for the output sequence (decoder)

  • Deep LSTMs (four layers in the paper) significantly outperformed shallow LSTMs

  • Reversing the order of the words in the source sentence markedly improved performance (see the code sketch after this list)
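To make the third point concrete, here is a toy sketch of the source-reversal trick. The paper feeds the source sentence backwards while leaving the target untouched, which creates many short-distance dependencies between source and target words and makes optimization easier; the tokenization and &lt;EOS&gt; handling below are simplified assumptions, not the paper’s pipeline.

```python
def reverse_source(tokens):
    """Reverse the source tokens so the first source words end up right next
    to the first target words the decoder has to produce."""
    return list(reversed(tokens))

print(reverse_source(["A", "B", "C"]) + ["<EOS>"])  # ['C', 'B', 'A', '<EOS>']
```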

Well, the seq2seq model now seems to have been overtaken by the Transformer, and this post might be interesting if you are wondering about the future of seq2seq models.

Reference

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. Advances in Neural Information Processing Systems 27 (NIPS 2014).