Suppose, for example, that you want to predict the missing word in the sentence "The ( ) jumped over a pond," where the ground truth is frog. It is hard to fill in the blank using only the preceding context: The. In this case it is better to use the information that comes after the blank: jumped over a pond. In other words, it is important to learn not only forward (past to present) but also backward (future to present).

Structure

[Figure: structure of a bidirectional RNN, with forward and backward hidden units]

(originally from Woosung Choi’s blog)

The structure is very simple. The only difference from a unidirectional RNN is that the hidden layer is a concatenation of forward and backward hidden units. If you replace the hidden units with LSTM cells, it becomes a biLSTM. So simple!
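Concretely, a sketch of the usual formulation (the notation here is my own, not from the original figure): the forward RNN reads the input left to right, the backward RNN reads it right to left, and the hidden state at each time step is their concatenation:

$$
\overrightarrow{h}_t = f\left(x_t, \overrightarrow{h}_{t-1}\right), \qquad
\overleftarrow{h}_t = g\left(x_t, \overleftarrow{h}_{t+1}\right), \qquad
h_t = \left[\overrightarrow{h}_t \, ; \, \overleftarrow{h}_t\right]
$$

where $f$ and $g$ are the forward and backward recurrent cells with their own parameters, so the concatenated hidden state is twice the size of each direction's hidden state.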

Code

(to be updated)
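In the meantime, here is a minimal sketch in PyTorch (my own illustration; the original post does not say which framework it intended). Setting `bidirectional=True` on `nn.LSTM` runs a forward and a backward LSTM over the sequence and concatenates their hidden states, so the output feature size is `2 * hidden_size`. The sizes below are arbitrary, just to show the shapes.

```python
import torch
import torch.nn as nn

# Arbitrary sizes for illustration.
batch, seq_len, input_size, hidden_size = 2, 7, 16, 32

bilstm = nn.LSTM(
    input_size=input_size,
    hidden_size=hidden_size,
    num_layers=1,
    batch_first=True,
    bidirectional=True,  # run a forward and a backward LSTM over the sequence
)

x = torch.randn(batch, seq_len, input_size)
output, (h_n, c_n) = bilstm(x)

# output concatenates the forward and backward hidden states at every time step.
print(output.shape)  # torch.Size([2, 7, 64])  -> 2 * hidden_size
# h_n keeps the final hidden state of each direction separately:
# (num_layers * num_directions, batch, hidden_size)
print(h_n.shape)     # torch.Size([2, 2, 32])
```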