Language translation, a classic sequence-to-sequence (seq2seq) learning task, is one of the key use cases of NLP. Translation helps people in many ways, such as clearer communication and learning. Let’s take a deep dive into how a translator actually works and what building blocks sit under the hood.
A translator comprises two main components: an encoder and a decoder.
Let’s take an example sentence in English “Every smile makes you a day younger” to be translated into Tamil “ஒவ்வொரு புன்னகையும் உங்களை ஒரு நாளை இளமையாக ஆக்குகிறது”.
Translation converts an entire sentence from one language to another while preserving its context, so the NLP model has to learn from the complete sentence rather than a word-to-word mapping. To train the model, we need to gather enough sample sentence pairs in both languages.
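To make the idea of paired training samples concrete, here is a minimal sketch of how sentence pairs might be turned into integer ids before they reach a model. The helper names (`build_vocab`, `encode`), the `<pad>` entry, and the whitespace tokenization are assumptions made for illustration; a real corpus would contain many thousands of pairs and usually a more careful tokenizer.

```python
# Toy parallel corpus: a single (English, Tamil) pair taken from the example above.
pairs = [
    ("every smile makes you a day younger",
     "ஒவ்வொரு புன்னகையும் உங்களை ஒரு நாளை இளமையாக ஆக்குகிறது"),
]

def build_vocab(sentences):
    """Map each distinct whitespace-separated word to an integer id.

    Id 0 is reserved for a hypothetical padding token.
    """
    vocab = {"<pad>": 0}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Turn a sentence into the list of ids of its words."""
    return [vocab[word] for word in sentence.split()]

# Each language gets its own vocabulary.
src_vocab = build_vocab(en for en, _ in pairs)
tgt_vocab = build_vocab(ta for _, ta in pairs)

src_ids = encode(pairs[0][0], src_vocab)
tgt_ids = encode(pairs[0][1], tgt_vocab)
print(src_ids)
print(tgt_ids)
```

The two id sequences stay aligned at the sentence level only. Nothing here pairs individual words, which is the point: the model sees whole sentences.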
Encoder for Language Translation: As a first step, let’s unravel the encoder and see how the model extracts the crux of the sentence. We’ll take a simple RNN for demonstration.
As each word passes through the RNN, the memory state absorbs meaning from the words one time step at a time. The final memory state, taken after the last word, is a vector that summarizes the whole context of the input text (the context vector). We’ll ignore the per-step outputs and keep only this memory state for subsequent processing.
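The encoder pass described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not a trained model: the sizes `EMBED_DIM` and `HIDDEN_DIM`, the random embeddings, and the weight names `W_xh`, `W_hh`, `b_h` are all made up for the example.

```python
import math
import random

random.seed(0)

EMBED_DIM, HIDDEN_DIM = 4, 3  # hypothetical toy sizes

def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

sentence = "every smile makes you a day younger".split()

# Untrained random vectors stand in for learned word embeddings.
embeddings = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in sentence}

W_xh = rand_matrix(HIDDEN_DIM, EMBED_DIM)   # input-to-hidden weights
W_hh = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)  # hidden-to-hidden weights
b_h = [0.0] * HIDDEN_DIM                    # hidden bias

def encode(words):
    """Run a simple RNN over the words and return only the final memory state."""
    h = [0.0] * HIDDEN_DIM  # initial memory state
    for word in words:
        x = embeddings[word]
        pre = [a + b + c for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]  # new state mixes the word with the old state
    return h  # per-step outputs are ignored; this is the context vector

context = encode(sentence)
print(context)
```

Because each new state is computed from the previous state and the current word, the vector returned after the last word depends on every word in the sentence and on their order.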
Decoder Network: Now that we have the context vector from the encoder, the next step is to create a decoder network that generates the corresponding translation of the input text.
The initial state of the decoder network is the memory state (context vector) from the encoder network. To mark the beginning and the end of a sentence, we add start and end tags to each target sentence.
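As a small illustration of the start and end tags, the sketch below wraps the Tamil target sentence and splits it into what the decoder is fed and what it is expected to produce at each step. The tag strings `<start>` and `<end>` are assumed names; any tokens that never occur in real sentences would do.

```python
START, END = "<start>", "<end>"  # hypothetical tag names

target = "ஒவ்வொரு புன்னகையும் உங்களை ஒரு நாளை இளமையாக ஆக்குகிறது".split()
tagged = [START] + target + [END]

# Shift by one position: at each step the decoder is fed one word
# and should produce the word that follows it.
decoder_input = tagged[:-1]   # begins with <start>
decoder_target = tagged[1:]   # finishes with <end>

for fed, expected in zip(decoder_input, decoder_target):
    print(f"{fed} -> {expected}")
```

The start tag gives the first decoder cell something to consume, and the end tag gives the model a way to signal that the translation is complete.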
- In the decoder, every cell outputs a word, and the output of the previous cell is given as the input to the next cell to predict the subsequent word. This process differs between training and testing.
- While training, we use teacher forcing: the input to each cell is the actual previous word from the target sentence, not the word the model predicted. This helps the model converge faster. During testing, on the other hand, the input is the word the model predicted at the previous step.
- In the decoder, we are only concerned with the outputs of the cells, which together form the desired sentence, so the final memory state is discarded.
- The loss is computed by comparing the predicted outputs with the actual words, and it is then backpropagated through the decoder to learn the parameters (weights and biases).
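The difference between training and testing in the decoder can be sketched as follows. This is a toy with untrained random weights, so the decoded words are meaningless; the five-word vocabulary, the sizes, the stand-in context vector, and the names `step`, `train_loss` and `greedy_decode` are all assumptions made for illustration. Cross-entropy is used as the loss, which is a common choice for word prediction, and picking the most probable word at each step (greedy decoding) is one simple way to decode at test time.

```python
import math
import random

random.seed(0)

VOCAB = ["<start>", "<end>", "ஒவ்வொரு", "புன்னகையும்", "உங்களை"]  # toy target vocabulary
START_ID, END_ID = 0, 1
HIDDEN = 3

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

E = rand_matrix(len(VOCAB), HIDDEN)      # one embedding row per target word
W_xh = rand_matrix(HIDDEN, HIDDEN)       # input-to-hidden weights
W_hh = rand_matrix(HIDDEN, HIDDEN)       # hidden-to-hidden weights
W_out = rand_matrix(len(VOCAB), HIDDEN)  # hidden-to-vocabulary weights

def step(word_id, h):
    """One decoder cell: returns word probabilities and the new memory state."""
    pre = [a + b for a, b in zip(matvec(W_xh, E[word_id]), matvec(W_hh, h))]
    h = [math.tanh(p) for p in pre]
    return softmax(matvec(W_out, h)), h

def train_loss(context, gold_ids):
    """Teacher forcing: each cell is fed the actual previous word."""
    h, prev, total = context, START_ID, 0.0
    for gold in gold_ids:
        probs, h = step(prev, h)
        total += -math.log(probs[gold])  # cross-entropy against the actual word
        prev = gold                      # the true word, not the prediction
    return total / len(gold_ids)

def greedy_decode(context, max_len=10):
    """Testing: each cell is fed the word predicted by the previous cell."""
    h, prev, words = context, START_ID, []
    for _ in range(max_len):
        probs, h = step(prev, h)
        prev = max(range(len(VOCAB)), key=lambda i: probs[i])
        if prev == END_ID:
            break
        words.append(VOCAB[prev])
    return words  # the final memory state h is discarded

context = [0.1, -0.2, 0.3]       # stand-in for the encoder's final state
gold = [2, 3, 4, END_ID]         # first three target words, then <end>
loss = train_loss(context, gold)
print(loss)
print(greedy_decode(context))
```

In a real model, the loss returned by `train_loss` would be backpropagated to update the weights; this sketch only computes it. The only line that differs between the two loops is the assignment to `prev`, which is exactly where teacher forcing and test-time decoding part ways.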
Refer to the link below for a nice implementation of a language translation model using a simple LSTM.