In the past, we’ve seen different types of neural network architectures, from simple fully connected neural networks to CNNs for image processing (computer vision). Another category in that list is the Recurrent Neural Network (RNN), whose main target is sequential data. A sequence is data in which the current element depends on the past. Examples include time series (sales prediction), speech, text, and video, where the current information is an accumulation of the previous details.
For sequential data, ordinary neural networks might not be very effective, because each layer can only process the information from the immediately preceding layer and pass it on to the next. The core idea of the RNN is to remember past information in the form of a memory state. We’ll explore the memory state further below.
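The memory-state idea can be sketched in a few lines of NumPy. This is a toy single RNN step with made-up sizes and random weights (the names `W_x`, `W_h`, `rnn_step` are ours, not from any library): the state `h` is updated from the current input and the previous state, so after a few steps it depends on every input seen so far.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4

# Toy weights: input-to-hidden, hidden-to-hidden, and a bias.
W_x = rng.normal(size=(hidden_size, input_size)) * 0.1
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new memory state mixes the current input with the previous state.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_size)            # initial (empty) memory state
for t in range(5):                   # a toy sequence of 5 inputs
    x_t = rng.normal(size=input_size)
    h = rnn_step(x_t, h)             # h now depends on all inputs x_0..x_t
```

Because `tanh` squashes its input, every entry of `h` stays in (-1, 1) no matter how long the sequence gets.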
Sequential tasks can be subdivided into four categories, depending on how the input and output are related to each other.
- One to One — POS tagging. For each word in the input, we have a corresponding output (part of speech).
- Many to One — Sentiment analysis of reviews. Given a sentence (many words), we predict whether the customer’s sentiment is positive or negative.
- One to Many — Image captioning. Given an image, we generate a caption (many words) for it.
- Many to Many — Translation from one language to another, i.e. from one sentence to another.
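The four patterns above differ only in the shapes of their inputs and outputs. A rough summary (the lengths `T`, `T_in`, `T_out` are illustrative, not standard notation):

```python
# Input/output shapes for the four sequence patterns (illustrative sketch).
patterns = {
    "one-to-one":   {"input": "T words",      "output": "T POS tags"},
    "many-to-one":  {"input": "T words",      "output": "1 sentiment label"},
    "one-to-many":  {"input": "1 image",      "output": "T caption words"},
    "many-to-many": {"input": "T_in words",   "output": "T_out translated words"},
}
```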
In the above picture, the left side shows the recurrent nature of the RNN. When we unroll it, the memory state (V) gets passed on to the subsequent steps. Each input (word) is denoted X; the input and the previous memory state are fed through the tanh function to produce an output O.
Depending on the use case, we either retain the output from every cell/unit or consider only the final one. The memory state, indicated by V, is transferred from one cell to the next. Each time the memory state propagates through a step, it accumulates the details of that step, so the final memory state is a composite of all the previous sequential values.
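The unrolled forward pass, and the choice between keeping every output and keeping only the last one, can be sketched as follows. Again the weights are random and the function names are ours, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size, output_size = 3, 4, 2

# Toy weights: input-to-hidden, hidden-to-hidden, hidden-to-output.
W_x = rng.normal(size=(hidden_size, input_size)) * 0.1
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1
W_o = rng.normal(size=(output_size, hidden_size)) * 0.1

def forward(xs):
    h = np.zeros(hidden_size)
    outputs = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h)  # memory state accumulates context
        outputs.append(W_o @ h)           # one output per step
    return outputs, h

xs = [rng.normal(size=input_size) for _ in range(6)]
outputs, final_h = forward(xs)

many_to_many = outputs       # keep all six per-step outputs (e.g. tagging)
many_to_one = outputs[-1]    # keep only the final output (e.g. sentiment)
```

The same loop serves both cases; only the slice of `outputs` we keep changes.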
Similar to the forward pass, backpropagation also travels through time, i.e. the derivative at a previous step depends on the gradient from the future steps. This is known as backpropagation through time (BPTT).
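To make the "gradient from the future" concrete, here is BPTT on a deliberately tiny scalar RNN (a sketch under assumed definitions: `h_t = tanh(w_x·x_t + w_h·h_{t-1})`, with the loss taken to be simply the final state `h_T`). Walking backwards, each step multiplies the incoming gradient by `tanh'(z_t)·w_h` before handing it to the previous step, and a finite-difference check confirms the result:

```python
import numpy as np

def forward(xs, w_x, w_h):
    h, hs = 0.0, [0.0]
    for x in xs:
        h = np.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs                         # hs[t] is the state after step t

def grad_wh(xs, w_x, w_h):
    hs = forward(xs, w_x, w_h)
    dh = 1.0                          # dL/dh_T, since L = h_T
    g = 0.0
    for t in range(len(xs), 0, -1):   # walk backwards through time
        local = 1.0 - hs[t] ** 2      # tanh'(z_t) = 1 - h_t^2
        g += dh * local * hs[t - 1]   # w_h's contribution at step t
        dh = dh * local * w_h         # gradient flowing to h_{t-1}
    return g

xs = [0.5, -1.0, 0.8]
w_x, w_h = 0.7, 0.3
analytic = grad_wh(xs, w_x, w_h)

# Central finite-difference check of the BPTT gradient.
eps = 1e-6
numeric = (forward(xs, w_x, w_h + eps)[-1]
           - forward(xs, w_x, w_h - eps)[-1]) / (2 * eps)
assert abs(analytic - numeric) < 1e-6
```

Note how `dh` shrinks or grows as it is repeatedly multiplied by `tanh'(z_t)·w_h`; this repeated multiplication is exactly what leads to vanishing or exploding gradients over long sequences.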
We’ll implement a simple RNN and see this working principle in action in the next article.