Summary: There are some interesting use cases where combining CNNs and RNNs/LSTMs seems to make sense, and a number of researchers are pursuing this. However, the latest trends in CNNs may make the combination obsolete.
There are things that just don’t seem to go together. Take oil and water, for instance: both valuable, but try putting them together.
That was my reaction when I first came across the idea of combining CNNs (convolutional neural nets) and RNNs (recurrent neural nets). After all, they’re optimized for completely different problem types.
- CNNs are good with hierarchical or spatial data and at automatically extracting features. The inputs could be images or written characters. CNNs take fixed-size inputs and generate fixed-size outputs.
- RNNs are good at temporal or otherwise sequential data: letters or words in a body of text, stock market prices, or speech. RNNs can take inputs and produce outputs of arbitrary length. LSTMs are a variant of RNN whose gates control how much of the prior sequence should be remembered, or more appropriately, forgotten.
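To make the LSTM "forgetting" point concrete, here is a minimal sketch of the forget gate alone, written from the standard LSTM equations. All names and sizes are invented for illustration; a real LSTM has additional input and output gates that this sketch omits.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forget_gate(h_prev, x_t, W_f, U_f, b_f):
    # f_t = sigmoid(W_f x_t + U_f h_prev + b_f); each entry lies in (0, 1)
    # and scales one element of the previous cell state (memory).
    return sigmoid(W_f @ x_t + U_f @ h_prev + b_f)

rng = np.random.default_rng(0)
hidden, inputs = 4, 3                     # illustrative sizes
W_f = rng.normal(size=(hidden, inputs))   # input-to-gate weights
U_f = rng.normal(size=(hidden, hidden))   # hidden-to-gate weights
b_f = np.zeros(hidden)

f_t = forget_gate(rng.normal(size=hidden), rng.normal(size=inputs), W_f, U_f, b_f)
c_prev = np.ones(hidden)                  # previous cell state
c_kept = f_t * c_prev                     # memory kept after gating, element-wise
```

A gate value near 1 preserves that memory element; a value near 0 discards it, which is exactly the learned "how much to forget" knob described above.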
We all know to reach for the appropriate tool based on these distinct problem types.
So are there problem types that need the capability of both these tools?
As it turns out, yes. Most such problems are readily identified as images occurring in a temporal sequence, in other words, video. But there are some other clever applications not directly related to video that may spark your imagination. We describe several of those below.
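The usual shape of the CNN + RNN combination for video can be sketched in a few lines: a convolutional feature extractor runs on each frame independently (fixed-size in, fixed-size out), and an RNN then consumes the per-frame feature vectors in temporal order. Everything below (sizes, kernels, weights) is toy data invented for illustration, not a real architecture.

```python
import numpy as np

def conv2d_valid(frame, kernel):
    """Naive 'valid' 2-D convolution (cross-correlation) of one frame."""
    kh, kw = kernel.shape
    h, w = frame.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return out

def cnn_features(frame, kernel):
    """Convolve, apply ReLU, then global-average-pool to a fixed-size vector."""
    fmap = np.maximum(conv2d_valid(frame, kernel), 0.0)
    return np.array([fmap.mean()])        # one feature per frame, for brevity

def rnn_over_frames(features, W_h, W_x, b):
    """Plain tanh RNN stepping through the per-frame features in order."""
    h = np.zeros(W_h.shape[0])
    for x in features:
        h = np.tanh(W_h @ h + W_x @ x + b)
    return h                              # fixed-size summary of the clip

rng = np.random.default_rng(1)
video = rng.normal(size=(5, 8, 8))        # 5 frames of 8x8 "pixels"
kernel = rng.normal(size=(3, 3))
feats = [cnn_features(f, kernel) for f in video]

hidden = 4
W_h = rng.normal(size=(hidden, hidden)) * 0.1
W_x = rng.normal(size=(hidden, 1))
b = np.zeros(hidden)
summary = rnn_over_frames(feats, W_h, W_x, b)
```

In a real system the per-frame extractor would be a trained CNN and the sequence model an LSTM, but the division of labor is the same: the CNN handles the spatial structure of each frame, the recurrent net handles the temporal structure across frames.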
Read the full article by Bill Vorhies here.