Point-by-point comparison between simple word-embedding-based models (SWEMs, i.e. pooling strategies over word embeddings) and RNNs/CNNs on 17 datasets and three tasks:
- long document classification
- text sequence matching
- short text tasks (classification & matching)
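- Pooling strategies:
  - SWEM-aver (average pooling): element-wise average of the word embeddings over the input sequence.
  - SWEM-max (max pooling): maximum over all words, taken separately along each dimension of the word-embedding matrix –> keeps the most salient features.
  - SWEM-hier (hierarchical pooling) –> abstracts features while keeping local spatial (word-order) information:
    - average pooling within each local window of \(n\) consecutive words, then
    - max pooling over the features from all windows.
  - SWEM-concat: concatenation of the features from the first two (average and max pooling).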
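- Parameters and speed: SWEMs add no compositional parameters on top of the word embeddings, and they are cheaper than CNNs and LSTMs by roughly a factor of \(nd\) and \(d\), respectively (\(n\): CNN filter width, \(d\): hidden dimension).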
- Embeddings: pretrained GloVe embeddings, optionally refined with an MLP before pooling.
- Experiments: SWEM-based models vs. CNN/RNN/LSTM on
  - document categorization: topic categorization, sentiment analysis, ontology classification
  - sentence matching
  - sequence tagging (CoNLL-2000 chunking, CoNLL-2003 NER)
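A minimal pure-Python sketch of the four pooling strategies listed above, assuming the input is a non-empty list of word vectors (one list of floats per word) and that SWEM-hier slides its windows with stride 1. The function names, the short-sequence fallback in `swem_hier`, and the `mlp_refine` helper (a single ReLU layer standing in for the optional MLP refinement of the embeddings) are illustrative assumptions, not the paper's code.

```python
from typing import List

Vector = List[float]


def swem_aver(emb: List[Vector]) -> Vector:
    """SWEM-aver: element-wise mean of the word vectors (emb must be non-empty)."""
    return [sum(column) / len(emb) for column in zip(*emb)]


def swem_max(emb: List[Vector]) -> Vector:
    """SWEM-max: max over all words, taken separately for each embedding dimension."""
    return [max(column) for column in zip(*emb)]


def swem_concat(emb: List[Vector]) -> Vector:
    """SWEM-concat: average-pooled features followed by max-pooled features."""
    return swem_aver(emb) + swem_max(emb)


def swem_hier(emb: List[Vector], n: int) -> Vector:
    """SWEM-hier: average-pool every window of n consecutive words (stride 1),
    then max-pool over the resulting window vectors."""
    if len(emb) < n:
        # Assumed fallback: a sequence shorter than n is treated as a single window.
        return swem_aver(emb)
    windows = [swem_aver(emb[i:i + n]) for i in range(len(emb) - n + 1)]
    return swem_max(windows)


def mlp_refine(emb: List[Vector], weight: List[Vector], bias: Vector) -> List[Vector]:
    """Hypothetical refinement step: relu(x @ weight + bias) applied to every word.
    weight has one row per input dimension and one column per output dimension."""
    columns = list(zip(*weight))  # materialize so it can be reused for every word
    return [
        [max(0.0, sum(x * w for x, w in zip(vec, col)) + b) for col, b in zip(columns, bias)]
        for vec in emb
    ]


emb = [[1.0, 0.0], [3.0, 4.0], [2.0, 2.0]]  # three words, 2-d embeddings
print(swem_aver(emb))       # [2.0, 2.0]
print(swem_max(emb))        # [3.0, 4.0]
print(swem_concat(emb))     # [2.0, 2.0, 3.0, 4.0]
print(swem_hier(emb, n=2))  # [2.5, 3.0]
```

Reordering the words cannot change `swem_aver` or `swem_max`, but it can change `swem_hier`: swapping the first two words of `emb` gives `[2.0, 2.0]` instead of `[2.5, 3.0]`. That difference is the local word-order information hierarchical pooling keeps.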
SWEM-based models, especially with hierarchical pooling, achieve comparable or better performance than CNN/RNN models while being much more computationally efficient. They are weaker at classifying short sentences.
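To see where the \(nd\) and \(d\) factors come from, here is a back-of-the-envelope count of multiply-adds for encoding one sequence. It assumes a single convolutional layer with `hidden` filters of width `width`, a single-layer LSTM with `hidden` units, and `emb_dim`-dimensional embeddings; biases and nonlinearities are ignored, and all names are hypothetical rather than the paper's notation.

```python
def param_counts(emb_dim: int, hidden: int, width: int) -> dict:
    """Weights on top of the embeddings (biases ignored) under the assumed setup."""
    return {
        "cnn": width * emb_dim * hidden,          # `hidden` filters, each width x emb_dim
        "lstm": 4 * hidden * (emb_dim + hidden),  # 4 gates over [input; previous state]
        "swem": 0,                                # pooling has no weights
    }


def ops_per_sequence(seq_len: int, emb_dim: int, hidden: int, width: int) -> dict:
    """Approximate multiply-adds to encode one sequence of seq_len tokens."""
    params = param_counts(emb_dim, hidden, width)
    return {
        "cnn": seq_len * params["cnn"],    # every filter applied at every position
        "lstm": seq_len * params["lstm"],  # every gate matrix applied at every step
        "swem": seq_len * emb_dim,         # one add (or max) per embedding entry
    }


ops = ops_per_sequence(seq_len=100, emb_dim=300, hidden=300, width=3)
print(ops["cnn"] // ops["swem"])   # 900  (= width * hidden)
print(ops["lstm"] // ops["swem"])  # 2400 (= 8 * hidden when emb_dim == hidden)
```

Under these assumptions the CNN/SWEM ratio is exactly `width * hidden` (the \(nd\) factor), and the LSTM/SWEM ratio is `4 * hidden * (emb_dim + hidden) / emb_dim`, which grows linearly with `hidden` (the \(d\) factor, up to a constant). The LSTM's per-token steps must also run sequentially, whereas SWEM pooling can be computed in parallel across positions.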
Discussion on word-order features
The paper has an interesting discussion of the importance of word-order features. In one experiment, the words of every training example are shuffled while the test set keeps its original word order => an LSTM trained on the shuffled data reaches accuracies on Yahoo and SNLI comparable to those obtained with unshuffled training data!
Sentiment analysis is more sensitive to word-order features than other tasks.
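The data manipulation behind the shuffled-training experiment above is simple. Below is a sketch assuming each example is a `(tokens, label)` pair; the helper names and the fixed seed are hypothetical. For a sentence-pair task such as SNLI, one would presumably shuffle the premise and the hypothesis separately with `shuffle_tokens`.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[str], int]


def shuffle_tokens(tokens: Sequence[str], rng: random.Random) -> List[str]:
    """Return a copy of one example's tokens in random order."""
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


def make_shuffled_split(train: Sequence[Example], test: Sequence[Example], seed: int = 0):
    """Destroy word order inside each training example; leave the test set untouched."""
    rng = random.Random(seed)
    shuffled_train = [(shuffle_tokens(tokens, rng), label) for tokens, label in train]
    return shuffled_train, list(test)


train = [(["the", "plot", "was", "not", "good"], 0)]
test = [(["a", "truly", "great", "film"], 1)]
shuffled_train, same_test = make_shuffled_split(train, test)
print(sorted(shuffled_train[0][0]) == sorted(train[0][0]))  # True: same words, new order
print(same_test == test)                                    # True
```

A model trained on `shuffled_train` can only exploit which words occur, not their order, so comparing its test accuracy with an unshuffled baseline measures how much a task relies on word order.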