READING NOTES: Practitioners’ Guide to CNNs on Sentence Classification (Zhang and Wallace)


Reading notes of A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification (Zhang and Wallace, 2015).

Main Idea:

How the sentence classification performance of one-layer CNNs is affected, across 9 datasets, by the following architecture components:

  • input word vector representations
  • filter region size
  • number of feature maps
  • activation functions
  • pooling strategy
  • regularization

CNN Architecture

  • convolution: slide filter windows over the sentence matrix, whose rows are the word vectors; e.g., three filter region sizes with 2 filters each yields 6 feature maps ->
  • max-pooling: 1-max pooling over each feature map ->
  • concatenation: feature vector = concatenation of the 6 max features ->
  • softmax: softmax layer for classification (see the sketch below)
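A minimal PyTorch sketch of this one-layer architecture (`SentenceCNN`, the embedding dimension, and the default filter counts are illustrative placeholders, not code from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    """One-layer CNN for sentence classification (Kim-style)."""

    def __init__(self, vocab_size, embed_dim=300,
                 filter_sizes=(2, 3, 4), n_filters=2, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One convolution per region size; each filter spans the full
        # embedding dimension and slides along the sentence.
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, n_filters, (fs, embed_dim)) for fs in filter_sizes]
        )
        self.fc = nn.Linear(n_filters * len(filter_sizes), n_classes)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.embedding(token_ids).unsqueeze(1)   # (batch, 1, seq_len, embed_dim)
        # Each conv yields feature maps; 1-max pool over each map's length.
        pooled = [F.relu(conv(x)).squeeze(3).max(dim=2).values
                  for conv in self.convs]            # each: (batch, n_filters)
        feat = torch.cat(pooled, dim=1)              # concatenated max features
        return self.fc(feat)  # logits; softmax applied via cross-entropy loss
```

With the defaults above, three region sizes with 2 filters each give exactly the 6 feature maps and 6 max features described in the list.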

Experiments and Results

  • 9 sentence classification datasets (7 of them also used in Kim (2014))
  • 10-fold cross-validation on all datasets
  • Effects of each factor:
    • input word vectors:
      • one-hot vectors perform worse than embeddings on sentence classification (one-hot representations were originally used for document classification);
      • embeddings trained from scratch may be best for some tasks.
    • filter region sizes
      • a coarse grid search over region sizes is sufficient
      • combining multiple region sizes close to the optimal single size may help
      • the number of feature maps per region size is dataset-dependent (100-600); too many may cause overfitting
    • activation function
      • best performers: ReLU, tanh, and Iden (the identity, i.e., no activation)
      • with multiple hidden layers, prefer ReLU or tanh (stacked identity layers collapse into a single linear map)
    • pooling
      • global 1-max pooling outperforms local max pooling, k-max pooling, and average pooling
    • regularization: had relatively little effect
      • small dropout rate (0.0-0.5), dataset-dependent
      • a relatively large \(\ell_2\) norm constraint (i.e., a weak one)
      • if increasing the number of feature maps hurts performance, try stronger regularization (see the sketch after this list)
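As a concrete illustration of the regularization bullets, here is a minimal sketch, assuming PyTorch (`apply_l2_constraint` is a hypothetical helper name, not from the paper), of dropout on the feature vector and an \(\ell_2\) max-norm constraint applied after each gradient step:

```python
import torch
import torch.nn as nn

def apply_l2_constraint(linear: nn.Linear, s: float = 3.0):
    """Rescale any weight row whose l2 norm exceeds s (max-norm constraint).

    A large s is a weak constraint, consistent with the finding that
    regularization changed results relatively little.
    """
    with torch.no_grad():
        linear.weight.copy_(linear.weight.renorm(p=2, dim=0, maxnorm=s))

# Dropout is applied to the concatenated feature vector before the softmax
# layer; inside SentenceCNN.forward this would read:
#     feat = self.dropout(torch.cat(pooled, dim=1))  # self.dropout = nn.Dropout(0.5)
```

Calling `apply_l2_constraint(model.fc)` after each optimizer step keeps the softmax weights inside the norm ball.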

Discussion

Well, hyperparameter tuning is always time-consuming and sometimes even frustrating. Though the optimal settings are task-specific, the baseline configuration and suggestions in the paper are a useful starting point for practitioners (instantiated in a short sketch after the list):

  • input word vectors: GloVe
  • filter region sizes: (3, 4, 5)
  • feature maps: 100 per region size
  • activation function: ReLU
  • pooling: 1-max pooling
  • dropout rate: 0.5
  • \(\ell_2\) norm constraint: 3
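Wiring this baseline into the architecture sketch above (a hypothetical instantiation; `SentenceCNN` and `apply_l2_constraint` are the illustrative helpers defined earlier, and the vocabulary size and batch are placeholders):

```python
import torch
import torch.nn as nn

model = SentenceCNN(
    vocab_size=20_000,        # placeholder; set from your vocabulary
    embed_dim=300,            # GloVe vectors are commonly 300-dimensional
    filter_sizes=(3, 4, 5),   # baseline region sizes
    n_filters=100,            # 100 feature maps per region size
    n_classes=2,
)
optimizer = torch.optim.Adadelta(model.parameters())

# One dummy training step to show where each baseline setting lands.
token_ids = torch.randint(0, 20_000, (32, 40))  # 32 sentences, 40 tokens each
labels = torch.randint(0, 2, (32,))
optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(token_ids), labels)
loss.backward()
optimizer.step()
apply_l2_constraint(model.fc, s=3.0)            # l2 norm constraint of 3
```

ReLU and 1-max pooling are built into `SentenceCNN.forward`; dropout (p=0.5) would be added to the feature vector as sketched in the regularization section.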
