READING NOTES: FastText (Joulin


Reading notes on the paper Bag of Tricks for Efficient Text Classification.

Main Idea

State-of-the-art accuracy and fast, large-scale text classification using linear models with

  • rank constraint
  • fast loss approximation

Linear model with Rank Constraint

Word representations (bag of n-grams) --averaged--> text representation ==> classification (softmax, i.e. negative log-likelihood loss).
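The pipeline above can be sketched in a few lines of plain Python. This is only an illustration, not the fastText implementation: the names `embeddings`, `output_weights`, `predict_proba`, the toy sizes, and the random initialization are all assumptions made for the example. The rank constraint shows up in the shapes: the classifier is the product of a `num_class x hidden_dim` matrix and a `hidden_dim x vocab_size` lookup table, so its rank is at most `hidden_dim`.

```python
import math
import random


def average_vectors(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(logits):
    """Turn raw scores into probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(token_ids, embeddings, output_weights):
    """Averaged n-gram embeddings -> linear layer -> softmax."""
    hidden = average_vectors([embeddings[i] for i in token_ids])
    # One dot product per class: this loop is the O(num_class * hidden_dim) cost.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in output_weights]
    return softmax(logits)


def nll_loss(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


# Toy sizes and random parameters, chosen only for illustration.
random.seed(0)
VOCAB_SIZE, HIDDEN_DIM, NUM_CLASS = 50, 10, 4
embeddings = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)]
              for _ in range(VOCAB_SIZE)]
output_weights = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)]
                  for _ in range(NUM_CLASS)]

# A "document" is just the list of ids of its words and n-grams.
probs = predict_proba([3, 17, 42], embeddings, output_weights)
print(probs, nll_loss(probs, label=2))
```

Training would adjust both `embeddings` and `output_weights` to minimize `nll_loss` over the corpus; that step is left out here.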

  • Complexity of the full softmax: \(O(kh)\), where \(k\) is num_class and \(h\) is hidden_dim
  • Speeding up: hierarchical softmax (based on a Huffman coding tree)
    • Complexity: \(O(h \log_2 k)\)
    • The top \(T\) targets can be found in \(O(\log T)\) time (the paper uses a binary heap).
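To make the hierarchical softmax concrete, here is a minimal sketch that builds a Huffman tree over class frequencies and computes a class probability as a product of binary decisions along the root-to-leaf path. The class names, counts, and the helpers `huffman_paths` and `class_probability` are hypothetical, and the sketch makes no claim to match how fastText lays out its tree internally.

```python
import heapq
import itertools
import math
import random


def huffman_paths(class_counts):
    """Map each class to its root-to-leaf path of (internal_node_id, bit)."""
    tie = itertools.count()  # tie-breaker so heap entries never compare nodes
    heap = [(count, next(tie), ("leaf", label))
            for label, count in class_counts.items()]
    heapq.heapify(heap)
    node_ids = itertools.count()
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new internal node.
        c0, _, left = heapq.heappop(heap)
        c1, _, right = heapq.heappop(heap)
        node = ("node", next(node_ids), left, right)
        heapq.heappush(heap, (c0 + c1, next(tie), node))
    paths = {}

    def walk(node, path):
        if node[0] == "leaf":
            paths[node[1]] = path
            return
        _, node_id, left, right = node
        walk(left, path + [(node_id, 0)])
        walk(right, path + [(node_id, 1)])

    walk(heap[0][2], [])
    return paths


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def class_probability(hidden, path, node_vectors):
    """Multiply the left/right probabilities along the path to the class."""
    prob = 1.0
    for node_id, bit in path:
        score = sum(w * h for w, h in zip(node_vectors[node_id], hidden))
        p_right = sigmoid(score)
        prob *= p_right if bit == 1 else 1.0 - p_right
    return prob


# Hypothetical class frequencies: frequent classes end up near the root.
counts = {"sports": 50, "politics": 30, "tech": 15, "art": 5}
paths = huffman_paths(counts)

random.seed(0)
HIDDEN_DIM = 10
# One weight vector per internal node (num_class - 1 of them).
node_vectors = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)]
                for _ in range(len(counts) - 1)]
hidden = [random.gauss(0, 1) for _ in range(HIDDEN_DIM)]

probs = {label: class_probability(hidden, path, node_vectors)
         for label, path in paths.items()}
print({label: len(path) for label, path in paths.items()})
print(probs)
```

Scoring one class costs one dot product per node on its path, so the cost is hidden_dim times the path length rather than hidden_dim times num_class, and the most frequent classes get the shortest paths.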

Experiments and Results

  • Sentiment analysis
    • hidden_dim=10: better than char-CNN and char-CRNN, worse than VDCNN
    • much faster to train
  • Tag prediction
    • higher accuracy with bigrams
    • much faster again


The paper proposes a simple but effective baseline that uses averaged word representations. It performs on par with deep learning methods on the two classification tasks, and it is fast and easy to scale.

A good follow-up question is when an LSTM or other RNN is actually worth using for text classification.
