14-10-19 Dongxu Zhang
From cslt Wiki
Accomplished this week
- Trained an LSTM-RNN LM on a 200MB corpus (vocabulary 10k, 100 classes, i100*m100). With 2 CPU cores, training takes around 200 min per epoch. (A minimal model sketch follows this list.)
- Trained a 5-gram LM on Baiduzhidao_corpus (~30GB after preprocessing) with the new lexicon. Found a mistake in how probabilities were computed after merging the counts. (A count-merging sketch also follows this list.)
- Read the paper "Learning Long-Term Dependencies with Gradient Descent is Difficult"; still in progress.
- Had an idea that may enrich word2vec with much more semantic information, but its computational complexity is a serious problem, which I hope we can discuss.
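
For reference, below is a minimal sketch of the LSTM-RNN LM configuration above, written in PyTorch (illustrative only, not the toolkit actually used). It assumes that i100*m100 means 100-dim input embeddings and a 100-unit hidden layer, and it uses a plain softmax over the full 10k vocabulary rather than the 100-class factorized output:

```python
import torch
import torch.nn as nn

class LSTMLM(nn.Module):
    """Word-level LSTM language model: embedding -> LSTM -> softmax over vocab."""
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) word ids
        emb = self.embed(tokens)
        out, state = self.lstm(emb, state)
        return self.proj(out), state  # logits: (batch, seq_len, vocab)

model = LSTMLM()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One toy training step on random data; per-word perplexity = exp(loss).
x = torch.randint(0, 10000, (32, 20))   # input word ids
y = torch.randint(0, 10000, (32, 20))   # next-word targets
optimizer.zero_grad()
logits, _ = model(x)
loss = loss_fn(logits.reshape(-1, 10000), y.reshape(-1))
loss.backward()
optimizer.step()
print("perplexity:", loss.exp().item())
```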
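For the count merge, one common source of probability mistakes is normalizing each partial count file into probabilities before merging, instead of summing the raw counts first. Below is a minimal sketch of the correct order (sum counts across all shards, then normalize); the shard file names and line format are hypothetical, and this is only a guess at the kind of mistake involved:

```python
from collections import Counter, defaultdict

def read_counts(path):
    """Read 'w1 w2 ... wn <TAB> count' lines from one shard of n-gram counts."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            ngram, cnt = line.rstrip("\n").rsplit("\t", 1)
            counts[tuple(ngram.split())] += int(cnt)
    return counts

# Hypothetical shard files produced by counting the corpus in pieces.
shards = ["counts.part0", "counts.part1", "counts.part2"]

# Correct order: sum raw counts across all shards first...
merged = Counter()
for path in shards:
    merged.update(read_counts(path))  # Counter.update adds counts

# ...then turn the merged counts into conditional probabilities:
# p(w | history) = count(history, w) / sum over w' of count(history, w').
history_totals = defaultdict(int)
for ngram, cnt in merged.items():
    history_totals[ngram[:-1]] += cnt

prob = {ngram: cnt / history_totals[ngram[:-1]] for ngram, cnt in merged.items()}
```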
Next week
- Test the LSTM-RNN LM.
- Finish building the lexicon.
- Understand the paper.
- If time permits, implement a baseline of my idea on text8 (a word2vec baseline sketch follows).
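
For the text8 baseline, a minimal sketch using gensim's Word2Vec (illustrative only; the hyperparameters here are placeholders, not a committed setup):

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import Text8Corpus

# Stream sentences from the text8 file (a ~17M-word Wikipedia excerpt).
sentences = Text8Corpus("text8")

# Skip-gram baseline; all hyperparameters below are placeholders.
model = Word2Vec(
    sentences,
    vector_size=100,  # embedding dimensionality
    window=5,         # context window size
    min_count=5,      # drop rare words
    sg=1,             # 1 = skip-gram, 0 = CBOW
    workers=4,
)

# Quick sanity check on the learned vectors.
print(model.wv.most_similar("king", topn=5))
```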