Difference between revisions of “14-10-19 Dongxu Zhang”

Revision as of 12:45, 19 October 2014 (Sun)

Trained an LSTM-RNN LM on a 200 MB corpus (vocabulary 10k, 100 classes, i100*m100). With 2 CPU cores, it takes around 200 min per epoch.
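
If "100 classes" refers to a class-factorized output layer, P(w | h) = P(c(w) | h) · P(w | c(w), h), as used in class-based RNN LMs (an assumption; the note does not name the implementation), the per-word output cost drops from about H·V to H·(C + V/C). The sketch below does that arithmetic for V = 10k and C = 100, assuming equal-sized classes (an idealization, since real class assignment is usually frequency-based) and a hypothetical hidden size H = 100.

```python
def output_layer_macs(vocab_size: int, num_classes: int, hidden_size: int) -> tuple[int, int]:
    """Multiply-accumulates per word for the output layer.

    Returns (full_softmax, class_factored). Assumes the vocabulary is split
    into equal-sized classes of vocab_size // num_classes words each.
    """
    full = hidden_size * vocab_size
    # Score every class, then only the words inside the target word's class.
    factored = hidden_size * (num_classes + vocab_size // num_classes)
    return full, factored


# hidden_size=100 is a hypothetical value chosen for illustration.
full, factored = output_layer_macs(vocab_size=10_000, num_classes=100, hidden_size=100)
print(full, factored, full // factored)  # prints: 1000000 20000 50
```

The saving applies only to the output layer; the LSTM gate computations are unchanged by the class factorization.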
Trained a 5-gram LM on Baiduzhidao_corpus (~30 GB after preprocessing) with the new lexicon. There is a mistake in the probabilities computed after merging.
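
The note does not say what went wrong after the merge, so the following illustrates one common pitfall rather than diagnosing this one: when counts are collected on separate shards, conditional probabilities should be recomputed from the summed counts, not averaged across shards. The sketch uses unsmoothed maximum-likelihood estimates; the shard data and the function names `merge_counts` and `mle_prob` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

NGram = Tuple[str, ...]


def merge_counts(shards: Iterable[Dict[NGram, int]]) -> Counter:
    """Sum the counts of identical n-grams across all shards."""
    merged: Counter = Counter()
    for shard in shards:
        merged.update(shard)  # Counter.update adds counts instead of replacing them
    return merged


def mle_prob(counts: Dict[NGram, int], ngram: NGram) -> float:
    """Unsmoothed P(w | h) = c(h, w) / sum over w' of c(h, w')."""
    history = ngram[:-1]
    denom = sum(c for g, c in counts.items() if g[:-1] == history)
    return counts.get(ngram, 0) / denom if denom else 0.0


# Two hypothetical shards of bigram counts.
shard1 = {("a", "b"): 2, ("a", "c"): 1}
shard2 = {("a", "b"): 1, ("a", "c"): 4}

merged = merge_counts([shard1, shard2])
correct = mle_prob(merged, ("a", "b"))  # 3 / 8
averaged = (mle_prob(shard1, ("a", "b")) + mle_prob(shard2, ("a", "b"))) / 2  # (2/3 + 1/5) / 2
print(correct, round(averaged, 4))  # prints: 0.375 0.4333
```

For a corpus of this size, the history totals would be precomputed in one pass rather than rescanned per query; the linear scan here only keeps the sketch short.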
Reading the paper "Learning Long-Term Dependencies with Gradient Descent is Difficult" (still in progress).
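
The difficulty named in the paper's title can be seen in a toy case: for the scalar recurrence h_t = tanh(w · h_{t-1}), the gradient dh_T/dh_0 is a product of T factors w · (1 - h_t²), so with |w| < 1 it shrinks at least as fast as |w|^T. The sketch below (a single unit with no input; w = 0.9 and h_0 = 0.5 are arbitrary choices) illustrates that decay and is not the paper's own experiment.

```python
import math


def grad_through_time(w: float, h0: float, steps: int) -> float:
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
    return grad


for steps in (1, 10, 50):
    print(steps, grad_through_time(w=0.9, h0=0.5, steps=steps))
```

Because every factor here has magnitude at most |w|, the bound |dh_T/dh_0| <= |w|^T holds for any h_0, and it forces the gradient toward zero whenever |w| < 1.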
An idea occurred to me that may improve word2vec by incorporating much more semantic information, but its huge computational complexity is a problem that bothers me, and I hope we can discuss it.

 === Next week ===