14-10-19 Dongxu Zhang

From cslt Wiki
Latest revision as of 00:41, 20 October 2014

=== Accomplished this week ===

* Trained an LSTM-RNN LM on a 200 MB corpus (vocabulary 10k, 100 classes, i100*m100). With 2 CPU cores, training takes around 200 min per iteration (see the note on the class factorization after this list).
* Trained a 5-gram LM on the Baiduzhidao_corpus (~30 GB after preprocessing) with the new lexicon. There was a mistake when computing probabilities after the merge (see the count-merging sketch after this list).
* Read the paper "Learning Long-Term Dependencies with Gradient Descent is Difficult" (Bengio et al., 1994). Still in progress.
* An idea occurred to me that may improve word2vec by adding much more semantic information, but there is a serious computational-complexity problem that I hope we can discuss.
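A note on the class setting, since it explains the training speed (my own explanatory sketch, not part of the original report): a class-based output layer factors the word softmax into a class softmax followed by a within-class softmax,

 P(w_t \mid h_t) = P\big(c(w_t) \mid h_t\big) \cdot P\big(w_t \mid c(w_t), h_t\big)

With |V| = 10000 words and |C| = 100 classes (about 100 words per class, assuming a roughly balanced split), each output step costs on the order of |C| + |V|/|C| = 100 + 100 = 200 scores instead of |V| = 10000, roughly a 50x saving in the output layer.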

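About the merging mistake: when a corpus is too large to count in one pass, the safe pipeline is to count n-grams per shard, sum the raw counts across shards, and only then estimate probabilities; estimating per shard and combining probabilities afterwards gives wrong values. Below is a minimal Python sketch of that ordering. The shard layout, file format, and helper names are illustrative assumptions, not the actual setup, and a real LM would add smoothing (e.g. Kneser-Ney); if SRILM is the toolkit here, its make-batch-counts/merge-batch-counts scripts implement the same idea.

 from collections import Counter
 import glob

 def read_counts(path):
     """Read one shard's counts, one 'n-gram<TAB>count' line per entry.
     Shards are assumed to hold counts for every order 1..5."""
     counts = Counter()
     with open(path, encoding="utf-8") as f:
         for line in f:
             ngram, cnt = line.rstrip("\n").rsplit("\t", 1)
             counts[ngram] += int(cnt)
     return counts

 # Step 1: merge RAW counts across all shards first.
 merged = Counter()
 for path in glob.glob("counts/shard-*.txt"):  # hypothetical shard files
     merged.update(read_counts(path))

 # Step 2: estimate probabilities only from the merged counts,
 # e.g. the unsmoothed MLE of a 5-gram given its 4-gram history.
 def mle_prob(ngram):
     history = " ".join(ngram.split()[:-1])
     return merged[ngram] / merged[history] if merged[history] else 0.0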
=== Next week ===

* Test the LSTM-RNN LM.
* Finish building the lexicon.
* Understand the paper (see the gradient sketch after this list).
* May have time to implement my baseline idea on text8.
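For reference while reading the paper (a standard restatement of its core argument, not the report's wording): backpropagating an error over T recurrent steps multiplies T Jacobians, so if each Jacobian norm is bounded by some \gamma < 1 the long-range gradient shrinks geometrically,

 \frac{\partial \mathcal{L}_t}{\partial h_{t-T}} = \frac{\partial \mathcal{L}_t}{\partial h_t} \prod_{k=t-T+1}^{t} \frac{\partial h_k}{\partial h_{k-1}}, \qquad \left\|\frac{\partial \mathcal{L}_t}{\partial h_{t-T}}\right\| \le \left\|\frac{\partial \mathcal{L}_t}{\partial h_t}\right\| \gamma^{T}

which is why plain RNNs struggle with long-term dependencies, and why the LSTM above keeps a gated additive cell state that lets gradients survive more steps.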


=== Myself ===

* Prepare for the TOEFL test.
* Prepare for my master's thesis.