14-10-19 Dongxu Zhang


Accomplished this week

  • Trained an LSTM-RNN LM on a 200 MB corpus (10k vocabulary, 100 output classes). With 2 kernels it takes around 200 min per epoch; the first sketch after this list illustrates the class-based output layer.
  • Trained a 5-gram LM on the Baiduzhidao_corpus (~30 GB after preprocessing) with the new lexicon. There is a mistake in the probability estimation after merging the counts; the second sketch after this list shows the step involved.
  • An idea occurred to me that may enrich word2vec with much more semantic information, but its computational complexity is a serious problem that bothers me, which I hope we can discuss.
  • Reading the paper "Learning Long-Term Dependencies with Gradient Descent is Difficult" (Bengio et al., 1994). Still in progress.
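
The report does not include the training code, so here is a minimal numpy sketch of the class-based softmax factorization implied by the 10k-vocabulary / 100-class setup above. All sizes, array names, and the random word-to-class assignment are illustrative assumptions (in practice classes are usually frequency-based bins), not the actual training setup:

```python
# Class-based output layer for an RNN/LSTM LM:
#   P(w | h) = P(class(w) | h) * P(w | class(w), h)
# Scoring costs O(C + |class(w)|) instead of O(V) for a flat softmax,
# which is why 100 classes over a 10k vocabulary speeds training up.
import numpy as np

rng = np.random.default_rng(0)

V, C, H = 10_000, 100, 64                 # vocabulary, classes, hidden size
word2class = rng.integers(0, C, size=V)   # assumption: random assignment for the demo

# Output parameters: one softmax over classes, one over words within a class.
W_class = rng.normal(scale=0.1, size=(C, H))
W_word = rng.normal(scale=0.1, size=(V, H))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def word_prob(h, w):
    """P(w | h), factored through the class of w."""
    c = word2class[w]
    p_class = softmax(W_class @ h)[c]
    members = np.flatnonzero(word2class == c)   # words sharing w's class (sorted)
    p_in_class = softmax(W_word[members] @ h)[np.searchsorted(members, w)]
    return p_class * p_in_class

h = rng.normal(size=H)                    # stand-in for an LSTM hidden state
print(word_prob(h, w=42))
```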

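The probability mistake mentioned above is typically in this step: probabilities must be computed from the *merged* counts, not by normalizing per-shard probabilities before merging. A minimal sketch of the correct order of operations follows; the function names, toy bigram counts, and unsmoothed MLE estimate are assumptions for illustration (a real 5-gram LM would apply smoothing via a toolkit such as SRILM or KenLM):

```python
# Merge raw n-gram counts across corpus shards first, then normalize.
from collections import Counter

def merge_counts(shards):
    """Sum raw n-gram counts across corpus shards."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total

def mle_prob(ngram_counts, history_counts, ngram):
    """Unsmoothed MLE: P(w | history) = c(history, w) / c(history)."""
    return ngram_counts[ngram] / history_counts[ngram[:-1]]

# Toy example with bigrams from two shards.
shard_a = Counter({("the", "cat"): 3, ("the", "dog"): 1})
shard_b = Counter({("the", "cat"): 1, ("the", "dog"): 3})

bigrams = merge_counts([shard_a, shard_b])
unigrams = Counter()
for (w1, _), c in bigrams.items():
    unigrams[(w1,)] += c

# 4/8 = 0.5 from merged counts; averaging per-shard probabilities
# (0.75 and 0.25) would only coincide here because the shards are equal-sized.
print(mle_prob(bigrams, unigrams, ("the", "cat")))
```
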
Next week

  • Test the LSTM-RNN LM.
  • Finish building the lexicon.
  • Understand the paper; its key vanishing-gradient argument is sketched below.
  • May have time to implement my baseline idea on text8.
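
For reference while reading, the paper's central argument in standard BPTT notation (the symbols here are generic, not the paper's own):

```latex
% For a recurrent state h_t = f(W h_{t-1} + U x_t) with pre-activation
% a_i = W h_{i-1} + U x_i, the gradient of a loss at time t with respect
% to an earlier state h_k is a product of per-step Jacobians:
\[
\frac{\partial \mathcal{L}_t}{\partial h_k}
  = \frac{\partial \mathcal{L}_t}{\partial h_t}
    \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}
  = \frac{\partial \mathcal{L}_t}{\partial h_t}
    \prod_{i=k+1}^{t} \operatorname{diag}\!\bigl(f'(a_i)\bigr)\, W .
\]
% If every factor satisfies \(\lVert \operatorname{diag}(f'(a_i))\, W \rVert
% \le \lambda < 1\), the product's norm is \(O(\lambda^{t-k})\): the gradient
% signal from a dependency t-k steps back decays exponentially, and
% \(\lambda > 1\) gives the symmetric exploding case.
```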