Dongxu Zhang 14-11-03

Accomplished this week

  • Created vocabularies of 100k, 200k, and 150,576 words. Using the 150,576-word vocabulary to build language models on baiduhi and baiduzhidao (still running, currently preprocessing).
  • Using the 166k vocabulary to train LMs on baiduhi and baiduzhidao separately (still running, currently pruning).
  • Extracted sentences that contain English and numbers from the weibo corpus.
  • Running BPTT using rwthlm. Results are still not normal: high PPL but low WER. Within rwthlm itself, however, LSTM does seem to be better than standard BPTT.
  • Found a tool called Shenlan which can parse Sogou cell vocabulary. Using its code with a crawler, we can update our vocabulary with new words.
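
The sentence extraction step from the weibo corpus could look roughly like the sketch below. It is not the script actually used. It assumes one sentence per line, reads "English and numbers" as "at least one ASCII letter and at least one ASCII digit", and the names `has_english_and_number` and `extract_sentences` are hypothetical.

```python
import re

# Assumption: "English" means ASCII letters, "numbers" means ASCII digits.
# Full-width forms (e.g. "６") are deliberately not matched here.
_HAS_LATIN = re.compile(r"[A-Za-z]")
_HAS_DIGIT = re.compile(r"[0-9]")


def has_english_and_number(sentence):
    """Return True if the sentence has both an ASCII letter and an ASCII digit."""
    return bool(_HAS_LATIN.search(sentence)) and bool(_HAS_DIGIT.search(sentence))


def extract_sentences(lines):
    """Yield stripped, non-empty sentences that pass the filter above.

    `lines` is any iterable of strings, for example an open text file
    with one sentence per line (an assumption about the corpus layout).
    """
    for line in lines:
        sentence = line.strip()
        if sentence and has_english_and_number(sentence):
            yield sentence
```

If the intended filter is "English or numbers" instead, replacing `and` with `or` in `has_english_and_number` is the only change needed.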

Planned for next week

  • Continue building LMs and comparing the vocabularies.
  • Continue working on rwthlm.