Dongxu Zhang 14-11-03

From cslt Wiki
Revision as of 2014-11-02 (Sun) 16:52 by Zhangdx (talk | contribs)

Accomplished this week

  • Created 100k, 200k, and 150576-word vocabularies. Using the 150576-word vocabulary to build the baiduhi and baiduzhidao language models (still running; currently preprocessing).
  • Using the 166k vocabulary to train LMs on baiduhi and baiduzhidao separately (still running; currently pruning).
  • Extracted sentences that contain English and numbers from the weibo corpus.
  • Running BPTT with rwthlm. Results are still abnormal: high PPL but low WER. Within rwthlm itself, however, LSTM does seem to be better than standard BPTT.
  • Found a tool called Shenlan that can parse Sogou cell vocabularies. Combining its code with a crawler, we can update our vocabulary with new words.
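
The weibo sentence extraction above could be sketched roughly as follows. This is only an illustration, not the script actually used: the function names are hypothetical, and it assumes that "English and numbers" means a sentence with at least one ASCII letter and at least one ASCII digit (full-width digits are not matched). Setting `require_both=False` switches to the looser "either one" reading.

```python
import re

# Assumption: "English" = ASCII letters, "numbers" = ASCII digits.
_LATIN = re.compile(r"[A-Za-z]")
_DIGIT = re.compile(r"[0-9]")


def has_english_and_numbers(sentence, require_both=True):
    """Return True if the sentence contains English letters and/or digits."""
    has_latin = _LATIN.search(sentence) is not None
    has_digit = _DIGIT.search(sentence) is not None
    if require_both:
        return has_latin and has_digit
    return has_latin or has_digit


def extract_sentences(lines, require_both=True):
    """Yield stripped, non-empty sentences that pass the filter."""
    for line in lines:
        sentence = line.strip()
        if sentence and has_english_and_numbers(sentence, require_both):
            yield sentence
```

With one sentence per line, `extract_sentences(open(path, encoding="utf-8"))` would stream the matching sentences without loading the whole corpus into memory; `path` here stands for whatever file holds the corpus.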

Planned for next week

  • Continue building the LMs and comparing the vocabularies.
  • Continue working on rwthlm.