Dongxu Zhang 14-11-03
From cslt Wiki
Latest revision as of 16:52, 2 November 2014 (Sunday)
Accomplished this week
- Created 100k, 200k, and 150576-word vocabularies. Used the 150576-word vocabulary to build the baiduhi and baiduzhidao language models (still running, in preprocessing).
- Used the 166k vocabulary to train LMs on baiduhi and baiduzhidao separately (still running, in pruning).
- Extracted sentences that contain English and numbers from the weibo corpus.
- Ran BPTT using rwthlm. The results are still not normal: high perplexity (PPL) but low word error rate (WER). Within rwthlm itself, however, LSTM does seem to be better than standard BPTT.
- Found a tool called Shenlan that can parse Sogou cell vocabularies. Using its code together with a crawler, we can update our vocabulary with new words.
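The sentence extraction step above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: it assumes "English" means at least one ASCII letter and "numbers" means at least one ASCII digit, and the function names and the `require_both` switch are hypothetical. Full-width letters and digits, which do occur in weibo text, are not handled here.

```python
import re

# Assumption: ASCII letters stand in for "English", ASCII digits for "numbers".
_LATIN = re.compile(r"[A-Za-z]")
_DIGIT = re.compile(r"[0-9]")


def has_english_and_number(sentence, require_both=True):
    """Return True if the sentence contains ASCII letters and/or digits.

    require_both=True  -> the sentence must contain both a letter and a digit.
    require_both=False -> either one is enough.
    """
    has_latin = _LATIN.search(sentence) is not None
    has_digit = _DIGIT.search(sentence) is not None
    if require_both:
        return has_latin and has_digit
    return has_latin or has_digit


def extract_sentences(lines, require_both=True):
    """Keep the non-empty, stripped lines that pass the filter above."""
    kept = []
    for line in lines:
        sentence = line.strip()
        if sentence and has_english_and_number(sentence, require_both):
            kept.append(sentence)
    return kept
```

For a corpus stored one sentence per line, `extract_sentences(open(path, encoding="utf-8"))` would return the matching sentences; the file path and encoding are assumptions about the corpus layout.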
Planned for next week
- Working on building LMs and comparing vocabularies.
- Working on rwthlm.
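One possible way to compare vocabularies is by their overlap and by the out-of-vocabulary (OOV) rate on a held-out token stream. The sketch below assumes that is an acceptable criterion; the report does not say which comparison is planned, and the function names are hypothetical.

```python
def oov_rate(vocab, tokens):
    """Fraction of tokens that are not covered by the vocabulary."""
    vocab = set(vocab)
    tokens = list(tokens)
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocab)
    return missing / len(tokens)


def compare_vocabularies(vocab_a, vocab_b):
    """Count words shared by both vocabularies and words unique to each."""
    a, b = set(vocab_a), set(vocab_b)
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }
```

A lower OOV rate on the same test text would suggest better coverage, although a larger vocabulary also makes the LM larger, so the two need to be weighed together.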