140603 Xiaoxi Wang
From cslt Wiki
Last week:
Improved the corpus preprocessing tools (http stripper, num2hanzi) and reprocessed the weibo corpora.
Learned the cross-entropy difference based method for extracting domain-specific corpora.
Recorded spoken numbers for testing.
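As a rough illustration of the two preprocessing steps named above, here is a minimal Python sketch. It is not the actual tool code: it assumes that the "http stripper" removes URLs from the raw text and that num2hanzi rewrites Arabic numerals as Chinese characters. The function names `strip_urls` and `normalize`, the ASCII-only URL pattern, and the 0 to 9999 range limit are all assumptions made for the example.

```python
import re

# Assumption: URLs are ASCII-only, so Chinese text glued to a link is kept.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&%_#:~+-]+")

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def strip_urls(text):
    """Remove http/https links from a line of text."""
    return URL_RE.sub("", text)


def num2hanzi(n):
    """Read an integer in the range 0..9999 as Chinese characters."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return DIGITS[0]
    s = str(n)
    out = []
    pending_zero = False
    for i, ch in enumerate(s):
        d = int(ch)
        pos = len(s) - 1 - i
        if d == 0:
            pending_zero = True  # emit a single 零 only if more digits follow
            continue
        if pending_zero:
            out.append(DIGITS[0])
            pending_zero = False
        out.append(DIGITS[d] + UNITS[pos])
    result = "".join(out)
    # 10..19 are read 十, 十一, ... rather than 一十, 一十一, ...
    if result.startswith("一十"):
        result = result[1:]
    return result


def normalize(text):
    """Strip URLs, then spell out every number of at most four digits."""
    text = strip_urls(text)
    return re.sub(
        r"[0-9]+",
        lambda m: num2hanzi(int(m.group())) if len(m.group()) <= 4 else m.group(),
        text,
    )
```

Longer digit strings (phone numbers, IDs) are left untouched here; whether they should be read digit by digit is a choice the real tool would have to make.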
This week:
Train a new LM on the new corpora (weibo).
Compare the new in-domain corpora selection method with the old topic spotting based method.
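For reference, cross-entropy difference selection (the approach usually attributed to Moore and Lewis) scores each candidate sentence by its per-word cross-entropy under an in-domain LM minus its cross-entropy under a general LM, and keeps the lowest-scoring sentences. The sketch below is only an illustration under simplifying assumptions: it uses add-one smoothed unigram models instead of real n-gram LMs, sentences are already tokenized, and the names `train_unigram`, `cross_entropy`, `select_in_domain` and the threshold of 0.0 are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (token counts, total token count) for a list of token lists."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one smoothed unigram model."""
    counts, total = model
    log_prob = sum(
        math.log2((counts[tok] + 1) / (total + vocab_size)) for tok in sentence
    )
    return -log_prob / len(sentence)


def select_in_domain(candidates, in_domain, general, threshold=0.0):
    """Keep candidates whose score H_in(s) - H_general(s) is below threshold."""
    vocab = {tok for sent in in_domain + general + candidates for tok in sent}
    vocab_size = len(vocab)
    m_in = train_unigram(in_domain)
    m_gen = train_unigram(general)
    scored = [
        (cross_entropy(s, m_in, vocab_size) - cross_entropy(s, m_gen, vocab_size), s)
        for s in candidates
        if s  # skip empty sentences
    ]
    # Most in-domain-like sentences (lowest score) come first.
    scored.sort(key=lambda pair: pair[0])
    return [s for score, s in scored if score < threshold]
```

In practice the threshold would be tuned, for example by the perplexity of an LM trained on the selected data against held-out in-domain text, which also gives a basis for comparing this method with the topic spotting one.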