140428-Xiaoxi Wang
From cslt Wiki
This week:
Preprocessed the baiduzhidao data and part of the weibo data.
Wrote a Hanzi2Num tool.
Sampled corpora from weibo and baiduzhidao (4.4G) and extracted keywords from them.
Classified the corpora according to the keywords.
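For reference, a minimal sketch of what a Hanzi-to-number conversion could look like. This is not the Hanzi2Num tool itself: it assumes the tool maps Chinese numerals to Arabic integers, and the function name hanzi_to_num and the supported character set are hypothetical choices made for illustration only.

```python
# Hypothetical sketch; the real Hanzi2Num tool may behave differently.
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}


def hanzi_to_num(text):
    """Convert a Chinese numeral string to an int (assumed behaviour)."""
    if not text:
        raise ValueError("empty input")
    # Digit-only strings such as 二〇一四 are read positionally.
    if all(ch in DIGITS for ch in text):
        return int("".join(str(DIGITS[ch]) for ch in text))

    total = 0    # value of the sections already closed by 万 / 亿
    section = 0  # value accumulated inside the current section (< 10000)
    digit = 0    # last digit seen, waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十 (no leading digit) counts as 1 * 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            section += digit
            unit = BIG_UNITS[ch]
            if unit == 10**8:
                # 亿 scales everything read so far (e.g. 一万亿).
                total = (total + section) * unit
            else:
                total += section * unit
            section = 0
            digit = 0
        else:
            raise ValueError("unsupported character: " + ch)
    return total + section + digit
```

The sketch covers only non-negative integers; decimals, ordinals, and formal numerals (壹, 贰, ...) are left out.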
Next week:
Train and evaluate language models on the classified corpora.
Improve the algorithms.