140428-Xiaoxi Wang
From cslt Wiki
This week:
Preprocessed the Baidu Zhidao data and part of the Weibo data. Wrote a Hanzi2Num tool. Sampled corpora from Weibo and Baidu Zhidao (4.4G) and extracted keywords from them. Classified the corpora according to those keywords.
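The report does not describe how Hanzi2Num works. As an illustration only, the sketch below assumes the tool converts Chinese numerals (Hanzi) into Arabic numbers during text normalization. The function name `hanzi_to_num` and the character tables are hypothetical and are not taken from the actual tool. It handles values with at most one 亿 and one 万 section, plus digit-by-digit strings such as years.

```python
# Hypothetical sketch of a Hanzi-to-number converter (not the actual Hanzi2Num tool).
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}


def hanzi_to_num(text):
    """Convert a Chinese numeral string to an int."""
    if not text:
        raise ValueError("empty numeral")

    # Strings with digits only (e.g. a year) are read positionally.
    if all(ch in DIGITS for ch in text):
        return int("".join(str(DIGITS[ch]) for ch in text))

    total = 0    # sum of completed 万 / 亿 sections
    section = 0  # value accumulated inside the current section (below 万)
    digit = 0    # most recent digit not yet attached to a unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare leading unit such as 十 in 十五 means "one ten".
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            section += digit
            total += section * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError("not a Chinese numeral character: %r" % ch)
    return total + section + digit


if __name__ == "__main__":
    for s in ["十五", "一百零五", "一万二千三百四十五", "二零一四"]:
        print(s, hanzi_to_num(s))
```

Nested large units such as 一万亿 are outside the scope of this sketch and would need extra handling.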
Next week: Train and evaluate language models (LMs) on the classified corpora. Improve the algorithms.