Difference between revisions of "140428-Xiaoxi Wang"
From cslt Wiki
Latest revision as of 09:56, 28 April 2014 (Monday)
This week:
Preprocessed the baiduzhidao data and part of the weibo data.
Wrote a Hanzi2Num tool.
Sampled corpora from weibo and baiduzhidao (4.4G) and extracted keywords from them.
Classified the corpora according to the keywords.
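The report does not describe how the Hanzi2Num tool works. Assuming it converts Chinese numerals to Arabic numbers (a common text-normalization step before language-model training), a minimal sketch could look like the following. The function name `hanzi_to_num` and the handling rules are hypothetical, not the author's implementation.

```python
# Hypothetical sketch of a Hanzi-to-number converter; not the original tool.
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}


def hanzi_to_num(text: str) -> int:
    """Convert a Chinese numeral string such as '三百二十一' to an int."""
    if not text:
        raise ValueError("empty numeral")
    # Digit-by-digit forms such as '二零一四' are read positionally.
    if len(text) > 1 and all(ch in DIGITS for ch in text):
        return int("".join(str(DIGITS[ch]) for ch in text))

    total = 0    # value of the sections already closed by 万 or 亿
    section = 0  # value accumulated inside the current section (< 10000)
    digit = 0    # pending digit that is still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare '十' means 10, i.e. an implicit leading '一'.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            section += digit
            if ch == "亿":
                # 亿 scales everything read so far (e.g. '一万亿').
                total = (total + section) * BIG_UNITS[ch]
            else:
                total += section * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"not a Chinese numeral character: {ch!r}")
    return total + section + digit
```

For example, `hanzi_to_num("两千零一十四")` and `hanzi_to_num("二零一四")` both give 2014 under these rules. Fractions, ordinals, and financial numerals (壹, 贰, ...) are outside the scope of this sketch.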
Next week:
Train and evaluate language models (LMs) on the classified corpora.
Improve the algorithms.
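The keyword-based classification step mentioned under this week's work is not specified in the report. One simple way to do it, shown here only as an illustrative sketch, is to assign each sentence to the topic whose keywords occur in it most often. The function `classify_by_keywords`, the topic names, and the keyword lists are all hypothetical.

```python
from collections import defaultdict


def classify_by_keywords(sentences, topic_keywords):
    """Group sentences by the topic whose keywords they contain most often.

    topic_keywords maps a topic name to a list of keywords and is assumed
    to be non-empty. Sentences matching no keyword go to 'unclassified'.
    """
    buckets = defaultdict(list)
    for sentence in sentences:
        # Substring counts are used because Chinese text has no spaces;
        # a real pipeline would probably run word segmentation first.
        scores = {
            topic: sum(sentence.count(kw) for kw in keywords)
            for topic, keywords in topic_keywords.items()
        }
        # On a tie, max() keeps the first topic in dictionary order.
        best = max(scores, key=scores.get)
        label = best if scores[best] > 0 else "unclassified"
        buckets[label].append(sentence)
    return dict(buckets)


# Toy data, made up for illustration only.
topics = {"sports": ["足球", "比赛"], "finance": ["股票", "银行"]}
corpus = ["今天的足球比赛很精彩", "银行股票大涨", "你好"]
result = classify_by_keywords(corpus, topics)
```

Each resulting bucket could then serve as the training text for one topic-specific language model, which matches the plan for next week.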