Difference between revisions of "140428-Xiaoxi Wang"

From cslt Wiki

Latest revision as of 09:56, 28 April 2014 (Monday)

This week:

Preprocessed the baiduzhidao data and part of the weibo data.

Wrote a Hanzi2Num tool.

Sampled corpora from weibo and baiduzhidao (4.4G) and extracted keywords from them.

Classified the corpora according to the keywords.
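
The report does not say how the Hanzi2Num tool works internally. As a rough illustration only, assuming it maps numbers written in Chinese characters to integers, a minimal Python sketch could look like this. The function name `hanzi_to_num` and the lookup tables are hypothetical and are not taken from the author's code.

```python
# Hypothetical sketch of a Hanzi-to-number converter (not the author's tool).
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}


def hanzi_to_num(text: str) -> int:
    """Convert a Chinese numeral such as 一千二百三十四 to an int.

    Limitation: stacked big units such as 万亿 are not handled.
    """
    if not text:
        raise ValueError("empty numeral")
    # Purely positional forms such as 二〇一四 are read digit by digit.
    if all(ch in DIGITS for ch in text):
        return int("".join(str(DIGITS[ch]) for ch in text))

    total = 0    # sum of sections already closed by 万 or 亿
    section = 0  # value accumulated inside the current section
    digit = 0    # most recent digit not yet multiplied by a unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十 (as in 十五) counts as 1 * 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            total += (section + digit) * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + section + digit
```

In a preprocessing pipeline, a function of this kind would typically be applied to numeral spans found in the weibo and baiduzhidao text so that the same quantity is always written the same way before language-model training.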


Next week:

Train and evaluate language models (LMs) on the classified corpora.

Improve the algorithms.
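
The report does not name the LM toolkit or the evaluation metric. As a sketch under the assumption that each classified corpus gets its own n-gram model and that models are compared by perplexity on held-out text, the idea can be illustrated with a tiny add-one-smoothed bigram model. The names `train_bigram_lm` and `perplexity` are hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams over a list of tokenized sentences (lists of tokens)."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = {"</s>"}
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(tokens)
        history_counts.update(padded[:-1])
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    return history_counts, bigram_counts, vocab


def perplexity(model, sentences):
    """Perplexity of held-out sentences under add-one smoothing."""
    history_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for <unk>
    log_prob = 0.0
    n_events = 0
    for tokens in sentences:
        mapped = [t if t in vocab else "<unk>" for t in tokens]
        padded = ["<s>"] + mapped + ["</s>"]
        for prev, cur in zip(padded[:-1], padded[1:]):
            p = (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + v)
            log_prob += math.log(p)
            n_events += 1
    return math.exp(-log_prob / n_events)
```

With one model per keyword class, a lower perplexity on a held-out sample from the same class than on samples from other classes would suggest the classification is producing usefully distinct corpora. For Chinese text, `list(sentence)` gives a simple character-level tokenization.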