Difference between revisions of "Document classification test"
From cslt Wiki
Revision as of 01:59, 9 September 2014 (Tue)
Problems and Solutions
How to import the sparse data of vsm to weka
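As a rough illustration of what such an import can look like, the sketch below writes sparse document vectors in Weka's sparse ARFF layout, where each data row lists only the non-zero attributes as `{index value, ...}` with 0-based indices. The function name `to_sparse_arff`, the attribute names `w0`, `w1`, ... and the toy weights are assumptions made for this example; they do not come from the linked page.

```python
def to_sparse_arff(relation, feature_names, class_labels, rows):
    """Build a sparse ARFF string.

    rows is a list of (weights, label) pairs, where weights maps a
    0-based feature index to a non-zero weight (hypothetical layout).
    """
    lines = [f"@relation {relation}", ""]
    for name in feature_names:
        lines.append(f"@attribute {name} numeric")
    # The class is declared last, so its index equals the feature count.
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines.append("")
    lines.append("@data")
    class_index = len(feature_names)
    for weights, label in rows:
        # Indices must appear in ascending order; zero weights are skipped.
        cells = [f"{i} {weights[i]:g}" for i in sorted(weights) if weights[i] != 0]
        # Write the class explicitly on every row rather than relying on
        # the sparse-format default for omitted values.
        cells.append(f"{class_index} {label}")
        lines.append("{" + ", ".join(cells) + "}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = to_sparse_arff(
        "docs",
        ["w0", "w1", "w2"],
        ["IT", "sports"],
        [({0: 0.5, 2: 1.25}, "IT"), ({1: 2.0}, "sports")],
    )
    print(demo)
```

Generic attribute names are used here because raw words may contain characters that would need quoting in an ARFF header.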
Document classification of Sogou data
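- Data
- Data from SogouLab [1], using SogouC.reduced (30M)
- 9 classes: finance (财经), IT, health (健康), sports (体育), travel (旅游), education (教育), recruitment (招聘), culture (文化), military (军事)
- Train, test, and dev sets: train(), test(), dev()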
- Text preprocessing
- Segment words using a 90,000-word (9W) word list (Tencent).
- Remove stop words. The stop_wordlist is
- Some Tools
- weka
- scw
- Google word2vec
- LDA
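The text preprocessing steps listed above (word-list segmentation followed by stop word removal) can be sketched as follows. The page does not say which segmentation algorithm was used, so forward maximum matching is an assumption here, as are the function names, the `max_len` parameter, and the tiny word list and stop word list.

```python
def segment(text, wordlist, max_len=4):
    """Forward maximum matching: at each position take the longest
    word-list entry; fall back to a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in wordlist:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop word list."""
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    wordlist = {"文档", "分类", "测试"}   # hypothetical stand-in for the 9W list
    stop_words = {"的"}                  # hypothetical stop word list
    tokens = segment("文档的分类测试", wordlist)
    print(tokens)                                  # → ['文档', '的', '分类', '测试']
    print(remove_stop_words(tokens, stop_words))   # → ['文档', '分类', '测试']
```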
VSM Test
- Data
- dimension:9402
- Method
- Document representation: use tf-idf as the word weight
- Classifier: Naive Bayes
- Result
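To make the Method above concrete (this is an illustration, not the experiment's result), here is a minimal pure-Python sketch of tf-idf weighting fed into a multinomial Naive Bayes classifier. The page lists Weka among its tools, so this is a stand-in rather than the setup actually used. The smoothed idf formula, the Laplace smoothing parameter `alpha`, all function names, and the toy documents are assumptions.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed idf per term: ln((1 + N) / (1 + df)) + 1 (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """Sparse tf-idf vector; terms outside the training vocabulary are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes over tf-idf weights with Laplace smoothing."""
    prior = Counter(labels)
    weights = {c: defaultdict(float) for c in prior}
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf).items():
            weights[label][term] += w
    model = {}
    for c in prior:
        total = sum(weights[c].values()) + alpha * len(idf)
        log_like = {t: math.log((weights[c][t] + alpha) / total) for t in idf}
        model[c] = (math.log(prior[c] / len(labels)), log_like)
    return model


def predict(doc, model, idf):
    """Return the class with the highest log posterior score."""
    vec = tfidf(doc, idf)

    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(w * log_like[t] for t, w in vec.items())

    return max(model, key=score)


if __name__ == "__main__":
    # Toy, already-segmented documents (hypothetical data).
    docs = [["stock", "market", "rise"], ["team", "match", "win"],
            ["stock", "fund"], ["match", "player"]]
    labels = ["finance", "sports", "finance", "sports"]
    idf = fit_idf(docs)
    model = train_nb(docs, labels, idf)
    print(predict(["stock", "rise"], model, idf))
```

With a 9402-dimensional vocabulary, as in the Data entry above, the same dictionaries simply hold more terms; the sparse representation keeps each document vector small.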