Difference between revisions of “Document classification test”
From cslt Wiki
Revision as of 13:41, 7 September 2014 (Sunday)
Document classification of Sogou data
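- Data
  - Data from Sogou Labs [1], using SogouC.reduced (30M)
  - 9 classes: finance (财经), IT, health (健康), sports (体育), travel (旅游), education (教育), recruitment (招聘), culture (文化), military (军事)
  - Train, test, and dev splits: train(), test(), dev()
- Text preprocessing
  - Segment words using a word list of 90,000 (9W) entries (Tencent)
  - Remove stop words. stop_wordlist is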
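The two preprocessing steps above can be sketched as follows. This is only an illustration: the page does not say which segmentation algorithm was used, so forward maximum matching against a word list is an assumption, and `WORDLIST` and `STOP_WORDS` are tiny hypothetical stand-ins for the real 9W word list and stop-word list.

```python
def segment(text, wordlist, max_len=4):
    """Forward maximum matching: at each position, take the longest
    substring found in the word list (assumed algorithm, not confirmed
    by the page). Unknown characters become single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in wordlist:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical toy lists; the real experiment uses a 9W word list.
WORDLIST = {"我们", "喜欢", "体育", "新闻", "体育新闻"}
STOP_WORDS = {"我们", "的"}

tokens = segment("我们喜欢体育新闻", WORDLIST)
filtered = remove_stop_words(tokens, STOP_WORDS)
```

Here `tokens` is `["我们", "喜欢", "体育新闻"]` and `filtered` is `["喜欢", "体育新闻"]`: the longest match wins over the shorter entries "体育" and "新闻", and the stop word "我们" is dropped.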
VSM Test
- Data
  - Dimension: 9402
- Method
  - Document representation: tf-idf as the word weight
  - Classifier: Naive Bayes
- Result
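The method listed above (tf-idf document vectors fed to a Naive Bayes classifier) can be sketched in plain Python as below. This is not the experiment's code and does not reproduce its results. The smoothed idf formula, the multinomial Naive Bayes variant with add-one smoothing, and the four toy training documents are all assumptions made for illustration. In the real test each vector has 9402 dimensions; the toy vocabulary here has 10 terms.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed idf (assumed formula): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Map a tokenized document to {term: tf * idf}; unseen terms are dropped."""
    counts = Counter(t for t in doc if t in idf)
    return {t: tf * idf[t] for t, tf in counts.items()}


class NaiveBayes:
    """Multinomial Naive Bayes over tf-idf weights with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.vocab = {t for vec in vectors for t in vec}
        self.log_prior = {}
        self.log_likelihood = {}
        for label, n in Counter(labels).items():
            self.log_prior[label] = math.log(n / len(labels))
            # Sum the tf-idf weight of each term over this class's documents.
            totals = defaultdict(float)
            for vec, y in zip(vectors, labels):
                if y == label:
                    for term, w in vec.items():
                        totals[term] += w
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_likelihood[label] = {
                t: math.log((totals[t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, vec):
        def score(label):
            ll = self.log_likelihood[label]
            return self.log_prior[label] + sum(
                w * ll[t] for t, w in vec.items() if t in ll
            )
        return max(self.log_prior, key=score)


# Hypothetical toy corpus, already segmented and stop-word filtered.
train_docs = [
    ["股票", "市场", "上涨"],
    ["银行", "股票", "利率"],
    ["球队", "比赛", "胜利"],
    ["比赛", "球员", "进球"],
]
train_labels = ["财经", "财经", "体育", "体育"]

idf = fit_idf(train_docs)
model = NaiveBayes().fit([tfidf(d, idf) for d in train_docs], train_labels)
prediction = model.predict(tfidf(["股票", "利率", "上涨"], idf))
```

On this toy corpus `prediction` is `"财经"`, since every term of the query document occurs only in the finance class. Keeping the classifier as a small class makes it easy to swap in another weighting scheme while reusing `fit` and `predict`.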