Difference between revisions of "Document classification test"

Latest revision as of 11:33, 28 September 2014 (Sunday)

@@ Line 2: / Line 2: @@
 *[[How to import the sparse data of vsm to weka]]
-==Document classification of Sougou data ==
+==Test==
-* DATA
+[[Sougou data]]
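-:* Data from SougouLab [http://www.sogou.com/labs/dl/c.html], using SogouC.reduced (30M)
-:* 9 classes: Finance (财经), IT, Health (健康), Sports (体育), Travel (旅游), Education (教育), Recruitment (招聘), Culture (文化), Military (军事)
-:* Train, test, and dev splits: train(), test(), dev()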
-*Text preprocessing
-:* Segment words using a 90,000-word (9W) wordlist (Tencent).
-:* Remove stop words. stop_wordlist is
-:*
-*Some Tools
-:* weka
-:* scw
-:* Google word2vec
-:* LDA
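The preprocessing steps listed above (wordlist-based segmentation, then stop word removal) can be sketched as follows. The page does not say which segmentation algorithm was used, so forward maximum matching is an assumption here, and the wordlist, the stop word list, the sample sentence, and the function names are all made up for illustration.

```python
def segment(text, wordlist, max_len=4):
    """Forward maximum matching: at each position take the longest wordlist entry.

    Assumed algorithm; the page only says a wordlist was used.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in wordlist:
                tokens.append(candidate)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop word list."""
    return [t for t in tokens if t not in stop_words]


# Toy data, invented for this sketch (not the 9W wordlist from the page).
wordlist = {"我们", "喜欢", "体育", "新闻"}
stop_words = {"的", "我们"}

tokens = segment("我们喜欢体育的新闻", wordlist)
filtered = remove_stop_words(tokens, stop_words)
print(tokens)
print(filtered)
```

Forward maximum matching is greedy and can mis-segment ambiguous strings; it is shown only because it needs nothing beyond the wordlist itself.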
-===VSM Test===
-*Data
-:* dimension:9402
-*Method
-:* document representation: tf-idf weights as term weights
-:* classifier: Naive Bayes
-*Result
-{| border="2px"
-|+ classification result
-|-
-! Training Set !! Finance !! IT !! Health !! Sports !! Travel !! Education !! Recruitment !! Culture !! Military
-|-
-! TFIDF
-| 0.678 || 0.718 || 0.708 || 0.708 || 0.73
-|-
-|}
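The VSM method above (tf-idf document vectors fed to a Naive Bayes classifier) might look roughly like the sketch below. The test on this page was run with other tooling (weka is listed among the tools), so this is not the original setup: the smoothed idf formula, the multinomial variant of Naive Bayes, the function names, and the four toy documents are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def vectorize(doc, idf):
    """Map a tokenized document to {term: tf * idf}; unknown terms are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(doc).items() if t in idf}


def fit_idf(docs):
    """Smoothed idf (assumed variant): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes on tf-idf weights with additive smoothing."""
    vocab = {t for vec in vectors for t in vec}
    class_counts = Counter(labels)
    weight_sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            weight_sums[label][term] += weight
    model = {}
    for label, count in class_counts.items():
        total = sum(weight_sums[label].values()) + alpha * len(vocab)
        log_prior = math.log(count / len(labels))
        log_like = {t: math.log((weight_sums[label].get(t, 0.0) + alpha) / total)
                    for t in vocab}
        model[label] = (log_prior, log_like)
    return model


def predict(model, vec):
    """Return the class with the highest log posterior score."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(w * log_like[t] for t, w in vec.items() if t in log_like)
    return max(model, key=score)


# Toy corpus, invented for this sketch (not the Sogou data).
train_docs = [["stock", "market", "bank"], ["bank", "loan", "market"],
              ["match", "team", "goal"], ["team", "coach", "match"]]
train_labels = ["finance", "finance", "sports", "sports"]

idf = fit_idf(train_docs)
model = train_nb([vectorize(d, idf) for d in train_docs], train_labels)
print(predict(model, vectorize(["team", "goal", "market"], idf)))
```

On the real data each document would become a 9402-dimensional sparse vector; the dict-based representation above plays the same role without storing the zeros.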
-===LDA Test===
-===Word2vec Test===