“Document classification test”: difference between revisions

Latest revision as of 11:33, Sunday 28 September 2014

@@ Line 1: / Line 1: @@
-==Problem An Solve==
+==Problem And Solve==
-==Document classification of Sougou data ==
+*[[How to import the sparse data of vsm to weka]]
-* DATA
-:* Data from Sogou Labs [http://www.sogou.com/labs/dl/c.html], using SogouC.reduced (30 MB)
-:* 9 classes: Finance, IT, Health, Sports, Travel, Education, Recruitment, Culture, Military
-:* Train/test split: train(), test(), dev()
-*Text preprocessing
-:* Segment words using a 90K-word wordlist (from Tencent).
-:* Remove stop words. The stop-word list is
-:*
-*Tools
-:* weka
-:* scw
-:* Google word2vec
-:* LDA
-===VSM Test===
-*Data
-:* Dimension: 9402
-*Method
-:* Document representation: TF-IDF term weights
-:* Classifier: Naive Bayes
-*Result
-===LDA Test===
+==Test==
-===Word2vec Test===
+[[Sougou data]]
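The preprocessing steps in the removed section (dictionary-based word segmentation, then stop-word removal) can be sketched as follows. The page does not say which segmentation algorithm was used with the 90K Tencent wordlist, so forward maximum matching is an assumption here, and the toy wordlist and stop-word set are hypothetical stand-ins for the real resources.

```python
def fmm_segment(text, wordlist):
    """Forward maximum matching (assumed algorithm): at each position, take
    the longest wordlist entry that starts there; else one character."""
    max_len = max((len(w) for w in wordlist), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback when no wordlist entry matches
        # Try the longest candidate first, down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in wordlist:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    # Hypothetical toy resources; the real wordlist has about 90K entries.
    wordlist = {"今天", "中国", "足球", "比赛"}
    stopwords = {"的"}
    tokens = fmm_segment("今天中国足球的比赛", wordlist)
    print(remove_stopwords(tokens, stopwords))  # → ['今天', '中国', '足球', '比赛']
```

Greedy longest-match is the simplest dictionary segmenter; characters not covered by the wordlist fall out as single-character tokens.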
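The VSM test pairs TF-IDF term weights with a Naive Bayes classifier. The page does not say which Naive Bayes variant was used (the tools list suggests Weka), so the sketch below assumes a multinomial model that treats each document's TF-IDF weights as fractional term counts, with idf(t) = log(N / df(t)) and additive smoothing. The class name and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def idf_table(docs):
    """idf(t) = log(N / df(t)) over the training documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / d) for t, d in df.items()}


def tfidf(doc, idf):
    """Raw term frequency times idf; terms unseen in training are dropped."""
    tf = Counter(doc)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


class TfidfMultinomialNB:
    """Multinomial Naive Bayes using TF-IDF weights as fractional counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, docs, labels):
        self.idf = idf_table(docs)
        vocab = set(self.idf)
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
        # class -> term -> summed TF-IDF weight over that class's documents
        weight = defaultdict(lambda: defaultdict(float))
        for doc, y in zip(docs, labels):
            for t, w in tfidf(doc, self.idf).items():
                weight[y][t] += w
        self.log_likelihood = {}
        for c in class_counts:
            denom = sum(weight[c].values()) + self.alpha * len(vocab)
            self.log_likelihood[c] = {
                t: math.log((weight[c][t] + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, doc):
        vec = tfidf(doc, self.idf)
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in vec.items())
            for c in self.log_prior
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for the segmented Sogou documents.
    docs = [["stock", "fund", "stock"], ["fund", "bank"],
            ["football", "match"], ["basketball", "match", "football"]]
    labels = ["finance", "finance", "sports", "sports"]
    model = TfidfMultinomialNB().fit(docs, labels)
    print(model.predict(["stock", "bank"]))  # → finance
```

Note that a term occurring in every training document gets idf 0 and therefore contributes nothing to either the model or the prediction.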