Differences between revisions of "Document classification test"

From cslt Wiki
VSM Test
Lr (talk | contribs)
Document classification of Sougou data
 
(21 intermediate revisions by the same user not shown)
 
*[[How to import the sparse data of vsm to weka]]
 
  
==Document classification of Sougou data ==
==Test==
[[Sougou data]]
* Data
:* Data from SogouLab [http://www.sogou.com/labs/dl/c.html], using SogouC.reduced (30M)
:* 9 classes: Finance (财经), IT, Health (健康), Sports (体育), Travel (旅游), Education (教育), Recruitment (招聘), Culture (文化), Military (军事)
:* Train and test: train(), test(), dev()
* Text preprocessing
:* Segment words using a 9W (90,000-word) wordlist (tencent).
:* Remove stop words. stop_wordlist is
* Some tools
:* weka
:* scw
:* google word2vec
:* LDA
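The text preprocessing steps listed above (wordlist-based segmentation followed by stop-word removal) could be sketched roughly as below. The page does not say which segmentation algorithm was used, so forward maximum matching is only an illustrative assumption here; the tiny <code>WORDLIST</code> and <code>STOP_WORDS</code> sets are hypothetical stand-ins for the 9W wordlist and the stop_wordlist.

```python
def segment(text, wordlist, max_len=4):
    """Forward maximum matching (an assumed algorithm, not confirmed by the page).

    At each position, take the longest wordlist entry of at most max_len
    characters; fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in wordlist:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical toy resources standing in for the real wordlist and stop list.
WORDLIST = {"我们", "喜欢", "体育", "新闻", "体育新闻"}
STOP_WORDS = {"的", "了", "我们"}

tokens = segment("我们喜欢体育新闻", WORDLIST)
content_tokens = remove_stop_words(tokens, STOP_WORDS)
print(tokens)
print(content_tokens)
```

Longest-match-first means that "体育新闻" is kept as one token even though "体育" and "新闻" are also in the wordlist; a real segmenter may resolve such cases differently.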
===VSM Test===
*Data
:* Dimension: 9402
*Method
:* Document representation: tf-idf weights for the words
:* Classifier: Naive Bayes
*Result

{| border="2px"
|+ Classification result
|-
! Training Set !! Finance !! IT !! Health !! Sports !! Travel !! Education !! Recruitment !! Culture !! Military
|-
! TFIDF
| 0.678 || 0.718 || 0.708 || 0.708 || 0.73
|-
|}

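The method above (tf-idf document vectors fed to a Naive Bayes classifier) can be illustrated with a small pure-Python sketch. This is not the configuration used for the reported results, which presumably came from one of the listed tools; the smoothed idf formula, the Laplace smoothing constant <code>alpha</code>, the multinomial variant of Naive Bayes, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed inverse document frequency for each term (assumed formula)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(doc, idf):
    """Sparse tf-idf vector (term -> weight); terms unseen in training are ignored."""
    tf = Counter(t for t in doc if t in idf)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items()}


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes using tf-idf weights as fractional counts."""
    vocab = {t for v in vectors for t in v}
    class_docs = Counter(labels)
    weight = defaultdict(Counter)
    for vec, label in zip(vectors, labels):
        weight[label].update(vec)  # accumulate per-class term weights
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(weight[label].values()) + alpha * len(vocab)
        log_like = {t: math.log((weight[label][t] + alpha) / total) for t in vocab}
        model[label] = (math.log(n_docs / len(labels)), log_like)
    return model


def predict(model, vector):
    """Return the class with the highest log posterior score."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(w * log_like[t] for t, w in vector.items() if t in log_like)
    return max(model, key=score)


# Hypothetical, already segmented toy documents for two of the nine classes.
train_docs = [
    ["比赛", "球队", "冠军"],
    ["球队", "教练", "比赛"],
    ["股票", "市场", "银行"],
    ["银行", "利率", "市场"],
]
train_labels = ["体育", "体育", "财经", "财经"]

idf = fit_idf(train_docs)
model = train_nb([vectorize(d, idf) for d in train_docs], train_labels)
print(predict(model, vectorize(["球队", "比赛"], idf)))
print(predict(model, vectorize(["银行", "利率"], idf)))
```

With the real data each vector would have up to 9402 dimensions, which is why the sketch stores vectors sparsely as dictionaries rather than dense lists.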
===LDA Test===
===Word2vec Test===

Latest revision as of 11:33, 28 September 2014 (Sunday)

Problem And Solve

Test

Sougou data