Differences between revisions of “Document classification test”

From cslt Wiki
Word2vec Test
Lr (talk | contribs)
Document classification of Sougou data
 
(12 intermediate revisions by the same user not shown)
Line 2: Line 2:
 
*[[How to import the sparse data of vsm to weka]]
 
  
==Document classification of Sougou data ==
==Test==
* Data: [[Sougou data]]
:* Data from SougouLab [http://www.sogou.com/labs/dl/c.html], using SogouC.reduced (30M)
:* 9 classes: 财经 (finance), IT, 健康 (health), 体育 (sports), 旅游 (travel), 教育 (education), 招聘 (recruitment), 文化 (culture), 军事 (military)
:* Train/test/dev split: train(), test(), dev()
* Text preprocessing
:* Word segmentation using a 9W (90,000-word) wordlist (Tencent)
:* Stop-word removal. The stop-word list (stop_wordlist) is
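The page does not say which segmentation algorithm was run with the 9W (90,000-word) Tencent wordlist. Forward maximum matching is a common dictionary-based choice, so the Python sketch below assumes it. It also assumes a maximum word length of 4 characters and uses a hypothetical toy wordlist and stop-word set. It illustrates the two preprocessing steps and is not the pipeline actually used here.

```python
def forward_max_match(text, wordlist, max_len=4):
    """Segment text by forward maximum matching against a wordlist.

    At each position, take the longest dictionary word (up to max_len
    characters) that starts there; otherwise emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # single-character fallback
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in wordlist:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical toy wordlist and stop-word list, for illustration only.
WORDLIST = {"中国", "足球", "比赛", "今天"}
STOP_WORDS = {"的"}

tokens = forward_max_match("今天的足球比赛", WORDLIST)
print(tokens)                                 # ['今天', '的', '足球', '比赛']
print(remove_stop_words(tokens, STOP_WORDS))  # ['今天', '足球', '比赛']
```

Greedy longest-match is fast and needs only the wordlist, but it can mis-split ambiguous strings. Statistical segmenters trade that simplicity for accuracy.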
* Tools
:* Weka
:* scw
:* Google word2vec
:* LDA
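* Class map
:* C000007 汽车 (automobile)
:* C000008 财经 (finance)
:* C000010 IT
:* C000013 健康 (health)
:* C000014 体育 (sports)
:* C000016 旅游 (travel)
:* C000020 教育 (education)
:* C000022 招聘 (recruitment)
:* C000023 文化 (culture)
:* C000024 军事 (military)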
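The class map lists ten category IDs, while the Data bullet names nine classes. C000007 (汽车, automobile) is the one left out. The minimal Python sketch below holds the mapping and selects the nine evaluated classes. It assumes each document is tagged with one of these category IDs; how the files are actually organized is not stated on this page, and `label_for` is a hypothetical helper.

```python
# Category IDs and labels, copied from the class map.
CLASS_MAP = {
    "C000007": "汽车",  # automobile (not one of the nine classes used)
    "C000008": "财经",  # finance
    "C000010": "IT",
    "C000013": "健康",  # health
    "C000014": "体育",  # sports
    "C000016": "旅游",  # travel
    "C000020": "教育",  # education
    "C000022": "招聘",  # recruitment
    "C000023": "文化",  # culture
    "C000024": "军事",  # military
}

# The nine classes named in the Data bullet.
USED_LABELS = ("财经", "IT", "健康", "体育", "旅游", "教育", "招聘", "文化", "军事")

# IDs whose label is one of the nine evaluated classes.
USED_IDS = [cid for cid, label in CLASS_MAP.items() if label in USED_LABELS]


def label_for(category_id):
    """Label for a category ID, or None if the ID is unknown or not evaluated."""
    label = CLASS_MAP.get(category_id)
    return label if label in USED_LABELS else None


print(USED_IDS)
```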
===VSM Test===
* Data
:* Feature dimension: 9402
* Method
:* Document representation: TF-IDF word weights
:* Classifier: Naive Bayes
* Result
{| border="2px"
|+ Classification accuracy (VSM, Naive Bayes)
|-
! !! Finance !! IT !! Health !! Sports !! Travel !! Education !! Recruitment !! Culture !! Military !! Average
|-
! ACC-test
| 0.72139 || 0.72139 || 0.75124 || 0.82089 || 0.79602 || 0.61194 || 0.70647 || 0.64179 || 0.79104 || 0.72913
|-
! ACC-train
| 0.678 || 0.718 || 0.708 || 0.708 || 0.73
|}
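The method bullets name TF-IDF word weights and a Naive Bayes classifier, and the tools list names Weka. Weka's exact weighting and Naive Bayes variant may differ from what is shown here. As an illustration only, the Python sketch below implements one common variant under these assumptions:

* TF-IDF is computed as tf(t, d) · log(N / df(t)).
* Multinomial Naive Bayes treats the TF-IDF weights as fractional counts, with Laplace smoothing (alpha = 1).

The toy documents and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_df(train_docs):
    """Document frequency of each term and the number of training documents."""
    df = Counter(term for doc in train_docs for term in set(doc))
    return df, len(train_docs)


def tfidf(doc, df, n_docs):
    """TF-IDF vector {term: tf * log(N / df)}; terms unseen in training are dropped."""
    tf = Counter(doc)
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if t in df}


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes that treats TF-IDF weights as fractional counts."""
    vocab = {t for vec in vectors for t in vec}
    class_counts = Counter(labels)
    weight_sums = defaultdict(Counter)  # class -> term -> summed TF-IDF weight
    for vec, label in zip(vectors, labels):
        weight_sums[label].update(vec)
    model = {}
    for label, count in class_counts.items():
        denom = sum(weight_sums[label].values()) + alpha * len(vocab)
        log_probs = {t: math.log((weight_sums[label][t] + alpha) / denom) for t in vocab}
        model[label] = (math.log(count / len(labels)), log_probs)
    return model


def predict_nb(model, vec):
    """Return the class with the highest log-posterior score for a TF-IDF vector."""
    def score(label):
        log_prior, log_probs = model[label]
        return log_prior + sum(w * log_probs[t] for t, w in vec.items() if t in log_probs)
    return max(model, key=score)


# Hypothetical toy data (already segmented), for illustration only.
train_docs = [["股票", "基金", "市场"], ["股票", "银行"],
              ["足球", "比赛"], ["篮球", "比赛", "球员"]]
train_labels = ["财经", "财经", "体育", "体育"]

df, n_docs = fit_df(train_docs)
model = train_nb([tfidf(d, df, n_docs) for d in train_docs], train_labels)
print(predict_nb(model, tfidf(["股票", "市场", "比赛"], df, n_docs)))  # 财经
print(predict_nb(model, tfidf(["足球", "球员"], df, n_docs)))          # 体育
```

Document frequencies are fitted on training documents only and reused for test documents, so test data does not leak into the IDF weights.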
===LDA Test===
===Word2vec Test===
* Word2vec result
{| border="2px"
|+ Classification accuracy (ACC) for different vector dimensions
|-
! Dimension !! Finance !! IT !! Health !! Sports !! Travel !! Education !! Recruitment !! Culture !! Military !! Average
|-
! 10
| 0.766169154 || 0.383084577 || 0.52238806 || 0.820895522 || 0.666666667 || 0.44278607 || 0.567164179 || 0.721393035 || 0.850746269 || 0.637921504
|-
! 20
| 0.781094527 || 0.537313433 || 0.572139303 || 0.830845771 || 0.76119403 || 0.452736318 || 0.611940299 || 0.646766169 || 0.860696517 || 0.672747374
|-
! 30
| 0.815920398 || 0.671641791 || 0.606965174 || 0.835820896 || 0.766169154 || 0.552238806 || 0.577114428 || 0.68159204 || 0.885572139 || 0.710337203
|-
! 40
| 0.7960199 || 0.68159204 || 0.631840796 || 0.805970149 || 0.756218905 || 0.572139303 || 0.577114428 || 0.701492537 || 0.905472637 || 0.714206744
|-
! 50
| 0.805970149 || 0.691542289 || 0.641791045 || 0.800995025 || 0.751243781 || 0.552238806 || 0.651741294 || 0.656716418 || 0.910447761 || 0.718076285
|-
! 60
| 0.7960199 || 0.68159204 || 0.626865672 || 0.776119403 || 0.736318408 || 0.572139303 || 0.626865672 || 0.651741294 || 0.895522388 || 0.707020453
|-
! 70
| 0.7960199 || 0.701492537 || 0.621890547 || 0.781094527 || 0.771144279 || 0.572139303 || 0.631840796 || 0.656716418 || 0.905472637 || 0.715312327
|-
! 80
| 0.7960199 || 0.686567164 || 0.626865672 || 0.805970149 || 0.776119403 || 0.582089552 || 0.631840796 || 0.676616915 || 0.905472637 || 0.720840243
|-
! 90
| 0.805970149 || 0.71641791 || 0.621890547 || 0.776119403 || 0.766169154 || 0.572139303 || 0.646766169 || 0.666666667 || 0.915422886 || 0.720840243
|-
! 100
| 0.776119403 || 0.706467662 || 0.631840796 || 0.751243781 || 0.786069652 || 0.577114428 || 0.646766169 || 0.666666667 || 0.910447761 || 0.716970702
|-
! 110
| 0.771144279 || 0.71641791 || 0.656716418 || 0.741293532 || 0.76119403 || 0.597014925 || 0.606965174 || 0.691542289 || 0.910447761 || 0.716970702
|-
! 120
| 0.76119403 || 0.71641791 || 0.646766169 || 0.756218905 || 0.766169154 || 0.60199005 || 0.661691542 || 0.686567164 || 0.915422886 || 0.723604201
|-
! 130
| 0.776119403 || 0.731343284 || 0.631840796 || 0.76119403 || 0.771144279 || 0.577114428 || 0.626865672 || 0.701492537 || 0.905472637 || 0.720287452
|-
! 140
| 0.76119403 || 0.746268657 || 0.63681592 || 0.736318408 || 0.786069652 || 0.587064677 || 0.651741294 || 0.68159204 || 0.900497512 || 0.720840243
|-
! 150
| 0.756218905 || 0.726368159 || 0.63681592 || 0.736318408 || 0.771144279 || 0.611940299 || 0.651741294 || 0.686567164 || 0.910447761 || 0.720840243
|-
! 160
| 0.751243781 || 0.71641791 || 0.646766169 || 0.731343284 || 0.776119403 || 0.597014925 || 0.651741294 || 0.696517413 || 0.895522388 || 0.718076285
|-
! 170
| 0.756218905 || 0.741293532 || 0.661691542 || 0.731343284 || 0.766169154 || 0.60199005 || 0.651741294 || 0.666666667 || 0.900497512 || 0.71973466
|-
! 180
| 0.781094527 || 0.731343284 || 0.651741294 || 0.736318408 || 0.781094527 || 0.606965174 || 0.631840796 || 0.676616915 || 0.895522388 || 0.721393035
|-
! 190
| 0.771144279 || 0.726368159 || 0.661691542 || 0.731343284 || 0.766169154 || 0.60199005 || 0.631840796 || 0.706467662 || 0.900497512 || 0.721945826
|-
! 200
| 0.771144279 || 0.736318408 || 0.641791045 || 0.706467662 || 0.771144279 || 0.606965174 || 0.611940299 || 0.71641791 || 0.900497512 || 0.718076285
|}
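The last column of both result tables equals the unweighted mean of the nine per-class accuracies, not their sum. The short Python check below recomputes that column from two rows copied from the tables: the VSM ACC-test row and the word2vec dimension-20 row.

```python
# Per-class test accuracies copied from the tables (nine classes each).
VSM_ACC_TEST = [0.72139, 0.72139, 0.75124, 0.82089, 0.79602,
                0.61194, 0.70647, 0.64179, 0.79104]
W2V_DIM_20 = [0.781094527, 0.537313433, 0.572139303, 0.830845771,
              0.76119403, 0.452736318, 0.611940299, 0.646766169, 0.860696517]


def mean_accuracy(per_class):
    """Unweighted mean of per-class accuracies."""
    return sum(per_class) / len(per_class)


print(round(mean_accuracy(VSM_ACC_TEST), 5))  # 0.72913
print(round(mean_accuracy(W2V_DIM_20), 9))    # 0.672747374
```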

Latest revision as of 11:33, 28 September 2014 (Sunday)

Problems and Solutions

Test

Sougou data