|
|
(10 intermediate revisions by the same user not shown) |
Line 2: |
Line 2: |
| *[[How to import the sparse data of vsm to weka]] | | *[[How to import the sparse data of vsm to weka]] |
| | | |
− | ==Document classification of Sougou data == | + | ==Test== |
− | * DATA
| + | [[Sougou data]] |
− | :* Data from Sogou Labs [http://www.sogou.com/labs/dl/c.html], using SogouC.reduced (30M)
| + | |
− | :* 9 classes: Finance, IT, Health, Sports, Travel, Education, Recruitment, Culture, Military
| + | |
− | :* train and test: train(),test(),dev()
| + | |
− | *Text preprocessing
| + | |
− | :* Segment words using a word list of 90,000 entries (Tencent).
| + | |
− | :* Remove stop words. The stop_wordlist is
| + | |
− | :*
| + | |
− | *Some Tools
| + | |
− | :* weka
| + | |
− | :* scw
| + | |
− | :* Google word2vec
| + | |
− | :* LDA
| + | |
− | *class map
| + | |
− | C000007 Automobile
| + | |
− | C000008 Finance
| + | |
− | C000010 IT
| + | |
− | C000013 Health
| + | |
− | C000014 Sports
| + | |
− | C000016 Travel
| + | |
− | C000020 Education
| + | |
− | C000022 Recruitment
| + | |
− | C000023 Culture
| + | |
− | C000024 Military
| + | |
− | ===VSM Test===
| + | |
− | *Data
| + | |
− | :* dimension:9402
| + | |
− | *Method
| + | |
− | :* document representation: use the tf-idf value as the word weight
| + | |
− | :* classifier: Naive Bayes
| + | |
− | *Result
| + | |
− | | + | |
− | {| border="2px"
| + | |
− | |+ classification result
| + | |
− | |-
| + | |
− | ! !! Finance !! IT !! Health !! Sports !! Travel !! Education !! Recruitment !! Culture !! Military !! mean
| + | |
− | |-
| + | |
− | ! ACC-test
| + | |
− | | 0.72139 || 0.72139 || 0.75124 || 0.82089 || 0.79602 || 0.61194 || 0.70647 || 0.64179|| 0.79104 || 0.72913
| + | |
− | |-
| + | |
− | ! ACC-train
| + | |
− | | 0.678 || 0.718 || 0.708 || 0.708 || 0.73
| + | |
− | |-
| + | |
− | |}
| + | |
− | | + | |
− | ===LDA Test===
| + | |
− | ===Word2vec Test===
| + | |
− | *Word2vec result
| + | |
− | :* Dimension
| + | |
− | | + | |
− | {| border="2px"
| + | |
− | |+ Classification accuracy (ACC) for different dimensions
| + | |
− | |-
| + | |
− | ! Dimension !! Finance !! IT !! Health !! Sports !! Travel !! Education !! Recruitment !! Culture !! Military !! mean
| + | |
− | |-
| + | |
− | ! 10
| + | |
− | | 0.766169154|| 0.383084577|| 0.52238806|| 0.820895522|| 0.666666667|| 0.44278607|| 0.567164179|| 0.721393035|| 0.850746269|| 0.637921504
| + | |
− | |-
| + | |
− | !20
| + | |
− | |0.781094527|| 0.537313433|| 0.572139303|| 0.830845771|| 0.76119403|| 0.452736318|| 0.611940299|| 0.646766169|| 0.860696517|| 0.672747374
| + | |
− | |-
| + | |
− | !30
| + | |
− | |0.815920398|| 0.671641791|| 0.606965174|| 0.835820896|| 0.766169154|| 0.552238806|| 0.577114428|| 0.68159204|| 0.885572139|| 0.710337203
| + | |
− | |-
| + | |
− | !40
| + | |
− | |0.7960199|| 0.68159204|| 0.631840796|| 0.805970149|| 0.756218905|| 0.572139303|| 0.577114428|| 0.701492537|| 0.905472637|| 0.714206744
| + | |
− | |-
| + | |
− | !50
| + | |
− | |0.805970149|| 0.691542289|| 0.641791045|| 0.800995025|| 0.751243781|| 0.552238806|| 0.651741294|| 0.656716418|| 0.910447761|| 0.718076285
| + | |
− | |-
| + | |
− | !60
| + | |
− | |0.7960199|| 0.68159204|| 0.626865672|| 0.776119403|| 0.736318408|| 0.572139303|| 0.626865672|| 0.651741294|| 0.895522388|| 0.707020453
| + | |
− | |-
| + | |
− | !70
| + | |
− | |0.7960199|| 0.701492537|| 0.621890547|| 0.781094527|| 0.771144279|| 0.572139303|| 0.631840796|| 0.656716418|| 0.905472637|| 0.715312327
| + | |
− | |-
| + | |
− | !80
| + | |
− | |0.7960199|| 0.686567164|| 0.626865672|| 0.805970149|| 0.776119403|| 0.582089552|| 0.631840796|| 0.676616915|| 0.905472637|| 0.720840243
| + | |
− | |-
| + | |
− | !90
| + | |
− | |0.805970149|| 0.71641791|| 0.621890547|| 0.776119403|| 0.766169154|| 0.572139303|| 0.646766169|| 0.666666667|| 0.915422886|| 0.720840243
| + | |
− | |-
| + | |
− | !100
| + | |
− | |0.776119403|| 0.706467662|| 0.631840796|| 0.751243781|| 0.786069652|| 0.577114428|| 0.646766169|| 0.666666667|| 0.910447761|| 0.716970702
| + | |
− | |-
| + | |
− | !110
| + | |
− | |0.771144279|| 0.71641791|| 0.656716418|| 0.741293532|| 0.76119403|| 0.597014925|| 0.606965174|| 0.691542289|| 0.910447761|| 0.716970702
| + | |
− | |-
| + | |
− | !120
| + | |
− | |0.76119403|| 0.71641791|| 0.646766169|| 0.756218905|| 0.766169154|| 0.60199005|| 0.661691542|| 0.686567164|| 0.915422886|| 0.723604201
| + | |
− | |-
| + | |
− | !130
| + | |
− | |0.776119403|| 0.731343284|| 0.631840796|| 0.76119403|| 0.771144279|| 0.577114428|| 0.626865672|| 0.701492537|| 0.905472637|| 0.720287452
| + | |
− | |-
| + | |
− | !140
| + | |
− | |0.76119403|| 0.746268657|| 0.63681592|| 0.736318408|| 0.786069652|| 0.587064677|| 0.651741294|| 0.68159204|| 0.900497512|| 0.720840243
| + | |
− | |-
| + | |
− | !150
| + | |
− | |0.756218905|| 0.726368159|| 0.63681592|| 0.736318408|| 0.771144279|| 0.611940299|| 0.651741294|| 0.686567164|| 0.910447761|| 0.720840243
| + | |
− | |-
| + | |
− | !160
| + | |
− | |0.751243781|| 0.71641791|| 0.646766169|| 0.731343284|| 0.776119403|| 0.597014925|| 0.651741294|| 0.696517413|| 0.895522388|| 0.718076285
| + | |
− | |-
| + | |
− | !170
| + | |
− | |0.756218905|| 0.741293532|| 0.661691542|| 0.731343284|| 0.766169154|| 0.60199005|| 0.651741294|| 0.666666667|| 0.900497512|| 0.71973466
| + | |
− | |-
| + | |
− | !180
| + | |
− | |0.781094527|| 0.731343284|| 0.651741294|| 0.736318408|| 0.781094527|| 0.606965174|| 0.631840796|| 0.676616915|| 0.895522388|| 0.721393035
| + | |
− | |-
| + | |
− | !190
| + | |
− | |0.771144279|| 0.726368159|| 0.661691542|| 0.731343284|| 0.766169154|| 0.60199005|| 0.631840796|| 0.706467662|| 0.900497512|| 0.721945826
| + | |
− | |-
| + | |
− | !200
| + | |
− | |0.771144279|| 0.736318408|| 0.641791045|| 0.706467662|| 0.771144279|| 0.606965174|| 0.611940299|| 0.71641791|| 0.900497512|| 0.718076285
| + | |
− | |-
| + | |
− | |}
| + | |
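The last column of each row in the table above appears to be the unweighted mean of the nine per-class accuracies rather than a sum. This is inferred from the numbers and is not stated in the source. A minimal check for the dimension-10 row, with variable names chosen only for illustration:

```python
# Per-class accuracies copied from the dimension-10 row of the table.
per_class_dim10 = [
    0.766169154, 0.383084577, 0.52238806,
    0.820895522, 0.666666667, 0.44278607,
    0.567164179, 0.721393035, 0.850746269,
]
reported_last_column = 0.637921504  # value given in the last column of that row

# Unweighted (macro) average over the nine classes.
mean_acc = sum(per_class_dim10) / len(per_class_dim10)

print(mean_acc)
print(abs(mean_acc - reported_last_column) < 1e-6)
```

The same relation can be checked for any other row by swapping in its nine per-class values.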