Document classification test

From cslt Wiki
Edited by Lr

Revision as of 13:48, 7 September 2014 (Sunday)

Document classification of Sogou data

  • Data
      • Data from SogouLab [1], using SogouC.reduced (30M)
      • 9 classes: finance (财经), IT, health (健康), sports (体育), travel (旅游), education (教育), recruitment (招聘), culture (文化), military (军事)
      • Data splits: train(), test(), dev()
  • Text preprocessing
      • Word segmentation using a 90,000-word (9W) wordlist from Tencent
      • Stop-word removal; stop-word list:
  • Some tools
      • Weka
      • scw
      • Google word2vec
      • LDA
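The preprocessing step (dictionary-based segmentation, then stop-word removal) can be sketched as below. This is a minimal sketch under assumptions: the page does not say how the 9W Tencent wordlist was applied, so greedy forward maximum matching (FMM) is used here as a stand-in, and the tiny wordlist and stop-word set in the usage part are hypothetical placeholders for the real lists.

```python
"""Sketch of text preprocessing: dictionary-based word segmentation followed
by stop-word removal. Forward maximum matching (FMM) is an ASSUMED algorithm;
the page only says a 9W (90,000-word) Tencent wordlist was used."""


def fmm_segment(text, wordlist, max_len=None):
    """Greedy forward maximum matching over a set of known words.

    At each position, take the longest dictionary word that matches;
    fall back to a single character when nothing in the wordlist matches."""
    if max_len is None:
        max_len = max((len(w) for w in wordlist), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in wordlist:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop stop words and whitespace-only tokens."""
    return [t for t in tokens if t.strip() and t not in stop_words]


if __name__ == "__main__":
    # Hypothetical toy lists standing in for the Tencent wordlist and the
    # (unspecified) stop-word list.
    wordlist = {"中国", "足球", "比赛"}
    stop_words = {"的", "了"}
    tokens = fmm_segment("中国足球的比赛", wordlist)
    print(tokens)                                 # ['中国', '足球', '的', '比赛']
    print(remove_stop_words(tokens, stop_words))  # ['中国', '足球', '比赛']
```

FMM is only one possible choice; a real segmenter would typically also handle numbers, Latin-script tokens and out-of-vocabulary words more carefully.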

VSM Test

  • Data
      • Dimension: 9402
  • Method
      • Document representation: TF-IDF word weights
      • Classifier: Naive Bayes
  • Result
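The VSM method above (TF-IDF document vectors classified with Naive Bayes) can be sketched in plain Python as follows. The page does not specify the TF-IDF variant or the Naive Bayes model, so two assumptions are made here: the weight of term t in a document is its raw count times log(N / df_t), and the classifier is multinomial Naive Bayes with Laplace smoothing that treats TF-IDF weights as fractional term counts. The toy corpus is hypothetical.

```python
"""Sketch of the VSM test: TF-IDF vectors + multinomial Naive Bayes.
ASSUMPTIONS: tf-idf = raw count * log(N / df); TF-IDF weights are used as
fractional counts in multinomial NB. Neither is specified on the page."""
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Return {term: idf} with idf = log(N / df) over tokenized training docs."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(doc, idf):
    """Map a tokenized document to a sparse {term: tf * idf} vector.

    Terms unseen in training are dropped, so vectors live in the training
    vocabulary (its size is the feature dimension)."""
    tf = Counter(doc)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.vocab = {t for v in vectors for t in v}
        class_counts = Counter(labels)
        n = len(labels)
        self.log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
        weight = defaultdict(Counter)  # class -> term -> summed TF-IDF weight
        for v, c in zip(vectors, labels):
            weight[c].update(v)
        self.log_likelihood = {}
        for c in class_counts:
            total = sum(weight[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                t: math.log((weight[c][t] + self.alpha) / total) for t in self.vocab
            }
        return self

    def predict(self, vector):
        """Return argmax_c of log P(c) + sum_t w_t * log P(t | c)."""
        def score(c):
            ll = self.log_likelihood[c]
            return self.log_prior[c] + sum(w * ll[t] for t, w in vector.items() if t in ll)
        return max(self.log_prior, key=score)


if __name__ == "__main__":
    # Hypothetical toy corpus of already-segmented documents (2 of the 9 classes).
    train_docs = [
        ["股票", "基金", "市场"],
        ["股票", "银行", "利率"],
        ["足球", "比赛", "球队"],
        ["篮球", "比赛", "球员"],
    ]
    train_labels = ["财经", "财经", "体育", "体育"]
    idf = fit_idf(train_docs)
    model = MultinomialNB().fit([tfidf(d, idf) for d in train_docs], train_labels)
    print(len(idf))  # feature dimension of this toy vocabulary: 10
    print(model.predict(tfidf(["股票", "市场", "利率"], idf)))  # 财经
```

Scoring in log space avoids underflow when many term probabilities are multiplied; with the real data the vocabulary would be the 9402-dimensional feature space reported above.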

LDA Test

Word2vec Test