Document classification test
Document classification of Sogou data
- DATA
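- Data from Sogou Labs [1], using SogouC.reduced (30M)
- 9 classes: finance (财经), IT, health (健康), sports (体育), travel (旅游), education (教育), recruitment (招聘), culture (文化), military (军事)
- Train, test, and dev sets: train(), test(), dev()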
- Text preprocessing
- Segment words using a wordlist of 90K entries (Tencent).
- Remove stop words. stop_wordlist is
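The two preprocessing steps above can be sketched roughly as follows. The page does not say which segmentation algorithm was used, so this sketch assumes forward maximum matching against the wordlist, a common dictionary-based approach. The function names, the max_len parameter, and the toy wordlist and stop-word set are all hypothetical stand-ins, not the actual 90K Tencent wordlist or the stop_wordlist file.

```python
def segment(text, wordlist, max_len=4):
    """Forward maximum matching (an assumed algorithm, not confirmed by the source).

    At each position, take the longest substring found in the wordlist;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in wordlist:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


# Toy stand-ins for the real wordlist and stop-word list (hypothetical data).
wordlist = {"文档", "分类", "测试", "我们"}
stop_words = {"的", "了"}

tokens = segment("我们的文档分类测试", wordlist)
filtered = remove_stop_words(tokens, stop_words)
print(tokens)
print(filtered)
```

Storing the wordlist and stop words as Python sets keeps each lookup constant-time, which matters for a list of roughly 90K entries; max_len should be at least the length of the longest wordlist entry.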