Difference between revisions of “Nlp tool”
From cslt Wiki
Line 1: | Line 1:
+ =Nlp tool=
  * Text Analysis Online[http://textanalysisonline.com/]
  :* Provides an online testing environment. It currently offers NLTK (with Stanford NLP included), TextBlob, MBSP, Pattern, PyTeaser, LangId, Chinese word segmentation, and other tools.
Line 17: | Line 18:
  * The Dragon Toolkit[http://dragon.ischool.drexel.edu/]
  :* The Dragon Toolkit is a Java-based development package for academic use in information retrieval (IR) and text mining (TM, including text classification, text clustering, text summarization, and topic modeling).
+ =word2vec=
+ * word2vec tool
+ :* word vector tool for text classification, text clustering or information retrieval[http://sourceforge.net/projects/wvtool/]
+ :* google word2vec[http://code.google.com/p/word2vec/]
+ * document vector[http://radimrehurek.com/2014/12/doc2vec-tutorial/?utm_source=rss&utm_medium=rss&utm_campaign=doc2vec-tutorial]
+ :* genSim[https://github.com/piskvorky/gensim/] new function
+ * Deep Learning for Java[http://deeplearning4j.org/]
+ :* word2vec[http://deeplearning4j.org/word2vec.html]
Revision as of 05:16, 25 December 2014 (Thu)
Nlp tool
- Text Analysis Online[1]
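- Provides an online testing environment. It currently offers NLTK (with Stanford NLP included), TextBlob, MBSP, Pattern, PyTeaser, LangId, Chinese word segmentation, and other tools.
- Written in C/C++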
- openNLP[4]
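- Tokenization, sentence segmentation, part-of-speech tagging, named entity extraction (identifying proper nouns in a sentence, e.g. person names), shallow parsing (sentence chunking), syntactic parsing, and coreference resolution
- Recently updated; written in Java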
- stanford nlp[7]
- Some features support Chinese; models can be trained by the user
- NLTK[8]
- Suitable for learning; written in Python
- ICTCLAS [9]
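- Chinese word segmentation; part-of-speech tagging; named entity recognition; user dictionary support; supports GBK, UTF-8, and BIG5 encodings. Newly added: microblog word segmentation, new word discovery, and keyword extraction
- LingPipe is a natural language processing software package developed by Alias-i. It comprises more than ten modules, including topic classification, sentence detection, and character language modeling. The documentation is complete, and every algorithm comes with paper references. Notably, it also supports Chinese. Official site: http://alias-i.com/lingpipe/ Download: http://alias-i.com/lingpipe/web/download.html LingPipe is divided into two major parts: the LingPipe core files and the LingPipe models. For Chinese support, download the Chinese Word Segmentation module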
- The Dragon Toolkit[12]
- The Dragon Toolkit is a Java-based development package for academic use in information retrieval (IR) and text mining (TM, including text classification, text clustering, text summarization, and topic modeling).
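Several of the tools listed above offer Chinese word segmentation, and ICTCLAS additionally supports a user dictionary. The sketch below illustrates the general idea with greedy forward maximum matching in plain Python. It is only an illustration and is not the algorithm or API of ICTCLAS, LingPipe, or any other tool on this page; the function name `segment`, the `max_word_len` parameter, and both toy dictionaries are hypothetical.

```python
def segment(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest substring (up to max_word_len
    characters) that appears in the dictionary; fall back to a single
    character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionaries, for illustration only.
base_dictionary = {"自然", "语言", "自然语言", "处理", "工具"}
user_dictionary = {"自然语言处理"}

# Without the user dictionary the phrase is split into three words.
print(segment("自然语言处理工具", base_dictionary))

# Adding a user entry lets the longer term survive as a single token.
print(segment("自然语言处理工具", base_dictionary | user_dictionary, max_word_len=6))
```

Adding an entry to the user dictionary changes the result without touching the segmenter itself, which is the practical appeal of a user dictionary feature. Production segmenters typically rely on statistical models rather than pure dictionary lookup.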
word2vec
- word2vec tool
- document vector[15]
- genSim[16] new function
- Deep Learning for Java[17]
- word2vec[18]
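The tools in this section all produce dense vectors for words or documents. A common way to use such vectors, once trained, is to compare them by cosine similarity. The following is a minimal sketch in plain Python under stated assumptions: the three-dimensional vectors are made up for illustration (real word2vec vectors usually have on the order of a hundred or more dimensions), and the helper names `cosine_similarity` and `most_similar` are hypothetical, not the API of word2vec, gensim, or Deeplearning4j.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


# Made-up toy vectors, for illustration only (not output of any real model).
toy_vectors = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}

print(most_similar("king", toy_vectors))
```

The same comparison applies to document vectors, which is what makes these representations usable for clustering and retrieval.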