Difference between revisions of "Ling Luo 2015-08-31"
From cslt Wiki
Latest revision as of 02:24, 2 September 2015 (Wednesday)
Past work:
1. Finished training word embeddings with 5 models:
using the EnWiki dataset (953M):
CBOW, Skip-Gram
using the text8 dataset (95.3M):
CBOW, Skip-Gram, C&W, GloVe, LBL and Order (count-based)
2. Used several tasks to measure the quality of the word vectors at various dimensions (10~200):
word similarity
the TOEFL set: small dataset
analogy task: 9K semantic and 10.5K syntactic analogy questions
text classification: IMDB dataset (pos & neg), using the unlabeled dataset to train word embeddings
sentence-level sentiment classification (based on convolutional neural networks)
part-of-speech tagging
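As an illustration of how a synonym test such as the TOEFL set can score word vectors, here is a minimal sketch. It assumes each question has a target word, a few candidate words, and one gold answer, and that the candidate with the highest cosine similarity to the target is taken as the prediction. The vectors, the questions, and the names `answer_synonym_question` and `accuracy` are hypothetical and are not taken from the experiments above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def answer_synonym_question(vectors, target, choices):
    """Pick the choice whose vector is most similar to the target's vector."""
    return max(choices, key=lambda w: cosine(vectors[target], vectors[w]))


def accuracy(vectors, questions):
    """Fraction of (target, choices, gold) questions answered correctly."""
    correct = sum(
        answer_synonym_question(vectors, target, choices) == gold
        for target, choices, gold in questions
    )
    return correct / len(questions)


# Hypothetical 2-dimensional toy vectors, for illustration only.
TOY_VECTORS = {
    "enormous": [0.9, 0.1],
    "huge": [0.8, 0.2],
    "tiny": [-0.7, 0.3],
    "green": [0.1, 0.9],
    "quick": [0.2, -0.9],
    "fast": [0.3, -0.8],
    "slow": [-0.2, 0.7],
}

# Hypothetical questions in (target, choices, gold answer) form.
QUESTIONS = [
    ("enormous", ["tiny", "green", "huge", "fast"], "huge"),
    ("quick", ["slow", "fast", "green", "huge"], "fast"),
]

print("accuracy:", accuracy(TOY_VECTORS, QUESTIONS))
```

With real embeddings, the same loop would be repeated for each model and each vector dimension to compare them.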
Work this week:
semantic & syntactic analogy: try different similarity calculation methods
named entity recognition
focus on CNN
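For the analogy item, one way to compare similarity calculation methods is to keep the analogy solver fixed and swap only the similarity function. The sketch below assumes the common vector-offset formulation (for "a is to b as c is to ?", search for the word closest to b - a + c) and compares cosine similarity with negative Euclidean distance. The toy vectors and the function names are hypothetical, and the report does not say which methods will actually be tried.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def neg_euclidean(u, v):
    """Negative Euclidean distance, so that larger still means more similar."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def solve_analogy(vectors, a, b, c, similarity):
    """Answer 'a is to b as c is to ?' with the vector-offset method."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three question words are excluded from the candidate answers.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: similarity(vectors[w], target))


# Hypothetical 3-dimensional toy vectors, for illustration only.
TOY_VECTORS = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}

for name, sim in (("cosine", cosine), ("negative Euclidean", neg_euclidean)):
    print(name, "->", solve_analogy(TOY_VECTORS, "man", "king", "woman", sim))
```

On real embeddings the two measures can rank candidates differently, because cosine ignores vector length while Euclidean distance does not. That difference is what such a comparison would measure on the semantic and syntactic question sets.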