Ling Luo 2015-08-31
From cslt Wiki
Past work:
1. Finished training word embeddings with 5 models:
using the EnWiki dataset (953M):
CBOW, Skip-Gram
using the text8 dataset (95.3M):
CBOW, Skip-Gram, C&W, GloVe, LBL, and Order (count-based)
2. Used the following tasks to measure the quality of word vectors at various dimensions (10~200):
word similarity (ws)
the TOEFL set: a small dataset
analogy task: 9K semantic and 10.5K syntactic analogy questions
text classification: IMDB dataset (positive and negative labels); the unlabeled portion is used to train the word embeddings
sentence-level sentiment classification (based on convolutional neural networks)
part-of-speech tagging
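The analogy task in the list above is commonly scored with the vector-offset method: for a question "a is to b as c is to ?", pick the word whose vector is closest in cosine similarity to b - a + c. The report does not say which scoring method was used, so the following is only a minimal sketch under that assumption. The toy 2-dimensional vectors and the names solve_analogy and analogy_accuracy are hypothetical and are not taken from the actual experiments.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    best_word, best_score = None, -2.0  # cosine is always >= -1
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue  # the question words themselves are excluded
        score = cosine(vec, target)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(embeddings, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(1 for a, b, c, expected in questions
                  if solve_analogy(embeddings, a, b, c) == expected)
    return correct / len(questions)


# Hypothetical toy embeddings, chosen only to illustrate the arithmetic.
toy_embeddings = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, -2.0],
}
toy_questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
]
accuracy = analogy_accuracy(toy_embeddings, toy_questions)
```

With real embeddings, the same loop would run over the semantic and syntactic question sets separately, giving one accuracy figure per set and per embedding dimension.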
Work this week:
word similarity (ws): try different similarity calculation methods
named entity recognition (NER)
focus on CNN
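For the word similarity item, one way to compare similarity calculation methods is to score the same word pairs with each method and check how well the scores rank-correlate with human judgments. The sketch below assumes cosine similarity and a Euclidean-distance-based similarity as the two candidates, and Spearman rank correlation as the evaluation measure. None of these choices is stated in the report. The toy vectors and human scores are made up for illustration, and the rank helper assumes there are no tied values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def euclidean_similarity(u, v):
    """Map Euclidean distance into (0, 1]; identical vectors score 1."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    return 1.0 / (1.0 + dist)


def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical toy vectors and human similarity scores.
toy_vectors = {
    "cat": [1.0, 0.2],
    "dog": [0.9, 0.3],
    "car": [0.1, 1.0],
    "truck": [0.2, 0.9],
}
toy_pairs = [("cat", "dog", 9.0), ("cat", "car", 2.0), ("dog", "truck", 3.0)]

human_scores = [score for _, _, score in toy_pairs]
results = {}
for name, measure in [("cosine", cosine_similarity),
                      ("euclidean", euclidean_similarity)]:
    model_scores = [measure(toy_vectors[w1], toy_vectors[w2])
                    for w1, w2, _ in toy_pairs]
    results[name] = spearman(model_scores, human_scores)
```

On a real benchmark with tied human scores, a tie-aware implementation such as scipy.stats.spearmanr would be a safer choice than the simplified helper here.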