Ling Luo 2015-08-31

== Works in the past: ==

1. Finished training word embeddings via five models (plus the count-based Order baseline):

using the EnWiki dataset (953M):

CBOW, Skip-Gram

using the text8 dataset (95.3M):

CBOW, Skip-Gram, C&W, GloVe, LBL, and Order (count-based)
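
The report does not say which toolkit produced these embeddings, so the following is only a minimal sketch, assuming gensim's current Word2Vec API and a local copy of the text8 file; every hyperparameter here is an assumption. One loop covers both CBOW and Skip-Gram and sweeps dimensions in the 10–200 range mentioned in item 2 below.

<syntaxhighlight lang="python">
# Minimal sketch: train CBOW and Skip-Gram embeddings on text8 with gensim.
# Toolkit choice and all hyperparameters are assumptions, not the report's setup.
from gensim.models import Word2Vec
from gensim.models.word2vec import Text8Corpus

sentences = Text8Corpus("text8")  # assumed local path to the 95.3M text8 file

for dim in (10, 50, 100, 200):            # dimensions swept in item 2 (10-200)
    for sg, name in ((0, "cbow"), (1, "skipgram")):
        model = Word2Vec(
            sentences,
            vector_size=dim,   # embedding dimensionality
            sg=sg,             # 0 = CBOW, 1 = Skip-Gram
            window=5,
            min_count=5,
            workers=4,
        )
        model.wv.save_word2vec_format(f"{name}_{dim}.vec")
</syntaxhighlight>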

2. Used several tasks to measure the quality of the word vectors at various dimensions (10–200):

word similarity (see the sketch after this list)

the TOEFL set: a small dataset

analogy task: 9K semantic and 10.5K syntactic analogy questions

text classification: the IMDB dataset (positive & negative); the unlabeled portion is used to train the word embeddings

sentence-level sentiment classification (based on convolutional neural networks)

part-of-speech tagging
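
For the word-similarity task above, the usual score is the Spearman rank correlation between the cosine similarity of the learned vectors and human ratings. A minimal sketch, assuming gensim KeyedVectors and a tab-separated ratings file such as WordSim-353; both file names are hypothetical.

<syntaxhighlight lang="python">
# Sketch of the word-similarity evaluation: Spearman correlation between
# cosine similarity of the trained vectors and human similarity ratings.
# "vectors.vec" and "wordsim353.tsv" are hypothetical file names.
from scipy.stats import spearmanr
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("vectors.vec")

model_scores, human_scores = [], []
with open("wordsim353.tsv") as f:          # lines: word1<TAB>word2<TAB>rating
    for line in f:
        w1, w2, rating = line.strip().split("\t")
        if w1 in wv and w2 in wv:          # skip out-of-vocabulary pairs
            model_scores.append(wv.similarity(w1, w2))  # cosine similarity
            human_scores.append(float(rating))

rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman rho = {rho:.3f} over {len(model_scores)} pairs")
</syntaxhighlight>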

== Works in this week: ==

semantic & syntactic analogy: try different similarity calculation methods (see the sketch below)
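
One concrete reading of "different similarity calculation methods" is the contrast between the additive vector-offset rule (3CosAdd) and the multiplicative 3CosMul objective of Levy & Goldberg (2014); whether these are the methods actually tried here is an assumption. gensim exposes both, so a sketch follows (the vector file name is hypothetical).

<syntaxhighlight lang="python">
# Sketch contrasting two similarity methods for a:b :: c:? analogy queries.
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("vectors.vec")  # hypothetical file name

# 3CosAdd: the answer d maximizes cos(d, king - man + woman)
add = wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1)

# 3CosMul: the answer d maximizes cos(d, king) * cos(d, woman) / (cos(d, man) + eps)
mul = wv.most_similar_cosmul(positive=["king", "woman"], negative=["man"], topn=1)

print("3CosAdd:", add)   # both should rank "queen" first on good embeddings
print("3CosMul:", mul)
</syntaxhighlight>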

named entity recognition

focus on CNNs
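
Given the sentence-level sentiment item above, a CNN sentence classifier in the style of Kim (2014) is one plausible concrete form of this focus. The sketch below assumes tf.keras; the architecture and every size and hyperparameter are assumptions, not the report's actual setup.

<syntaxhighlight lang="python">
# Minimal sketch of a sentence-level sentiment CNN (Kim 2014 style).
# All sizes and hyperparameters are assumed for illustration.
import numpy as np
from tensorflow.keras import layers, models

vocab_size, embed_dim, max_len = 20000, 100, 60   # assumed sizes

model = models.Sequential([
    # The Embedding layer could be initialized with the pretrained word
    # vectors from item 1 (weights=[embedding_matrix]) and optionally frozen.
    layers.Embedding(vocab_size, embed_dim),
    layers.Conv1D(100, 5, activation="relu"),   # 100 filters, window of 5 words
    layers.GlobalMaxPooling1D(),                # max-over-time pooling
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),      # binary pos/neg output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy batch of padded word-id sequences, just to show the expected shapes.
x = np.random.randint(0, vocab_size, size=(8, max_len))
y = np.random.randint(0, 2, size=(8,))
model.fit(x, y, epochs=1, verbose=0)
</syntaxhighlight>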