Difference between revisions of "Dongxu Zhang 2016-03-07"

From cslt Wiki
Latest revision as of 01:25, 7 March 2016 (Monday)

Last Week


  • Use similar pairs with a sampling method to create more training data.
  • Compare different strategies (attention on the input):
 1. Use source sentence pairs to create more data, then sample co-occurring words with the similar-pair map. 
    Other words in the core lexicon, including zhuci, tanci, and jieci, map to themselves, and out-of-lexicon words 
    map to <unk>. (Creates a lot of <unk> tokens when decoding.)
 2. Sample co-occurring words with similar pairs. Other words in the core lexicon, including zhuci, tanci, and jieci, 
    map to themselves, and out-of-lexicon words map to <unk>. (Can align to some extent.)
 3. Sample all words except zhuci, tanci, and jieci. (Worse than 2 in alignment performance.)
 4. Sample out-of-lexicon words, and map words in the lexicon to themselves. (Competitive with 2.)
  • Ask Bingdong for the POS corpus.
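
Strategy 2 above could be sketched roughly as follows. This is only an illustration under assumptions: the names `map_sentence`, `similar_pairs`, and `core_lexicon` are hypothetical, and "co-occurring word with a similar pair" is simplified here to "the word has an entry in the similar-pair map".

```python
import random

UNK = "<unk>"

def map_sentence(tokens, similar_pairs, core_lexicon, rng):
    """Sketch of strategy 2 (hypothetical helper, not the original code).

    - A word with an entry in similar_pairs is replaced by a sampled similar word.
    - Any other word in the core lexicon maps to itself.
    - Words outside the lexicon map to <unk>.
    """
    mapped = []
    for tok in tokens:
        candidates = similar_pairs.get(tok)
        if candidates:
            # Sample one similar word uniformly at random (assumed sampling scheme).
            mapped.append(rng.choice(candidates))
        elif tok in core_lexicon:
            mapped.append(tok)
        else:
            mapped.append(UNK)
    return mapped

# Toy usage with made-up data.
rng = random.Random(0)
similar_pairs = {"big": ["large"]}
core_lexicon = {"the", "big", "of"}
print(map_sentence(["the", "big", "zebra"], similar_pairs, core_lexicon, rng))
# -> ['the', 'large', '<unk>']
```

Strategy 4 would invert the roles in this sketch: in-lexicon words are kept unchanged and only out-of-lexicon words are sampled.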

This Week


  • Split the Fujitsu data into training and test sets, and evaluate with hit@1.
  • Evaluate models 2, 3, and 4.
  • Evaluate a model that maps only nouns, if Bingdong provides the POS corpus.
  • Apply attention to both the input and the hidden layer, and try initializing the word embedding matrix with word vectors.
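
The split and hit@1 evaluation planned above might look like the sketch below. It assumes hit@1 means the fraction of test items whose top-ranked prediction equals the reference; `split_data`, `hit_at_1`, the test fraction, and the seed are all hypothetical choices, not taken from the actual setup.

```python
import random

def split_data(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def hit_at_1(ranked_predictions, references):
    """Fraction of items whose first-ranked prediction matches the reference.

    ranked_predictions: one list per item, best candidate first.
    references: one gold answer per item.
    """
    if len(ranked_predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        1
        for ranked, ref in zip(ranked_predictions, references)
        if ranked and ranked[0] == ref
    )
    return hits / len(references)

# Toy usage with made-up predictions: only the first item is a hit.
print(hit_at_1([["a", "b"], ["c", "d"]], ["a", "d"]))  # -> 0.5
```

Fixing the seed keeps the train/test split identical across the runs for models 2, 3, and 4, so their hit@1 scores stay comparable.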