Dongxu Zhang 2016-03-07

From cslt Wiki

Last Week


  • Used similar pairs with a sampling method to create more training data.
  • Compared different strategies (attention on the input):
 1. Use the source sentence pairs to create more data, then sample co-occurring words with the similar-pair map.
    Other words in the core lexicon, including particles (zhuci), interjections (tanci) and prepositions (jieci),
    map to themselves, and out-of-lexicon words map to <unk>. (This produces many <unk> tokens when decoding.)
 2. Sample co-occurring words with the similar pairs. Other words in the core lexicon, including particles,
    interjections and prepositions, map to themselves, and out-of-lexicon words map to <unk>. (Achieves
    alignment to some extent.)
 3. Sample all words except particles, interjections and prepositions. (Alignment is worse than with 2.)
 4. Sample out-of-lexicon words, and map in-lexicon words to themselves. (Competitive with 2.)
  • Asked Bingdong for a POS-tagged corpus.
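
Strategies 2 to 4 above can be sketched as a per-token mapping function. This is a minimal sketch under stated assumptions, not the procedure actually used: the similar-pair map is assumed to be a dict from a word to a list of similar words, the core lexicon and the function words (particles, interjections, prepositions) are assumed to be sets, and a word that must be sampled but has no similar-pair entry is assumed to fall back to <unk>. All names (`map_sentence`, `similar_pairs`, `core_lexicon`, `function_words`) are hypothetical. Strategy 1 is left out because it also depends on how the source sentence pairs are used.

```python
import random

UNK = "<unk>"


def map_sentence(tokens, similar_pairs, core_lexicon, function_words, strategy, rng):
    """Map each token under a hypothetical reading of strategies 2-4.

    similar_pairs: dict word -> list of similar words (assumed format).
    core_lexicon: set of in-lexicon words (function words are assumed to be a subset).
    function_words: set of particles, interjections and prepositions.
    rng: a random.Random instance, so sampling is reproducible.
    """
    def sample(word):
        # Draw one similar word; fall back to <unk> if the word has no similar pair (assumption).
        candidates = similar_pairs.get(word)
        return rng.choice(candidates) if candidates else UNK

    out = []
    for w in tokens:
        if strategy == 2:
            # Words with a similar pair are sampled, other lexicon words are kept, the rest become <unk>.
            if w in similar_pairs:
                out.append(sample(w))
            elif w in core_lexicon:
                out.append(w)
            else:
                out.append(UNK)
        elif strategy == 3:
            # Every word except function words is sampled.
            out.append(w if w in function_words else sample(w))
        elif strategy == 4:
            # In-lexicon words map to themselves; out-of-lexicon words are sampled.
            out.append(w if w in core_lexicon else sample(w))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return out


if __name__ == "__main__":
    # Toy data (hypothetical); single-candidate lists make the sampling deterministic.
    pairs = {"computer": ["PC"], "fast": ["quickly"]}
    lexicon = {"the", "computer", "runs", "on"}
    func_words = {"the", "on"}
    sent = ["the", "computer", "runs", "fast", "on", "xyz"]
    for s in (2, 3, 4):
        print(s, map_sentence(sent, pairs, lexicon, func_words, s, random.Random(0)))
```

On this toy input, strategy 2 gives `['the', 'PC', 'runs', 'quickly', 'on', '<unk>']`, strategy 3 turns the unpaired content word "runs" into <unk>, and strategy 4 keeps the in-lexicon word "computer" unchanged, which illustrates why strategies 2 and 4 can behave differently even when both keep function words.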

This Week


  • Split the Fujitsu data into training and test sets, and evaluate with hit@1.
  • Evaluate the models trained with strategies 2, 3 and 4.
  • If Bingdong provides the POS corpus, evaluate a model that maps only nouns.
  • Apply attention to both the input and the hidden layer, and try initializing the word-embedding matrix with word vectors.
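
A minimal sketch of the planned split and hit@1 evaluation, assuming each example has a single gold answer and the model returns a ranked list of candidates per example. The function names (`split_data`, `hit_at_k`), the default 10% test ratio and the fixed seed are hypothetical choices, not the actual Fujitsu setup. hit@1 is the `k=1` case of hit@k: the fraction of test examples whose top-ranked prediction equals the gold answer.

```python
import random


def split_data(examples, test_ratio=0.1, seed=0):
    """Shuffle a copy of the examples with a fixed seed and split off a test set (hypothetical ratio)."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    # Returns (train, test).
    return items[n_test:], items[:n_test]


def hit_at_k(ranked_predictions, gold, k=1):
    """Fraction of examples whose gold answer appears among the top-k ranked predictions."""
    if len(ranked_predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for preds, g in zip(ranked_predictions, gold) if g in preds[:k])
    return hits / len(gold)
```

For example, `hit_at_k([["a", "b"], ["c", "a"]], ["a", "a"])` is 0.5 with the default `k=1` and 1.0 with `k=2`. Fixing the seed keeps the same test set across the models from strategies 2, 3 and 4, so their hit@1 scores stay comparable.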