Dongxu Zhang 2016-03-07

Last Week

Used similar word pairs with a sampling method to create more training data.
Compared different strategies (attention on the input).

 1. Use source sentence pairs to create more data, then sample co-occurring words using the similar-pair map. Other words in the core lexicon, including zhuci (particles), tanci (interjections) and jieci (prepositions), map to themselves, and words outside the lexicon map to <unk>. (Produces a lot of <unk> tokens when decoding.)
 2. Sample co-occurring words using the similar pairs. Other words in the core lexicon, including zhuci, tanci and jieci, map to themselves, and words outside the lexicon map to <unk>. (Can align to some extent.)
 3. Sample all words except zhuci, tanci and jieci. (Worse than 2 in alignment performance.)
 4. Sample words outside the lexicon, and map words in the lexicon to themselves. (Competitive with 2.)
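
As a rough illustration of strategy 2, the mapping step might look like the sketch below. This is one reading of the description, not the actual implementation: the function name map_sentence, the dictionary similar_pairs (word -> list of similar words) and the set core_lexicon are all hypothetical names introduced for the example.

```python
import random

UNK = "<unk>"

def map_sentence(words, similar_pairs, core_lexicon, rng=None):
    """Map each word of a sentence following (one reading of) strategy 2.

    - a word with entries in similar_pairs is replaced by a sampled similar word
    - any other word in core_lexicon maps to itself
    - a word outside the lexicon maps to <unk>
    All argument names are assumptions made for this sketch.
    """
    rng = rng or random.Random()
    mapped = []
    for word in words:
        candidates = similar_pairs.get(word)
        if candidates:
            # Sample one of the similar words for this position.
            mapped.append(rng.choice(candidates))
        elif word in core_lexicon:
            # Function words and other core-lexicon words stay unchanged.
            mapped.append(word)
        else:
            mapped.append(UNK)
    return mapped
```

Calling map_sentence repeatedly on the same source sentence yields different targets whenever a word has more than one similar word, which is how sampling multiplies the training data.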

Asked Bingdong for a POS corpus.

This Week

Split the Fujitsu data into training and test sets, and evaluate with hit@1.
Evaluate models 2, 3 and 4.
Evaluate a model that maps only nouns, if Bingdong provides the POS corpus.
Apply attention to both the input and the hidden layer, and try initializing the word embedding matrix with word vectors.
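
For reference, hit@1 can be computed as in the minimal sketch below. It assumes each test item has a single gold answer and that the model returns a ranked candidate list per item. The function name hit_at_1 and the data layout are assumptions for illustration, not the evaluation script actually used.

```python
def hit_at_1(ranked_predictions, gold):
    """Fraction of test items whose top-ranked prediction equals the gold answer.

    ranked_predictions: list of candidate lists, best candidate first (assumed layout)
    gold: list of gold answers, one per test item (assumed layout)
    """
    if len(ranked_predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    # An item with an empty candidate list counts as a miss.
    hits = sum(
        1 for preds, answer in zip(ranked_predictions, gold)
        if preds and preds[0] == answer
    )
    return hits / len(gold)
```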
