Difference between revisions of "NLP Status Report 2017-5-22"

From cslt Wiki
 
(7 intermediate revisions by 3 users not shown)
Line 2: Line 2:
 
!Date !! People !! Last Week !! This Week
 
!Date !! People !! Last Week !! This Week
 
|-
 
|-
| rowspan="6"|2017/4/5
+
| rowspan="6"|2017/5/22
 
|Jiyuan Zhang ||
 
|Jiyuan Zhang ||
 
||  
 
||  
 
|-
 
|-
 
|Aodong LI ||
 
|Aodong LI ||
 
+
* bleu of baseline = 43.87
 +
* 2nd translator uses as training data the concat(Chinese, machine translated English):
 +
  hidden_size, emb_size, lr = 500, 310, 0.001 bleu = 43.53 (best)
 +
  hidden_size, emb_size, lr = 700, 510, 0.001 bleu = 45.21 (best) but most results are under 43.1
 +
  hidden_size, emb_size, lr = 700, 510, 0.0005 bleu = 42.19 (best)
 +
* double-decoder model with joint loss (final loss = 1st decoder's loss + 2nd decoder's loss):
 +
  bleu = 40.11 (best)
 +
  The 1st decoder's output is generally better than 2nd decoder's output.
 +
* The training process of double-decoder model '''without''' joint loss is problematic.
 
||
 
||
 +
* Replace the forced teaching mechanism in training process with beam search mechanism.
 
|-
 
|-
 
|Shiyue Zhang ||  
 
|Shiyue Zhang ||  
Line 19: Line 28:
 
|-
 
|-
 
|Shipan Ren ||
 
|Shipan Ren ||
* learn the implement of seq2seq model
+
* learned the implement of seq2seq model
 
* read tf_translate code
 
* read tf_translate code
 
||
 
||
Line 26: Line 35:
 
|-
 
|-
 
      
 
      
 
||
 
  
 
|}
 
|}

Latest revision as of 09:03, 31 May 2017 (Wed)

Date People Last Week This Week
2017/5/22 Jiyuan Zhang
Aodong LI
  • BLEU of the baseline = 43.87
  • The 2nd translator is trained on concat(Chinese, machine-translated English):
 hidden_size, emb_size, lr = 500, 310, 0.001: BLEU = 43.53 (best)
 hidden_size, emb_size, lr = 700, 510, 0.001: BLEU = 45.21 (best), but most results are under 43.1
 hidden_size, emb_size, lr = 700, 510, 0.0005: BLEU = 42.19 (best)
  • Double-decoder model with joint loss (final loss = 1st decoder's loss + 2nd decoder's loss):
 BLEU = 40.11 (best)
 The 1st decoder's output is generally better than the 2nd decoder's output.
  • Training the double-decoder model without joint loss is problematic.
  • Replace the teacher forcing mechanism in training with a beam search mechanism.
Shiyue Zhang
  • tried not training the embeddings and using external word vectors instead
  • most of my attempts gave bad results; only the 3-layer RNN model without dropout got 25.54 BLEU, which is about 2 points worse than the original baseline
  • trained the original baseline on new data (the new data fixes the reversed-sentence problem), got BLEU = 27.88; Moses BLEU = 32.47
  • try more models to get results similar to the original baseline on the new data
  • m-nmt model on new data
Shipan Ren
  • learned the implementation of the seq2seq model
  • read the tf_translate code
  • understand the meaning of the main code
  • start writing documents
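
Note on the joint loss in Aodong LI's entry: the report only states that the final loss is the 1st decoder's loss plus the 2nd decoder's loss. A minimal sketch of that sum is given here, assuming each decoder's loss is the mean per-token cross-entropy against its own reference (the report does not say how each loss is computed). The function names and the toy probabilities are hypothetical and are not taken from the actual model code.

```python
import math


def cross_entropy(probs, target_ids):
    """Mean negative log-likelihood of the reference tokens.

    probs: one probability distribution (list of floats) per decoding step.
    target_ids: the reference token id for each step.
    Assumption: per-token mean cross-entropy; the report does not specify this.
    """
    nll = [-math.log(step[t]) for step, t in zip(probs, target_ids)]
    return sum(nll) / len(nll)


def joint_loss(probs_dec1, targets_dec1, probs_dec2, targets_dec2):
    """Unweighted sum of the two decoders' losses, as stated in the report."""
    loss1 = cross_entropy(probs_dec1, targets_dec1)
    loss2 = cross_entropy(probs_dec2, targets_dec2)
    return loss1 + loss2, loss1, loss2


# Toy distributions over a 3-word vocabulary (made-up numbers).
dec1 = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
dec2 = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]
total, l1, l2 = joint_loss(dec1, [0, 1], dec2, [0, 1])
print(f"loss1={l1:.4f} loss2={l2:.4f} joint={total:.4f}")
```

Because the two terms are added without weights, gradients from both decoders reach any shared parameters (for example a shared encoder, if the model has one) with equal scale; a weighted sum would be a different design than the one described in the report.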