“NLP Status Report 2017-5-31”: Difference between revisions

From cslt Wiki

Revision as of 04:42, 31 May 2017 (Wed)

Date People Last Week This Week
2017/5/31 Jiyuan Zhang
Aodong LI
  Last week:
  • Coded a double-attention model with final_attn = alpha * attn_ch + beta * attn_en
  • Baseline BLEU = 43.87
  • Experiments with randomly initialized embeddings:
      alpha | beta | BLEU
      1     | 1    | 43.50
      4/3   | 2/3  | 43.58 (w/o retraining)
      2/3   | 4/3  | 42.22 (w/o retraining)
      2/3   | 4/3  | 42.36 (w/ retraining)
  • Experiments with constant-initialized embeddings:
      alpha | beta | BLEU
      1     | 1    | 45.41
      4/3   | 2/3  | 45.79
      2/3   | 4/3  | 45.32
  • This model is similar to multi-source neural translation but uses fewer resources
  This week:
  • Explore different attention merge strategies
  • Explore a hierarchical model
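A minimal sketch of the merge rule final_attn = alpha * attn_ch + beta * attn_en, assuming attn_ch and attn_en are the attention context vectors computed separately over the Chinese and English inputs. The report does not state the scoring function; the dot-product `attend`, the helper names, and the toy vectors below are hypothetical. Every (alpha, beta) pair in the tables sums to 2, so these settings only shift weight between the two sources.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Dot-product attention (an assumption): returns the context vector,
    i.e. the attention-weighted sum of the encoder states."""
    weights = softmax([sum(q * s for q, s in zip(query, st)) for st in states])
    dim = len(states[0])
    return [sum(w * st[d] for w, st in zip(weights, states)) for d in range(dim)]


def merge_attention(attn_ch, attn_en, alpha=1.0, beta=1.0):
    """final_attn = alpha * attn_ch + beta * attn_en, element-wise."""
    if len(attn_ch) != len(attn_en):
        raise ValueError("attention vectors must have the same dimension")
    return [alpha * c + beta * e for c, e in zip(attn_ch, attn_en)]


if __name__ == "__main__":
    # Hypothetical toy decoder query and encoder states (2-dimensional).
    query = [1.0, 0.0]
    ch_states = [[1.0, 0.0], [0.0, 1.0]]              # Chinese source, 2 tokens
    en_states = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]  # English source, 3 tokens

    attn_ch = attend(query, ch_states)
    attn_en = attend(query, en_states)
    # The three (alpha, beta) settings reported in the tables.
    for alpha, beta in [(1.0, 1.0), (4 / 3, 2 / 3), (2 / 3, 4 / 3)]:
        final_attn = merge_attention(attn_ch, attn_en, alpha, beta)
        print(alpha, beta, [round(x, 4) for x in final_attn])
```

The two sources have different lengths (2 and 3 tokens), but their context vectors share the hidden dimension, which is what makes an element-wise weighted sum well defined under this assumption.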
Shiyue Zhang
  • Found a dropout bug, fixed it, and reran the baselines: baseline 35.21, baseline (outproj=emb) 35.24
  • Tried several embed-set models; all failed
  • Embedded additional words into the model's embedding space (trained on the training data, not the big data), then used them directly in baseline (outproj=emb):
      vocab size | 30000       | 50000 | 70000 | 90000
      BLEU       | 35.24       | 34.52 | 33.73 | 33.16
                 | 4564 (6666) | 4535  | 4469  | 4426
  • m-nmt is running
  • Train word2vec on the big data and compare it with word2vec trained on the training data
  • Test the m-nmt model; increase the vocab size and test again
  • Review related work on zh-uy/uy-zh translation and start writing the paper
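A minimal sketch of why newly embedded words can be used directly in baseline (outproj=emb), assuming `outproj=emb` means the output projection is tied to the embedding matrix, so the logit of word i is the dot product of its embedding row with the decoder state. Under that assumption, appending one row per new word adds it to the output softmax without retraining. How the new vectors are mapped into the model's embedding space is not shown; `extend_vocab`, `predict`, and the toy vocabulary and vectors are hypothetical.

```python
import math


def tied_logits(hidden, embedding):
    """Tied output projection: logit of word i is dot(embedding[i], hidden)."""
    return [sum(e * h for e, h in zip(row, hidden)) for row in embedding]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def extend_vocab(vocab, embedding, new_words, new_vectors):
    """Return copies of vocab/embedding with the new words appended.

    new_vectors are assumed to already live in the model's embedding space."""
    dim = len(embedding[0])
    for vec in new_vectors:
        if len(vec) != dim:
            raise ValueError("new vector has the wrong dimension")
    return vocab + list(new_words), embedding + [list(v) for v in new_vectors]


def predict(hidden, vocab, embedding):
    """Greedy choice of the highest-probability output word."""
    probs = softmax(tied_logits(hidden, embedding))
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]


if __name__ == "__main__":
    vocab = ["<unk>", "a", "b"]
    embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    hidden = [0.2, 0.9]  # hypothetical decoder state

    print(predict(hidden, vocab, embedding))  # "b" wins among the original 3 words
    vocab2, embedding2 = extend_vocab(vocab, embedding, ["c"], [[0.5, 2.0]])
    print(predict(hidden, vocab2, embedding2))  # the newly added "c" now wins
```

With tied weights, every added word competes for probability mass in the same softmax, so the probability of existing words such as "b" can only go down as the vocabulary grows.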
Shipan Ren