NLP Status Report 2017-3-13

 
{| class="wikitable"
!Date !! People !! Last Week !! This Week
|-
| rowspan="6"|2017/1/3
|Yang Feng ||
* tested and analyzed the results on the CS-EN data set (30.4 on the held-out training set and 7.3 on the dev set);
* added masks to the baseline (44.4 on CN-EN);
* added encoder masks and memory masks to the alpha-gamma method and fixed the bugs; got an improvement of 0.5 over the masked baseline [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b8/Nmt_mn_report_continue.pdf report];
* rewrote the softmax_cross_entropy function myself to avoid computing the softmax twice (still training); a sketch of the idea follows the table.
||
* analyze and improve the alpha-gamma method.
|-
|Jiyuan Zhang ||
* finished reproducing the planning neural network;
* chose the best attention_memory model for Huilan and trained it on the big data set (about 370k pairs) [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b9/Model_with_different_dataset.pdf result].
||
* keyword expansion model;
* collect more poems from the Internet;
* recruiting.
|-
|Andi Zhang ||
* ran the baseline without masks and found that the model with source masks gets a slightly better BLEU score (see the masking sketch after the table);
* tried a way to deal with OOV words, but the model cannot predict the '_EOS' symbol.
||
* try to fix the '_EOS' problem.
|-
|Shiyue Zhang ||
* added the trained memory-attention model to the neural model (43.0) and got a 2+ BLEU gain (45.19), but this needs more validation and improvement;
* ran the baseline model on the CS-EN data and found it was good on the training set but poor on the test set;
* ran the baseline model on the EN-FR data and found an 'inf' problem;
* fixed the 'inf' problem by debugging the code of the mask-added baseline model;
* running on the CS-EN and EN-FR data again.
||
* go on with the baseline on big data: get results on the CS-EN and EN-FR data, and train on the ZH-EN data from [http://www.statmt.org/wmt17/translation-task.html#download WMT17];
* go on refining the memory-attention model: retrain to find out whether the 2+ gain is just by chance, and try more memory-attention structures (relu, a(t-1), y(t-1), ...).
|-
|Peilun Xiao ||
||
|}
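Several of the items above (the mask-added baselines, Andi Zhang's source masks in the attention computation) refer to masking out padded source positions. The report itself does not include that code, so the following is only a minimal NumPy sketch of the usual idea, with illustrative names: padded positions receive a large negative score before the softmax, so they end up with (near-)zero attention weight.

<pre>
# Minimal sketch of source-side masking in attention (illustrative names,
# not the group's actual code). Padded source positions get a large negative
# score before the softmax, so their attention weight becomes ~0.
import numpy as np

def masked_attention(scores, src_mask):
    """scores: (batch, src_len) unnormalized attention scores.
    src_mask: (batch, src_len), 1.0 for real tokens, 0.0 for padding."""
    scores = np.where(src_mask > 0, scores, -1e9)           # hide padded positions
    scores = scores - scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores) * src_mask                     # also zero the pads explicitly
    return weights / weights.sum(axis=-1, keepdims=True)

# toy example: one sentence of length 4 whose last position is padding
scores = np.array([[2.0, 1.0, 0.5, 3.0]])
mask = np.array([[1.0, 1.0, 1.0, 0.0]])
print(masked_attention(scores, mask))   # the last weight is 0
</pre>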
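Yang Feng's item about rewriting softmax_cross_entropy to avoid running the softmax twice matches the standard fused formulation, where the cross-entropy is computed directly from the logits with a log-sum-exp; the same trick also avoids the overflow/log(0) issues that typically surface as 'inf' losses. The rewritten function itself is not shown in the report, so this is only a sketch under that assumption.

<pre>
# Fused softmax + cross-entropy computed straight from the logits
# (a sketch of the standard trick; the report's rewritten function is not shown).
import numpy as np

def softmax_cross_entropy(logits, labels):
    """logits: (batch, vocab) unnormalized scores; labels: (batch,) target ids.
    Returns per-example cross-entropy from the logits in one pass,
    without computing the softmax separately from the loss."""
    m = logits.max(axis=-1, keepdims=True)                           # log-sum-exp shift
    log_z = m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1))  # log partition function
    true_logit = logits[np.arange(len(labels)), labels]
    return log_z - true_logit                                        # = -log softmax(logits)[label]

logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 5.0]])
labels = np.array([0, 2])
print(softmax_cross_entropy(logits, labels))
</pre>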
