Difference between revisions of "2017-3-6"
From cslt Wiki
Line 4:
| rowspan="6"|2017/1/3
|Yang Feng ||
+ * Ran experiments on the CS-EN data set (200k pairs) with initialization identical to the paper's. On the sampled 2k training sentences, the BLEU is 19.5 (not yet converged); the BLEU on the test set is expected to be about 26.
+ * Added the alpha and gamma scores and ran multi-task training. Without multi-task training the loss did not decline on the training data, but with multi-task training it did.
+ * Prepared for Huilan's inspection.
||
+ * Analyze why the loss did not decline with alpha and gamma;
+ * test multi-task training;
+ * improve the baseline for CS-EN.
|-
|Jiyuan Zhang ||
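The multi-task setup mentioned above (main translation loss plus auxiliary alpha and gamma scores) can be sketched as a weighted sum of losses. The weights and function below are illustrative assumptions; the report does not give the actual formulation.

```python
# Hypothetical sketch of a multi-task objective: the main translation
# loss is combined with auxiliary "alpha" and "gamma" score losses via
# a weighted sum. Weight values here are assumptions for illustration.

def multi_task_loss(main_loss, alpha_loss, gamma_loss,
                    w_alpha=0.5, w_gamma=0.5):
    """Weighted sum of the main loss and two auxiliary losses."""
    return main_loss + w_alpha * alpha_loss + w_gamma * gamma_loss

# Example: with main loss 2.0 and auxiliary losses 0.8 and 0.4,
# the combined objective is 2.0 + 0.5*0.8 + 0.5*0.4 = 2.6.
loss = multi_task_loss(2.0, 0.8, 0.4)
```

Jointly minimizing this sum lets gradients from the auxiliary score terms regularize the shared parameters, which is one common explanation for a loss that declines only under multi-task training.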
Revision as of 10:56, 6 March 2017 (Mon)
Date | People | Last Week | This Week
---|---|---|---
2017/1/3 | Yang Feng | ran experiments on the CS-EN data set (200k pairs) with initialization identical to the paper's: BLEU 19.5 on the sampled 2k training sentences (not yet converged), test-set BLEU expected ~26; added the alpha and gamma scores with multi-task training (the loss declined only with multi-task training); prepared for Huilan's inspection | analyze why the loss did not decline with alpha and gamma; test multi-task training; improve the CS-EN baseline
 | Jiyuan Zhang | |
 | Andi Zhang | |
 | Shiyue Zhang | |
 | Peilun Xiao | |
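The BLEU figures quoted in the report are corpus-level scores. A minimal stdlib-only sketch of corpus BLEU (modified n-gram precision with a brevity penalty) is below; an actual evaluation would use a standard tool such as multi-bleu.perl or sacreBLEU, so this is only an illustration of the metric.

```python
# Minimal corpus-level BLEU sketch: geometric mean of modified
# n-gram precisions (n = 1..4) times a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(references, hypotheses, max_n=4):
    """references/hypotheses: parallel lists of token lists (one reference per sentence)."""
    precisions = []
    for n in range(1, max_n + 1):
        matched = total = 0
        for ref, hyp in zip(references, hypotheses):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            # Clipped counts: each hypothesis n-gram matches at most
            # as many times as it appears in the reference.
            matched += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total += max(len(hyp) - n + 1, 0)
        precisions.append(matched / total if total else 0.0)
    if min(precisions) == 0:
        return 0.0
    ref_len = sum(len(r) for r in references)
    hyp_len = sum(len(h) for h in hypotheses)
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect hypothesis scores 1.0; the 19.5 quoted above corresponds to a score of 0.195 on this scale (tools conventionally report it multiplied by 100).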