NLP Schedule

Members

Current Members

Yang Feng (冯洋)
Jiyuan Zhang (张记袁)
Aodong Li (李傲冬)
Andi Zhang (张安迪)
Shiyue Zhang (张诗悦)
Li Gu (古丽)
Peilun Xiao (肖培伦)
Shipan Ren (任师攀)

Former Members

Chao Xing (邢超) : FreeNeb
Rong Liu (刘荣) : Youku
Xiaoxi Wang (王晓曦) : Turing Robot
Xi Ma (马习) : graduate student at Tsinghua University
Tianyi Luo (骆天一) : PhD candidate at the University of California, Santa Cruz
Qixin Wang (王琪鑫) : MA candidate at the University of California
DongXu Zhang (张东旭) : --
Yiqiao Pan (潘一桥) : MA candidate at the University of Sydney
Shiyao Li (李诗瑶) : BUPT
Aiting Liu (刘艾婷) : BUPT

Work Progress

Daily Report

Date	Person	start	leave	hours	status
2017/04/02	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/02	Peilun Xiao
2017/04/03	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/03	Peilun Xiao
2017/04/04	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/04	Peilun Xiao
2017/04/05	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/05	Peilun Xiao
2017/04/06	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/06	Peilun Xiao
2017/04/07	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/07	Peilun Xiao
2017/04/08	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/08	Peilun Xiao
2017/04/09	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/09	Peilun Xiao
2017/04/10	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/10	Peilun Xiao
2017/04/11	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/11	Peilun Xiao
2017/04/12	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/12	Peilun Xiao
2017/04/13	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/13	Peilun Xiao
2017/04/14	Andy Zhang	9:30	18:30	8	preparing EMNLP
2017/04/14	Peilun Xiao
2017/04/15	Andy Zhang	9:00	15:00	6	preparing EMNLP
2017/04/15	Peilun Xiao
2017/04/18	Aodong Li	11:00	20:00	8	Pick up new task in news generation and do literature review
2017/04/19	Aodong Li	11:00	20:00	8	Literature review
2017/04/20	Aodong Li	12:00	20:00	8	Literature review
2017/04/21	Aodong Li	12:00	20:00	8	Literature review
2017/04/24	Aodong Li	11:00	20:00	8	Adjust literature review focus
2017/04/25	Aodong Li	11:00	20:00	8	Literature review
2017/04/26	Aodong Li	11:00	20:00	8	Literature review
2017/04/27	Aodong Li	11:00	20:00	8	Try to reproduce the SC-LSTM work
2017/04/28	Aodong Li	11:00	20:00	8	Transfer to a new task in machine translation and do literature review
2017/04/30	Aodong Li	11:00	20:00	8	Literature review
2017/05/01	Aodong Li	11:00	20:00	8	Literature review
2017/05/02	Aodong Li	11:00	20:00	8	Literature review and code review
2017/05/06	Aodong Li	14:20	17:20	3	Code review
2017/05/07	Aodong Li	13:30	22:00	8	Code review and experiment started, but version discrepancy encountered
2017/05/08	Aodong Li	11:30	21:00	8	Code review and version discrepancy solved
2017/05/09	Aodong Li	13:00	22:00	9	Code review and experiment. Experiment details: small data; the 1st and 2nd translators use the same training data; the 2nd translator uses randomly initialized embeddings. Results (BLEU): baseline 43.87; best result of our model 42.56.
2017/05/10	Shipan Ren	9:00	20:00	11	Entry procedures; machine translation paper reading
2017/05/10	Aodong Li	13:30	22:00	8	Experiment setting: small data; the 1st and 2nd translators use different training data, counting 22000 and 22017 respectively; the 2nd translator uses randomly initialized embeddings. Results (BLEU): baseline 36.67 (36.67 is the model at 4750 updates; to prevent overfitting, we use the model at 3000 updates, whose BLEU is 34.96, to generate the 2nd translator's training data); best result of our model 29.81. This may suggest that using either the same training data as the 1st translator or a different set does not influence the 2nd translator's performance; if anything, using the same data may be better, at least judging from these results. However, I have to take into account that the training data is smaller than for yesterday's model. Code the 2nd translator with constant embeddings.
2017/05/11	Shipan Ren	10:00	19:30	9.5	Configure environment; run tf_translate code; read machine translation paper
2017/05/11	Aodong Li	13:00	21:00	8	Experiment setting: small data; the 1st and 2nd translators use the same training data; the 2nd translator uses constant, untrainable embeddings imported from the 1st translator's decoder. Results (BLEU): baseline 43.87; best result of our model 43.48. Experiments show that this kind of serial (cascade) model inevitably impairs the final performance, because information is lost as it flows through the network from end to end. The decoder's smaller vocabulary size compared to the encoder's (9000+ -> 6000+) demonstrates this. The intention of this experiment was to find a mapping that solves meaning shift using the 2nd translator, but whether the mapping is learned is obscured by the smaller-vocabulary phenomenon. Literature review on hierarchical machine translation.
2017/05/12	Aodong Li	13:00	21:00	8	Code the double-decoding model and read a multilingual MT paper
2017/05/13	Shipan Ren	10:00	19:00	9	Read machine translation paper; learn the LSTM model and the seq2seq model
2017/05/14	Aodong Li	10:00	20:00	9	Code the double-decoding model and run experiments. Experiment details: small data; the 2nd translator uses concat(Chinese, machine-translated English) as training data; the 2nd translator uses randomly initialized embeddings. Results (BLEU): baseline 43.87; best result of our model 43.53. Next: the 2nd translator uses trained constant embeddings.
2017/05/15	Shipan Ren	9:30	19:00	9.5	Understand the difference between the LSTM and GRU models; read the implementation code of the seq2seq model
2017/05/17	Shipan Ren	9:30	19:30	10	Read neural machine translation paper; read tf_translate code
2017/05/17	Aodong Li	13:30	24:00	9	Code and debug the double-decoder model; alter the size of the 2017/05/14 model and will try it after NIPS
2017/05/18	Shipan Ren	10:00	19:00	9	Read neural machine translation paper; read tf_translate code
2017/05/18	Aodong Li	12:30	21:00	8	Train the double-decoder model on the small data set but encounter decoding bugs
2017/05/19	Aodong Li	12:30	20:30	8	Debug the double-decoder model. The model performs well on the development set but badly on the test data; I want to figure out the reason.
2017/05/21	Aodong Li	10:30	18:30	8	Experiment details: hidden_size = 700 (500 previously), emb_size = 510 (310 previously); small data; the 2nd translator uses concat(Chinese, machine-translated English) as training data; the 2nd translator uses randomly initialized embeddings. Results (BLEU): baseline 43.87; best result of our model 45.21. However, only one checkpoint outperforms the baseline; the other results are commonly under 43.1. Debug the double-decoder model.
2017/05/22	Aodong Li	14:00	22:00	8	The double-decoder model without joint loss generalizes very badly; I'm trying the double-decoder model with joint loss.
2017/05/23	Aodong Li	13:00	21:30	8	Experiment 1 details: hidden_size = 700, emb_size = 510, learning_rate = 0.0005 (0.001 previously); small data; the 2nd translator uses concat(Chinese, machine-translated English) as training data; the 2nd translator uses randomly initialized embeddings. Results (BLEU): baseline 43.87; best result of our model 42.19. Overfitting? Overall, the 2nd translator performs worse than the baseline. Experiment 2 details: hidden_size = 500, emb_size = 310, learning_rate = 0.001; small data; double-decoder model with joint loss, i.e. final loss = 1st decoder's loss + 2nd decoder's loss. Results (BLEU): baseline 43.87; best result of our model 39.04. The 1st decoder's output is generally better than the 2nd decoder's output. The reason may be that the 2nd decoder only learns from the 1st decoder's hidden states, because their states are almost the same. DISCOVERY: the double-decoder model without joint loss generalizes very badly because the gap between teacher forcing (training) and beam search (decoding) propagates and amplifies errors toward the output end, which ruins the model at decoding time. Next: try training the double-decoder model without joint loss but with beam search on the 1st decoder.
2017/05/24	Aodong Li	13:00	21:30	8	Code the double-attention one-decoder model; code the double-decoder model
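Several entries (2017/05/14, 2017/05/21, 2017/05/23) train the 2nd translator on concat(Chinese, machine-translated English). The following is a minimal sketch of how such training pairs could be assembled, assuming whitespace-tokenized, sentence-aligned corpora and a hypothetical `<sep>` separator token; the log does not say how the two sequences were actually joined, and `build_second_translator_input` and `build_training_pairs` are illustrative names, not the project's code.

```python
from typing import List, Sequence, Tuple

# Hypothetical separator token; the log does not specify how the sequences are joined.
SEP = "<sep>"


def build_second_translator_input(
    source: List[str], machine_translation: List[str], sep: str = SEP
) -> List[str]:
    """concat(Chinese source, machine-translated English) as one token sequence."""
    return source + [sep] + machine_translation


def build_training_pairs(
    sources: Sequence[List[str]],
    machine_translations: Sequence[List[str]],
    references: Sequence[List[str]],
) -> List[Tuple[List[str], List[str]]]:
    """Pair each concatenated input with its reference English translation."""
    if not (len(sources) == len(machine_translations) == len(references)):
        raise ValueError("all three corpora must be sentence-aligned")
    return [
        (build_second_translator_input(src, mt), ref)
        for src, mt, ref in zip(sources, machine_translations, references)
    ]


if __name__ == "__main__":
    pairs = build_training_pairs(
        sources=[["我", "爱", "你"]],
        machine_translations=[["i", "like", "you"]],  # 1st translator's output
        references=[["i", "love", "you"]],            # gold English target
    )
    print(pairs[0])
```

Keeping the source tokens in the input lets the 2nd translator look back at the Chinese sentence instead of relying only on the 1st translator's output, which is the information-loss concern raised in the 2017/05/11 entry.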
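The 2017/05/23 entry defines the joint loss of the double-decoder model as the 1st decoder's loss plus the 2nd decoder's loss. The sketch below illustrates that unweighted sum, assuming both decoders are scored by token-level cross-entropy against the same reference translation; the per-step probability lists stand in for real decoder softmax outputs, and the function names are hypothetical.

```python
import math
from typing import Sequence


def sequence_cross_entropy(
    step_probs: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Mean negative log-likelihood of the target token ids.

    step_probs[t] is the decoder's probability distribution over the vocabulary at step t.
    """
    if len(step_probs) != len(targets) or not targets:
        raise ValueError("need one distribution per target token (and at least one token)")
    nll = [-math.log(dist[tok]) for dist, tok in zip(step_probs, targets)]
    return sum(nll) / len(nll)


def joint_loss(first_probs, second_probs, targets) -> float:
    """Final loss = 1st decoder's loss + 2nd decoder's loss (unweighted sum)."""
    return sequence_cross_entropy(first_probs, targets) + sequence_cross_entropy(
        second_probs, targets
    )


if __name__ == "__main__":
    targets = [0, 1]
    first = [[0.5, 0.5], [0.5, 0.5]]   # uncertain 1st decoder
    second = [[0.9, 0.1], [0.2, 0.8]]  # more confident 2nd decoder
    print(joint_loss(first, second, targets))
```

Without the first term, only the 2nd decoder's output is supervised; the sum adds a direct training signal for the 1st decoder as well.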
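The DISCOVERY note on 2017/05/23 attributes the poor generalization to the gap between teacher forcing during training and search during decoding. The toy sketch below makes the effect concrete under two stated simplifications: a hypothetical lookup-table "decoder" replaces the neural model, and greedy decoding (beam size 1) stands in for beam search. Under teacher forcing each step sees the gold previous token, so one wrong prediction stays isolated; in free-running decoding the same wrong token becomes the next input and the error propagates.

```python
from typing import Callable, Dict, List

# A "decoder" here is just a function from the previous token to the next token.
Step = Callable[[str], str]


def make_toy_step(table: Dict[str, str]) -> Step:
    """Build a hypothetical deterministic decoder from a lookup table."""
    return lambda prev: table.get(prev, "<unk>")


def teacher_forced_outputs(step: Step, reference: List[str], bos: str = "<s>") -> List[str]:
    """Training-time view: every step is fed the gold previous token."""
    inputs = [bos] + reference[:-1]
    return [step(prev) for prev in inputs]


def free_running_outputs(step: Step, length: int, bos: str = "<s>") -> List[str]:
    """Decoding-time view (greedy): every step is fed the model's own last prediction."""
    outputs: List[str] = []
    prev = bos
    for _ in range(length):
        prev = step(prev)
        outputs.append(prev)
    return outputs


def token_accuracy(hypothesis: List[str], reference: List[str]) -> float:
    """Fraction of positions where the hypothesis matches the reference."""
    return sum(h == r for h, r in zip(hypothesis, reference)) / len(reference)


if __name__ == "__main__":
    reference = ["the", "cat", "sat"]
    # The toy model is wrong only at the first step ("a" instead of "the").
    step = make_toy_step({"<s>": "a", "the": "cat", "cat": "sat", "a": "dog", "dog": "ran"})
    forced = teacher_forced_outputs(step, reference)
    free = free_running_outputs(step, len(reference))
    print(forced, token_accuracy(forced, reference))
    print(free, token_accuracy(free, reference))
```

With this table, teacher forcing gets 2 of 3 tokens right, while free-running decoding gets none, because the wrong first token sends every later step down a different path.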