Date |
Person |
start |
leave |
hours |
status
|
2017/04/02
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/03
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/04
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/05
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/06
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/07
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/08
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/09
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/10
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/11
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/12
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/13
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/14
|
Andy Zhang |
9:30 |
18:30 |
8 |
|
Peilun Xiao |
|
|
|
|
2017/04/15
|
Andy Zhang |
9:00 |
15:00 |
6 |
|
Peilun Xiao |
|
|
|
|
2017/04/18
|
Aodong Li |
11:00 |
20:00 |
8 |
- Pick up new task in news generation and do literature review
|
2017/04/19
|
Aodong Li |
11:00 |
20:00 |
8 |
|
2017/04/20
|
Aodong Li |
12:00 |
20:00 |
8 |
|
2017/04/21
|
Aodong Li |
12:00 |
20:00 |
8 |
|
2017/04/24
|
Aodong Li |
11:00 |
20:00 |
8 |
- Adjust literature review focus
|
2017/04/25
|
Aodong Li |
11:00 |
20:00 |
8 |
|
2017/04/26
|
Aodong Li |
11:00 |
20:00 |
8 |
|
2017/04/27
|
Aodong Li |
11:00 |
20:00 |
8 |
- Try to reproduce sc-lstm work
|
2017/04/28
|
Aodong Li |
11:00 |
20:00 |
8 |
- Transfer to new task in machine translation and do literature review
|
2017/04/30
|
Aodong Li |
11:00 |
20:00 |
8 |
|
2017/05/01
|
Aodong Li |
11:00 |
20:00 |
8 |
|
2017/05/02
|
Aodong Li |
11:00 |
20:00 |
8 |
- Literature review and code review
|
2017/05/06
|
Aodong Li |
14:20 |
17:20 |
3 |
|
2017/05/07
|
Aodong Li |
13:30 |
22:00 |
8 |
- Code review and experiment started, but version discrepancy encountered
|
2017/05/08
|
Aodong Li |
11:30 |
21:00 |
8 |
- Code review and version discrepancy solved
|
2017/05/09
|
Aodong Li |
13:00 |
22:00 |
9 |
- Code review and experiment
- details about experiment:
small data,
1st and 2nd translators use the same training data,
2nd translator uses randomly initialized embedding
BASELINE: 43.87
best result of our model: 42.56
|
2017/05/10
|
Shipan Ren |
9:00 |
20:00 |
11 |
- Entry procedures
- Machine Translation paper reading
|
2017/05/10
|
Aodong Li |
13:30 |
22:00 |
8 |
small data,
1st and 2nd translators use different training data, of size 22000 and 22017 respectively
2nd translator uses randomly initialized embedding
BASELINE: 36.67 (36.67 is the model at 4750 updates, but to guard against overfitting we use the
model at 3000 updates, whose BLEU is 34.96, to generate the 2nd translator's training
data)
best result of our model: 29.81
This may suggest that whether the 2nd translator uses the same training data as the 1st translator
or different data makes little difference to its performance; if anything, using the same data may
be better, at least judging from these results. However, the training data here is smaller
than for yesterday's model, which has to be taken into account.
- code 2nd translator with constant embedding
|
2017/05/11
|
Shipan Ren |
10:00 |
19:30 |
9.5 |
- Configure environment
- Run tf_translate code
- Read Machine Translation paper
|
2017/05/11
|
Aodong Li |
13:00 |
21:00 |
8 |
small data,
1st and 2nd translators use the same training data,
2nd translator uses constant untrainable embedding imported from 1st translator's decoder
BASELINE: 43.87
best result of our model: 43.48
Experiments show that this kind of series (cascade) model impairs the final performance
because information is lost as it flows through the network from
end to end. The decoder's vocabulary being smaller than the encoder's illustrates
this (9000+ -> 6000+).
The intention of this experiment is to find a mapping that corrects meaning shift using the 2nd translator,
but whether the mapping is learned or not is obscured by the smaller vocabulary size
phenomenon.
- literature review on hierarchical machine translation
|
2017/05/12
|
Aodong Li |
13:00 |
21:00 |
8 |
- Code double decoding model and read multilingual MT paper
|
2017/05/13
|
Shipan Ren |
10:00 |
19:00 |
9 |
- read machine translation paper
- learn LSTM model and seq2seq model
|
2017/05/14
|
Aodong Li |
10:00 |
20:00 |
9 |
- Code double decoding model and experiment
- details about experiment:
small data,
2nd translator uses as training data the concat(Chinese, machine translated English),
2nd translator uses randomly initialized embedding
BASELINE: 43.87
best result of our model: 43.53
- NEXT: 2nd translator uses trained constant embedding
|
2017/05/15
|
Shipan Ren |
9:30 |
19:00 |
9.5 |
- understand the difference between LSTM model and GRU model
- read the implementation code of seq2seq model
|
2017/05/17
|
Shipan Ren |
9:30 |
19:30 |
10 |
- read neural machine translation paper
- read tf_translate code
|
Aodong Li |
13:30 |
24:00 |
9 |
- code and debug double-decoder model
- alter the size of the 2017/05/14 model; will try it after NIPS
|
2017/05/18
|
Shipan Ren |
10:00 |
19:00 |
9 |
- read neural machine translation paper
- read tf_translate code
|
Aodong Li |
12:30 |
21:00 |
8 |
- train double-decoder model on small data set but encounter decoding bugs
|
2017/05/19
|
Aodong Li |
12:30 |
20:30 |
8 |
- debug double-decoder model
- the model performs well on the development set but badly on test data. I want to figure out the reason.
|
2017/05/21
|
Aodong Li |
10:30 |
18:30 |
8 |
- details about experiment:
hidden_size = 700 (500 in prior)
emb_size = 510 (310 in prior)
small data,
2nd translator uses as training data the concat(Chinese, machine translated English),
2nd translator uses randomly initialized embedding
BASELINE: 43.87
best result of our model: 45.21
But only one checkpoint outperforms the baseline; the other results are generally below 43.1
- debug double-decoder model
|
2017/05/22
|
Aodong Li |
14:00 |
22:00 |
8 |
- double-decoder without joint loss generalizes very badly
- I'm trying the double-decoder model with joint loss
|
2017/05/23
|
Aodong Li |
13:00 |
21:30 |
8 |
- details about experiment 1:
hidden_size = 700
emb_size = 510
learning_rate = 0.0005 (0.001 in prior)
small data,
2nd translator uses as training data the concat(Chinese, machine translated English),
2nd translator uses randomly initialized embedding
BASELINE: 43.87
best result of our model: 42.19
Overfitting? Overall, the 2nd translator performs worse than the baseline
- details about experiment 2:
hidden_size = 500
emb_size = 310
learning_rate = 0.001
small data,
double-decoder model with joint loss, i.e. final loss = 1st decoder's loss + 2nd
decoder's loss
BASELINE: 43.87
best result of our model: 39.04
The 1st decoder's output is generally better than the 2nd decoder's output. The reason may be that
the second decoder only learns from the first decoder's hidden states because their states are
almost the same.
The reason why the double-decoder without joint loss generalizes very badly is that the gap between
teacher forcing (training process) and beam search (decoding process)
propagates and amplifies the error toward the output end, which breaks the model when decoding.
Try to train the double-decoder model without joint loss but with beam search on the 1st decoder.
|
2017/05/24
|
Aodong Li |
13:00 |
21:30 |
8 |
- code double-attention one-decoder model
- code double-decoder model
|
2017/05/24
|
Shipan Ren |
10:00 |
20:00 |
10 |
- read neural machine translation paper
- read tf_translate code
|
2017/05/25
|
Shipan Ren |
9:30 |
18:30 |
9 |
- write document of tf_translate project
- read neural machine translation paper
- read tf_translate code
|
Aodong Li |
13:00 |
22:00 |
9 |
- code and debug double attention model
|
2017/05/28
|
Aodong Li |
15:00 |
22:00 |
7 |
- details about experiment:
hidden_size = 500
emb_size = 310
learning_rate = 0.001
small data,
2nd translator uses as training data both Chinese and machine translated English
Chinese and English use different encoders and different attention
final_attn = attn_1 + attn_2
2nd translator uses randomly initialized embedding
BASELINE: 43.87
when decoding:
final_attn = attn_1 + attn_2, best result of our model: 43.50
final_attn = 2/3attn_1 + 4/3attn_2, best result of our model: 41.22
final_attn = 4/3attn_1 + 2/3attn_2, best result of our model: 43.58
|
2017/05/30
|
Aodong Li |
15:00 |
21:00 |
6 |
- details about experiment 1:
hidden_size = 500
emb_size = 310
learning_rate = 0.001
small data,
2nd translator uses as training data both Chinese and machine translated English
Chinese and English use different encoders and different attention
final_attn = 2/3attn_1 + 4/3attn_2
2nd translator uses randomly initialized embedding
BASELINE: 43.87
best result of our model: 42.36
- details about experiment 2:
final_attn = 2/3attn_1 + 4/3attn_2
2nd translator uses constant initialized embedding
BASELINE: 43.87
best result of our model: 45.32
- details about experiment 3:
final_attn = attn_1 + attn_2
2nd translator uses constant initialized embedding
BASELINE: 43.87
best result of our model: 45.41 and it seems more stable
|
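Note on the formulas used in the 2017/05/23, 2017/05/28 and 2017/05/30 entries: the log states the joint loss as "1st decoder's loss + 2nd decoder's loss" and the combined attention as a weighted sum such as final_attn = 2/3attn_1 + 4/3attn_2. The following is only a minimal plain-Python sketch of those two formulas, not the tf_translate code. The function names, the use of per-step probability lists, and the assumption that both decoders are scored against the same reference tokens are all hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence


def combine_attention(attn_1: Sequence[float], attn_2: Sequence[float],
                      w_1: float = 1.0, w_2: float = 1.0) -> List[float]:
    """Weighted element-wise sum of two attention context vectors.

    With the default weights this is final_attn = attn_1 + attn_2;
    w_1 = 2/3, w_2 = 4/3 gives final_attn = 2/3attn_1 + 4/3attn_2.
    """
    if len(attn_1) != len(attn_2):
        raise ValueError("attention vectors must have the same length")
    return [w_1 * a + w_2 * b for a, b in zip(attn_1, attn_2)]


def sequence_nll(step_probs: Sequence[Sequence[float]],
                 target_ids: Sequence[int]) -> float:
    """Negative log-likelihood of the target tokens.

    step_probs[i] is the (assumed) output distribution at decoding step i.
    """
    return -sum(math.log(probs[t]) for probs, t in zip(step_probs, target_ids))


def joint_loss(probs_dec1: Sequence[Sequence[float]],
               probs_dec2: Sequence[Sequence[float]],
               target_ids: Sequence[int]) -> float:
    """final loss = 1st decoder's loss + 2nd decoder's loss.

    Assumption: both decoders are trained toward the same reference tokens.
    """
    return sequence_nll(probs_dec1, target_ids) + sequence_nll(probs_dec2, target_ids)


# Toy usage with made-up numbers (not experimental data).
attn_zh = [0.2, 0.4]   # hypothetical context vector from the Chinese encoder
attn_en = [0.6, 0.0]   # hypothetical context vector from the English encoder
equal_mix = combine_attention(attn_zh, attn_en)
skewed_mix = combine_attention(attn_zh, attn_en, w_1=2 / 3, w_2=4 / 3)
```

In both weightings reported in the log the two weights sum to 2, the same total as the unweighted attn_1 + attn_2, so only the balance between the two encoders changes.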