<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://index.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://index.cslt.org/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Intern</id>
		<title>cslt Wiki - User contributions [zh-cn]</title>
		<link rel="self" type="application/atom+xml" href="http://index.cslt.org/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Intern"/>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E7%89%B9%E6%AE%8A:%E7%94%A8%E6%88%B7%E8%B4%A1%E7%8C%AE/Intern"/>
		<updated>2026-04-07T04:47:22Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Shallow_fusion.pdf</id>
		<title>File:Shallow fusion.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Shallow_fusion.pdf"/>
				<updated>2017-08-01T07:32:21Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-7-31</id>
		<title>NLP Status Report 2017-7-31</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-7-31"/>
				<updated>2017-07-31T04:49:05Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/7/31&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
*made the poster for ACL&lt;br /&gt;
*attempted to fix the repeated-word problem, but failed&lt;br /&gt;
*did some work on an n-gram model for couplets&lt;br /&gt;
|| &lt;br /&gt;
*generate the streamer according to a couplet&lt;br /&gt;
*complete the task of filling in the blanks of a couplet&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* Got 55,000+ English poems and 260,000+ lines after preprocessing&lt;br /&gt;
* Added phrase separators as the style indicator; every line has at least one separator&lt;br /&gt;
* Training loss didn't decrease very much, only from 440 to 50&lt;br /&gt;
* The translation quality deteriorated when the language model was added&lt;br /&gt;
||&lt;br /&gt;
* Try to use a larger language model to decrease the training loss&lt;br /&gt;
* Try to use character-based MT in English-Chinese translation&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
* trained two baseline models using the WMT2014 en-fr datasets&lt;br /&gt;
  (still under training) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* read some papers (memory-augmented NMT and Memory-augmented Chinese-Uyghur Neural Machine Translation)   &lt;br /&gt;
||&lt;br /&gt;
* read memory-augmented-nmt code&lt;br /&gt;
* read papers about memory augmented NMT &lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
|Jiayu Guo||&lt;br /&gt;
*processed documents.&lt;br /&gt;
*Shiji has been split into 25,000 sentence pairs.&lt;br /&gt;
*Zizhitongjian has been split into 20,000 pairs.&lt;br /&gt;
||&lt;br /&gt;
*adjust the jieba source code&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-7-24</id>
		<title>NLP Status Report 2017-7-24</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-7-24"/>
				<updated>2017-07-26T05:01:23Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/7/24&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
*&lt;br /&gt;
|| &lt;br /&gt;
*make the poster for ACL&lt;br /&gt;
*complete neural model for the couplet&lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* Completed the shallow fusion of news-domain translation with dialog-domain style (a fusion-score sketch follows the examples below).&lt;br /&gt;
* The style was not obvious since the dialog dataset has no specific style indicator.&lt;br /&gt;
* Some examples:&lt;br /&gt;
  全程 预计 00 天 , 团费 大约 0.0万 元 人民币 。&lt;br /&gt;
  w/ style: it is estimated that 00 days of the entire project will be about 00 million yuan .&lt;br /&gt;
  w/o style: the whole world is expected to be about 00 days , with a total of 00,000 yuan rmb .&lt;br /&gt;
&lt;br /&gt;
  在 美国 九一一 恐怖 攻击 周年 左右 , 东南亚 各 地 的 西方 外交 使节 团 纷纷 关闭 , 因为 &lt;br /&gt;
  它们 遭到 与 欧萨玛 . 宾 拉登 的 盖 达 组织 及 其 地方 联盟 有关 的 威胁 。&lt;br /&gt;
  w/ style: on the anniversary of the sept 0 terrorist attack , the western dpp diplomatic &lt;br /&gt;
  envoys in southeast asia were shut down because they were with the threat to al qaeda&lt;br /&gt;
  bin laden and al - qaeda 's relevant alliance .&lt;br /&gt;
  w/o style: on the anniversary of the sept 0 terrorist attack , the western dpp diplomatic envoys&lt;br /&gt;
  in southeast asia were shut off because they were closely connected with osama bin laden 's al &lt;br /&gt;
  qaeda and al - qaeda 's relevant alliances .&lt;br /&gt;
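* A minimal sketch of the fusion score at each decoding step (assuming both models expose log-probabilities over the same vocabulary; lm_weight is an illustrative tuning knob, not the exact value used):&lt;br /&gt;
  import numpy as np&lt;br /&gt;
  def shallow_fusion_score(log_p_tm, log_p_lm, lm_weight=0.1):&lt;br /&gt;
      # add the weighted LM log-prob to the translation-model log-prob;&lt;br /&gt;
      # beam search then ranks candidate tokens by this combined score&lt;br /&gt;
      return np.asarray(log_p_tm) + lm_weight * np.asarray(log_p_lm)&lt;br /&gt;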
||&lt;br /&gt;
* Find the dataset with obvious style indicators.&lt;br /&gt;
* Try to quantify the result to determine if it is effective.&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
* trained two baseline models using the WMT2014 en-fr datasets&lt;br /&gt;
  (still under training) &lt;br /&gt;
  the new version saved more time&lt;br /&gt;
&lt;br /&gt;
* read some papers (memory-augmented NMT and Memory-augmented Chinese-Uyghur Neural Machine Translation)   &lt;br /&gt;
||&lt;br /&gt;
* read memory-augmented-nmt code&lt;br /&gt;
* read papers about memory augmented NMT &lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Weekly_meeting</id>
		<title>Weekly meeting</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Weekly_meeting"/>
				<updated>2017-07-11T02:08:03Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Location: FIT-1-304&lt;br /&gt;
*Time: Monday, 7:00 PM&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Speaker!! Title !! Materials !! On duty&lt;br /&gt;
|-&lt;br /&gt;
| 2012/08/27  ||Dong Wang  || Heterogeneous Convolutive Non-negative Sparse Coding ||[[媒体文件:Heterogeneous_convolutive_non-negative_sparse_coding.pdf|slides]] [http://homepages.inf.ed.ac.uk/v1dwang2/public/pdf/inerspeech2012-hetero.pdf paper] ||&lt;br /&gt;
|-&lt;br /&gt;
|2012/09/03  ||NO Meeting|| || ||&lt;br /&gt;
|-&lt;br /&gt;
|2012/09/10  || NO Meeting|| || ||&lt;br /&gt;
|-&lt;br /&gt;
|2012/09/17  ||WALEED ABDULLA||Auditory Based Feature Vectors for Speech Recognition ||[[媒体文件:AuditoryBasedFeatureVectors.pdf|slides]]||范淼&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2012/09/24  ||刘超|| N-gram FST indexing for Spoken Term Detection || [[媒体文件:120924-N_gram_FST_indexing_for_Spoken_Term_Detection-LC-0.pdf|slides]] ||尹聪&lt;br /&gt;
|-&lt;br /&gt;
|范淼||Micro-blogging, Wikipedia, Folksonomy, What's Next? ||[[媒体文件:120924-Micro-blogging, Wikipedia, Folksonomy, What's Next-FM--01-FM-.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| 2012/10/08 ||NO Meeting|| || ||&lt;br /&gt;
|-&lt;br /&gt;
| 2012/10/15  ||NO Meeting|| || ||&lt;br /&gt;
|-&lt;br /&gt;
|2012/10/22||Wu Xiaojun||speaker recognition in CSLT ||[[媒体文件:VPR_in_CSLT.pdf|slides]]||卡尔&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2012/10/29  ||王军||An overview of Automatic Speaker Diarization Systems || [[媒体文件:121027-Speaker Diarization-WJ.pdf|slides]] ||别凡虎&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2012/11/05  ||别凡虎||Experiments on Emotional Speaker Recognition||[[媒体文件:121104-Experiments_on_Emotional_Speaker_Recognition-BFH.pdf|slides]] ||刘超&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2012/11/12  ||唐国瑜||Statistical Word Sense Improves Document Clustering ||[[媒体文件:121112_Statistical_Word_Sense_Improves_Document_Clustering_TGY.pdf‎ |slides]]||邱晗&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2012/11/19  ||张陈昊||TDSR with Long-term Features Based on Functional Data Analysis||[[媒体文件:121118-ISCSLP-FDA_SR-ZCH.pdf|slides]] ||王俊俊&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2012/11/26  ||王琳琳||Time-Varying Speaker Recognition: An Introduction||[[媒体文件:121126-Time_Varying_Speaker_Recognition_I-Wll.pdf‎|slides]] ||龚宬&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2012/12/03  ||No meeting|| || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2012/12/10  ||No meeting|| || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2012/12/17  ||No meeting|| || ||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|2013/01/07  ||王军||DF-MAP-based speaker model training method||[[媒体文件:130107-基于DFMAP的说话人模型训练方法-WJ.pdf|slides]] ||唐国瑜&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/01/14  ||王东|| Computing in CSLT ||[[媒体文件:Computing_in_CSLT.pdf|slides]] ||王琳琳&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/03/04  ||王军||Sequential Adaptive Learning for Speaker Verification ||[[媒体文件:130301-Sequential adaptive learning for speaker verification-WJ.pdf|slides]] ||别凡虎&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/03/11  || Du Jinle|| VAD stuff || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/03/18  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/03/25  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/04/01  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/04/08  || 张陈昊|| A Fishervoice based Feature Fusion Method for SUSR ||[[媒体文件:130408-FisherVoice-ZCH.pdf|slides]] ||谢仲达&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/04/15  ||龚宬|| An Exploration on Influence Factors of VAD's Performance in Speaker Recognition ||[[媒体文件:130415-An_Exploration_on_Influence_Factors_of_VAD-GC.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/04/22  ||王俊俊 || Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task ||[[媒体文件:130422-Understanding_the_Query-WJJ.pdf|slides‎]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/04/29  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/05/06  ||别凡虎 ||MLLR on Emotional Speaker Recognition ||[[媒体文件:130506-MLLR on Emotional Speaker Recognition-BFH.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/05/13  ||刘超 || The Use of Deep Neural Network for Speech Recognition || [[媒体文件:130513-the_use_of_dnn_for_asr-lc.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/05/20  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/05/27  ||王琳琳|| Research on time-varying robustness in speaker recognition || [[媒体文件:130527-TVSV-Wll.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/06/03  ||王俊俊|| Research and implementation of a Chinese search-result clustering system || [[媒体文件:130601-毕业答辩-02-WJJ.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/06/10  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/06/17  ||范淼 || Relation Extraction ||[[媒体文件:130617-relation_extraction-fm.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/06/24  ||唐国瑜 || Incorporating Statistical Word Senses in Topic Model  ||[[媒体文件:130624_Incorporating Statistical Word Senses in Topic Model_TGY.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/07/01  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/07/08  ||  || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/07/15  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/09/09  ||王东 || Research Frontier in Speech Technology||[[媒体文件:Research Frontier in Speech Technology.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/09/16  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/09/23  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/09/30  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/10/07  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/10/14  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/10/21  ||范淼 ||Transduction Classification with Matrix Completion (report in Chinese)||[[媒体文件: Transduction_Classifiction_with_Matrix_Completion.pdf|slides]] [http://pages.cs.wisc.edu/~jerryzhu/pub/mc4ssl_FINAL.pdf paper]|| 李蓝天&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/10/28  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/11/04  || 王军 || i-vector-based intersession compensation and scoring methods (survey) || [[媒体文件:131104-ivecto下intersession补偿及打分方法--01-WJ-.pdf|slides]]||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/11/11  ||张陈昊 ||Introduction to PLDA and its applications in speaker recognition ||[[媒体文件:PLDA.pdf|slides]] || 唐国瑜&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/11/18  ||别凡虎 ||Introduction to i-vector theory (discussion)||[[媒体文件:131118-i-vector_and_GMM-UBM-BFH.pdf|slides]]  ||王军&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/11/25  ||刘超 || Pruning Neural Networks By Optimal Brain Damage (survey)||[[媒体文件:131125-OBD-LC-01.pdf|slides]] ||范淼&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/12/02  ||范淼 ||Distant Supervision for Relation Extraction with Matrix Completion (report in English)||[[媒体文件:131202-DRMC-FM-01.pdf|slides]] || 李蓝天&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/12/09  || Dong Wang|| Introduction to the HMM-based speech synthesis||[http://hts.sp.nitech.ac.jp/archives/2.2/HTS_Slides.zip slides] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/12/16  ||张陈昊 ||Introduction to basic units in speech research ||[[媒体文件:131215-Phonology-ZCH.pdf|slides]]  ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/12/23  || Dong Wang|| Introduction to the HMM-based speech synthesis (2)||[http://hts.sp.nitech.ac.jp/archives/2.2/HTS_Slides.zip slides] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2013/12/30  ||刘荣 || continuous space language model||[[媒体文件:Cslm-cslt.pdf|slides]]  ||刘超&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/01/06  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/01/13  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/01/20  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/02/24  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/03/03  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/03/10  ||范淼|| Distant Supervision for Information Extraction (report in English)|| || 李蓝天&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/03/17  ||唐国瑜 || Topic Models Incorporating Statistical Word Senses || [[媒体文件:TMISWS_For_CICLing2014.pdf|slides]]||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/03/24  ||孟祥涛 || Noisy training for Deep Neural Networks|| ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/03/31  ||范淼|| Translating Embeddings for Modeling Multi-relational Data (report in Chinese) || [https://www.hds.utc.fr/everest/lib/exe/fetch.php?id=en%3Atranse&amp;amp;cache=cache&amp;amp;media=en:cr_paper_nips13.pdf paper]||李蓝天&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/04/07  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/04/14  || Wang Jun|| I-vector and PLDA in depth ||[[媒体文件:131104-ivector-microsoft-wj.pdf|slides]]  ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/04/21  || 邱晗||Normalized processing of Chinese event sentence patterns ||[[媒体文件:140421-汉语事件句式规范化-QH.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/04/28  || 唐国瑜|| Some papers in CICLing2014 ||[[媒体文件:Some_papers_in_CICling2014.pdf|slides]]  ||刘超&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/05/05  || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/05/12  || 卡尔|| paper introduction || [[媒体文件:Acoustic Factor Analysis.pdf|slides]] || 邱晗&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2014/05/19  || 邱晗|| CCG derivation-tree reconstruction for Chinese event sentences ||[[媒体文件:140519-CCG_reConstruction.pdf|slides]]|| 卡尔&lt;br /&gt;
|-&lt;br /&gt;
|Liu Chao|| master proposal: sparse and deep neural networks || [[媒体文件:140519-proposal-LC-01.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;| || Liu Chao|| 2nd master proposal: sparse and deep neural networks|| ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/06/16  || 别凡虎 || Truncated Wave based VPR and Some Recent Work || [[媒体文件:140614-Truncated_Speech_based_VPR.pdf‎|slides]]‎ || 别凡虎&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/06/23  || 别凡虎 || Block-wise training for I-vector || [[媒体文件:140623-Block-wise training for I-vector.pdf‎|slides]]‎ || 别凡虎&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;| 2014/07/07||王军 ||Discriminative Scoring for Speaker Recognition Based on I-vectors || [[媒体文件:140707-work_report.pdf|slides]]|| 王军&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;| 2014/09/01|| || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/09/09 ||别凡虎 ||Research on Truncated Wave based VPR||[[媒体文件:140909-Truncated Speech based VPR.pdf|slides]] || 别凡虎&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;| 2014/09/15|| || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/09/22  || Miao Fan|| Large-scale Entity Relation Extraction based on Low-dimensional Representations (report in Chinese, PhD proposal)&lt;br /&gt;
||[[媒体文件:基于低维表示的大规模实体关系挖掘技术.pdf|slides]] || Lantian Li&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;| 2014/09/29 || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/10/13  || Miao Fan|| The Frontier of Knowledge Embedding (report in English)|| [[媒体文件:The_Frontier_of_Knowledge_Embedding.pdf|slides]]|| Lantian Li&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/10/20  || || || || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/10/27  || Li Yi || Phonemes, Features, and Syllables: Converting Onset and Rime Inventories to Consonants and Vowels||[[媒体文件:Lanzhou Phonemes, Features, and Syllables- fianl.pdf|paper]] [[媒体文件:Syllables and phonemes - 20141027.pdf|slides]]|| &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/11/3   || 米吉提|| Automatic Speech Recognition of Agglutinative Language based on Lexicon Optimization||[[媒体文件:Mijit-slides-清华大学-2014-11-3.pdf|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/11/10  || || || || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/11/17  ||Dong Wang || Highly restricted keyword spotting for Uyghur using sparse analysis|| [[媒体文件:Highly Restricted Keyword Selection Based on Sparse Analysis.pdf|slides]]|| &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/11/24  || || || || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/12/1  ||ZhongDa Xie ||Incorporating Fine-Grained Ontological Relations in Medical Document Ranking || [[媒体文件:Fine-grained_relations.pdf|slides]]|| Lantian Li &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/12/8  || || || || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/12/15  || 唐国瑜 || Research on key technologies for cross-lingual topic analysis ||[[媒体文件:141205-答辩-TGY.pdf|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/12/22  || || || || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2014/12/29  || Askar || Language Mismatch in Speaker Recognition System||[[媒体文件:141229--askar.pdf|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/1/5  ||Lantian Li || Deep Neural Networks for Speaker Recognition || [[媒体文件:150104_Deep_Neural_Networks_for_Speaker_Recognition_LLT.pdf|slides]]|| &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/1/12  || || || || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/1/19  || Dong Wang || Machine Learning Paradigms for Speech Recognition||[[媒体文件:Machine Learning Paradigms for Speech Recognition.pdf|slides]]  [http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6423821 paper] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/1/26  || Chen Guorong || Information Transmission and Distribution on Web ||[[媒体文件:An_introduction_of_complex_network1.pdf|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot; |2015/3/9 || Dong Wang || Joint Deep Learning || [[媒体文件:Joint Deep Learning.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/3/16  || Dongxu Zhang || Knowledge learning from text data and knowledge bases || [[媒体文件:Joint Deep Learning.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/4/13  || Xuewei Zhang || Lasso-based Reverberation Suppression In Automatic Speech Recognition || [[媒体文件:Lasso-based Reverberation Suppression In Automatic Speech Recognition.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/5/11  || Dong Wang ||ASR and SID Research Frontier ||[[媒体文件:ASR and SID Research Frontier.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/11/23  || Zhiyuan Tang|| CTC learning|| [[媒体文件:CTC.pdf|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/11/30  || Mengyuan Zhao|| CNN-based music removal|| [[媒体文件:Music Removal by Convolutional Denoising.pdf | slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/12/3  || Zhiyuan Tang|| Networks of Memory|| [[媒体文件:Memory_net.pdf|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/12/7  || Yiqiao Pan|| Document Classification with Spherical Word Vectors||[[媒体文件:Document Classification with Spherical Word Vectors.pdf|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/12/14  || Dong Wang || Transfer Learning for Speech and Language Processing ||[[媒体文件:Transfer_Learning_for_Speech_and_Language_Processing.pdf|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/12/21  || Qixin Wang || Attention for poem generation ||[[媒体文件:Ijcai 2016.pptx|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2015/12/28  || Lantian Li || Max-margin metric learning for speaker recognition || [[媒体文件:Max-margin-Metric-Learning.pdf|slides]]|| &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/1/4  || Zhiyong Zhang || Parallel training, MPE and natural gradient||[[媒体文件:20160104_张之勇_Large-scale Parallel Training in Speech Recognition.pdf|slides]]||  &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/1/18  || Dongxu Zhang || Memoryless Document Vector ||[[媒体文件:Memoryless_document_vector.pdf|slides]]|| &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/3/14  || Zhiyuan Tang|| Oral presentation for &amp;quot;vMF-SNE: Embedding for Spherical Data&amp;quot;|| [[媒体文件:embedding.pdf|slides]] ||  &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/3/28  || Tianyi Luo || Review for Neural QA || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/2/29/CSLT_Weekly_Report--20160328.pdf slides] ||  &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/4/11  || Rong Liu || Recommendation in Youku || [http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:Cslt%E5%AE%9E%E9%AA%8C%E5%AE%A4%E4%BA%A4%E6%B5%81.pptx slides] ||  &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/5/09 || Miao Fan || Learning contextual embeddings of knowledge base with entity descriptions.|| [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/9/9c/Techreport_CSLT_2016_M.F..pdf slides]  || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/5/16 || Yang Wang || Research on conversation thread detection. || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/bb/%E6%B1%AA%E6%B4%8B-%E6%AF%95%E8%AE%BE-CSLT.pdf slides]  || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/5/20 || Yang Wang &amp;amp;  Maoning Wang || Research on portfolio selection. || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/89/%E6%B1%AA%E6%B4%8B-%E9%87%91%E8%9E%8D%E7%AC%AC%E4%B8%80%E6%AC%A1%E5%88%86%E4%BA%AB.pdf slides1]  [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/bb/%E6%B1%87%E6%8A%A5_%E8%B5%84%E4%BA%A7%E7%BB%84%E5%90%88%E4%B8%AD%E5%87%A0%E4%B8%AA%E8%AF%84%E4%BB%B7%E6%8C%87%E6%A0%87%E7%9A%84%E8%A7%A3%E9%87%8A.pdf slides2]|| &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/5/20  || Zhiyuan Tang || ICASSP 2016 summary || [[媒体文件:Note icassp16.pdf|slides]] ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/5/23 || Dong Wang || graphical model and neural model || [[媒体文件:Graphic Model and Neural Model.pdf|slides]] [[媒体文件:Generative-Pdf.rar|papers]]  || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/8/02 || Zhiyuan Tang || Visualizing, Measuring and Understanding Neural Networks: A Brief Survey|| [[媒体文件:Nn analysis.pdf|slides]] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/8/03 || Yang Wang || Neural networks and genetic programming for financial forecasting || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/7/79/GeneticNN.pdf slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/11/05 || Yang Wang || Reinforcement Learning Models and Simulations || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/ca/RRL_and_sim.pdf slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/11/08 || April Pu || SOFTWARE DEVELOPMENT METHODOLOGIES || [http://wangd.cslt.org/talks/pdf/april_software.pptx slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/11/12 || Yang Wang || Generative Adversarial Nets || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/c9/Generative_adversarial_network.pdf slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/11/22 || Zhiyuan Tang || INTERSPEECH 2016 summary || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/6/65/Interspeech16_review.pdf slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2016/11/30 || Dong Wang || Deep and sparse learning in speech and language: an overview || [http://wangd.cslt.org/talks/pdf/bics2016.pptx slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/2/17 || Yang Wang || Review understanding deep learning requires rethinking generalization || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/3b/Review_understanding_deep_learning_requires_rethinking_generalization.pdf slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/6/5 || Dong Wang || Deep speech factorization || [http://wangd.cslt.org/talks/pdf/Deep-Speech-Factorization.pdf slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/6/8 || Shiyue Zhang || Convolutional Sequence to Sequence Learning  || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f3/Conv_seq2seq.pptx slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/6/12 || Shiyue Zhang || Memory-augmented Neural Machine Translation || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/36/Memory-augmented_Neural_Machine_Translation_.pptx slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/6/21 || Shiyue Zhang || Attention Is All You Need  || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/6/68/Attention_is_all_you_need.pptx slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/6/26 || Jiyuan Zhang || Chinese poem generation using neural model  || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/5/50/Flexible_and_Creative_Chinese_Poetry_Generation_Using_Neural_Memory_.pptx slides] || &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/6/21 || Miao Zhang || Speaker recognition on cough, laugh and wei  || &lt;br /&gt;
[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f6/Zm_cough.pdf slides]  &lt;br /&gt;
||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/7/10 || Aodong Li || Enhanced Neural Machine Translation by Learning from Draft  || [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/ca/Learning_from_draft.pptx slides] || &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Learning_from_draft.pdf</id>
		<title>File:Learning from draft.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Learning_from_draft.pdf"/>
				<updated>2017-07-11T02:05:47Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Learning_from_draft.pptx</id>
		<title>File:Learning from draft.pptx</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Learning_from_draft.pptx"/>
				<updated>2017-07-11T02:05:16Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-7-10</id>
		<title>NLP Status Report 2017-7-10</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-7-10"/>
				<updated>2017-07-10T04:53:54Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/7/10&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
*reproduced the couplet model using Moses&lt;br /&gt;
|| &lt;br /&gt;
*continue to modify the couplet&lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* Tried a seq2seq model with a style code, but it didn't work.&lt;br /&gt;
* Coded attention-based seq2seq NMT in shallow fusion with a language model.&lt;br /&gt;
||&lt;br /&gt;
* Complete coding and have a try. &lt;br /&gt;
* Find more monolingual corpus and upgrade the model.&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
* read and run ViVi_NMT code &lt;br /&gt;
* read the API of tensorflow &lt;br /&gt;
* debugged ViVi_NMT and  upgraded code version to tensorflow1.0 &lt;br /&gt;
* found the new version saves more time, has lower complexity and better BLEU than before &lt;br /&gt;
||&lt;br /&gt;
* test two versions of the code on small data sets (Chinese-English) and large data sets (Chinese-English) respectively&lt;br /&gt;
* test two versions of the code on WMT 2014 English-to-German parallel dataset and WMT 2014 English-French dataset respectively&lt;br /&gt;
* record experimental results&lt;br /&gt;
* read papers and try to improve the BLEU score a little&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-7-3</id>
		<title>NLP Status Report 2017-7-3</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-7-3"/>
				<updated>2017-07-03T04:07:44Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/7/3&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
|| &lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* Tried seq2seq models with and without attention on the style-transfer (cross-domain) task, but they didn't work due to overfitting&lt;br /&gt;
  seq2seq with attention model: Chinese-to-English&lt;br /&gt;
  vanilla seq2seq model: English-to-English (Unsupervised)&lt;br /&gt;
* Read two style-controlled generation papers from the generative-model field&lt;br /&gt;
* Trained a seq2seq model with a style code (see the sketch below)&lt;br /&gt;
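* A hedged sketch of one common way to implement the style code (assuming it is a reserved token prepended to the source sequence; the ids are illustrative):&lt;br /&gt;
  # reserved vocabulary ids for the style tokens (assumed)&lt;br /&gt;
  STYLE_TOKEN = {0: 1, 1: 2}&lt;br /&gt;
  def add_style_code(src_ids, style_id):&lt;br /&gt;
      # the encoder then conditions the whole translation on the style&lt;br /&gt;
      return [STYLE_TOKEN[style_id]] + list(src_ids)&lt;br /&gt;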
||&lt;br /&gt;
* Understand the model and mechanism mentioned in the two related papers&lt;br /&gt;
* Figure out new ways to do style transfer task&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
* read and run ViVi_NMT code &lt;br /&gt;
* read the API of tensorflow &lt;br /&gt;
* debugged ViVi_NMT and  upgraded code version to tensorflow1.0 &lt;br /&gt;
* found the new version saves more time, has lower complexity and better BLEU than before &lt;br /&gt;
||&lt;br /&gt;
* test two versions of the code on small data sets (Chinese-English) and large data sets (Chinese-English) respectively&lt;br /&gt;
* test two versions of the code on WMT 2014 English-to-German parallel dataset and WMT 2014 English-French dataset respectively&lt;br /&gt;
* record experimental results&lt;br /&gt;
* read papers and try to improve the BLEU score a little&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-07-02T13:50:03Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang (张记袁)&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  Youku&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  Turing Robot&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney &lt;br /&gt;
* '''Shiyao Li (李诗瑶)'''   :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person  !! Start !! Leave !! Hours !! Status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Picked up a new task in news generation and did a literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transferred to a new task in machine translation and did a literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses a '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use different training data, with 22000 and 22017 sentences respectively&lt;br /&gt;
  2nd translator uses a '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to generate the 2nd translator's training&lt;br /&gt;
                      data we used the model at 3000 updates, whose BLEU is 34.96, to prevent&lt;br /&gt;
                      overfitting)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using either the same training data as the 1st translator or different&lt;br /&gt;
                      data won't influence the 2nd translator's performance; if anything, using the same&lt;br /&gt;
                      data may be better, at least judging from these results. But I have to take into&lt;br /&gt;
                      account the smaller training set compared to yesterday's model.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses a '''constant untrainable embedding''' imported from the 1st translator's decoder (freezing sketch at the end of this entry)&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model will definitely impair the final&lt;br /&gt;
                       performance due to information loss as the information flows through the network&lt;br /&gt;
                       from end to end. The decoder's smaller vocabulary size compared to the encoder's&lt;br /&gt;
                       demonstrates this (9000+ -&amp;gt; 6000+).&lt;br /&gt;
  The intention of this experiment was to look for a mapping that solves meaning shift using the&lt;br /&gt;
                       2nd translator, but whether that mapping is learned is obscured by the smaller&lt;br /&gt;
                       vocabulary size phenomenon.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
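* A hedged sketch of the frozen-embedding setup in TF 1.x (pretrained_emb is assumed to be a numpy matrix exported from the 1st translator's decoder; names are illustrative):&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  # trainable=False keeps the imported embedding constant during training&lt;br /&gt;
  frozen_emb = tf.Variable(pretrained_emb, trainable=False)&lt;br /&gt;
  # a finetune variant would simply set trainable=True after the same initialization&lt;br /&gt;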
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
* learned the LSTM and seq2seq models &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data the concat(Chinese, machine-translated English), &lt;br /&gt;
  2nd translator uses a '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understand the difference between the LSTM and GRU models&lt;br /&gt;
* read the implement code of seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* alter 2017/05/14 model's size and will try after nips&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* train double-decoder model on small data set but encounter decode bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the dev set but badly on the test data; I want to figure out why.&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 || 18:30 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (500 previously)&lt;br /&gt;
  emb_size = 510 (310 previously)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data the concat(Chinese, machine-translated English), &lt;br /&gt;
  2nd translator uses a '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  But only one checkpoint outperforms the baseline; the other results are mostly under 43.1&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/22&lt;br /&gt;
|Aodong Li || 14:00 || 22:00 || 8 || &lt;br /&gt;
*double-decoder without joint loss generalizes very badly&lt;br /&gt;
*I'm trying the double-decoder model with joint loss&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/23&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*details about experiment 1: &lt;br /&gt;
  hidden_size = 700&lt;br /&gt;
  emb_size = 510&lt;br /&gt;
  learning_rate = 0.0005 (0.001 previously)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data the concat(Chinese, machine-translated English), &lt;br /&gt;
  2nd translator uses a '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.19'''&lt;br /&gt;
  Overfitting? Overall, the 2nd translator performs worse than the baseline&lt;br /&gt;
*details about experiment 2: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  double-decoder model with joint loss, which means the final loss = 1st decoder's loss + 2nd &lt;br /&gt;
  decoder's loss (see the sketch at the end of this entry)&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''39.04'''&lt;br /&gt;
  The 1st decoder's output is generally better than the 2nd decoder's. The reason may be that &lt;br /&gt;
  the second decoder only learns from the first decoder's hidden states, because their states are &lt;br /&gt;
  almost the same.&lt;br /&gt;
*DISCOVERY: &lt;br /&gt;
  The reason why the double-decoder without joint loss generalizes so badly is that the gap between&lt;br /&gt;
  teacher forcing (the training process) and beam search (the decoding process)&lt;br /&gt;
  propagates and expands the error toward the output end, which destroys the model when decoding.&lt;br /&gt;
*next:&lt;br /&gt;
  Try to train the double-decoder model without joint loss but with beam search on the 1st decoder.&lt;br /&gt;
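* A joint-loss sketch matching the setup above (TF 1.x style; labels, logits_dec1 and logits_dec2 are assumed tensors from the two decoders):&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  # per-token cross-entropy for each decoder; final loss = 1st decoder's loss + 2nd decoder's loss&lt;br /&gt;
  ce1 = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits_dec1)&lt;br /&gt;
  ce2 = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits_dec2)&lt;br /&gt;
  joint_loss = tf.reduce_mean(ce1) + tf.reduce_mean(ce2)&lt;br /&gt;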
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/24&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*code double-attention one-decoder model&lt;br /&gt;
*code double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/24&lt;br /&gt;
|Shipan Ren || 10:00 || 20:00 || 10 || &lt;br /&gt;
*read neural machine translation paper &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/25&lt;br /&gt;
|Shipan Ren || 9:30 || 18:30 || 9 || &lt;br /&gt;
*write document of tf_translate project &lt;br /&gt;
*read neural machine translation paper &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:00 || 22:00 || 9 || &lt;br /&gt;
* code and debug double attention model&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/27&lt;br /&gt;
|Shipan Ren || 9:30 || 18:30 || 9 || &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
*write document of tf_translate project &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/28&lt;br /&gt;
|Aodong Li || 15:00 || 22:00 || 7 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data both Chinese and machine-translated English&lt;br /&gt;
  Chinese and English use different encoders and different attention mechanisms&lt;br /&gt;
  '''final_attn = attn_1 + attn_2''' (mixing sketch after the results)&lt;br /&gt;
  2nd translator uses a '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  when decoding:&lt;br /&gt;
    final_attn = attn_1 + attn_2: best result of our model '''43.50'''&lt;br /&gt;
    final_attn = 2/3 attn_1 + 4/3 attn_2: best result of our model '''41.22'''&lt;br /&gt;
    final_attn = 4/3 attn_1 + 2/3 attn_2: best result of our model '''43.58'''&lt;br /&gt;
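* A hedged sketch of the attention mixing used above (attn_1 and attn_2 are the context vectors from the two encoders; the weights are the ones listed):&lt;br /&gt;
  import numpy as np&lt;br /&gt;
  def mix_attention(attn_1, attn_2, w1=1.0, w2=1.0):&lt;br /&gt;
      # decode-time variants tried: (w1, w2) in {(1, 1), (2/3, 4/3), (4/3, 2/3)}&lt;br /&gt;
      return w1 * np.asarray(attn_1) + w2 * np.asarray(attn_2)&lt;br /&gt;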
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/30&lt;br /&gt;
|Aodong Li || 15:00 || 21:00 || 6 || &lt;br /&gt;
*details about experiment 1: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data both Chinese and machine-translated English&lt;br /&gt;
  Chinese and English use different encoders and different attention&lt;br /&gt;
  '''final_attn = 2/3attn_1 + 4/3attn_2'''&lt;br /&gt;
  2nd translator uses a '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.36'''&lt;br /&gt;
* details about experiment 2: &lt;br /&gt;
  '''final_attn = 2/3attn_1 + 4/3attn_2'''&lt;br /&gt;
  2nd translator uses a '''constant pretrained embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.32'''&lt;br /&gt;
* details about experiment 3: &lt;br /&gt;
  '''final_attn = attn_1 + attn_2'''&lt;br /&gt;
  2nd translator uses a '''constant pretrained embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.41''' and it seems more stable&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/31&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*run and test tf_translate code &lt;br /&gt;
*write document of tf_translate project &lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:00 || 20:30 || 8.5 || &lt;br /&gt;
* details about experiment 1: &lt;br /&gt;
  '''final_attn = 4/3attn_1 + 2/3attn_2'''&lt;br /&gt;
  2nd translator uses a '''constant pretrained embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.79'''&lt;br /&gt;
* Making only the English word embedding at the encoder constant, while training all the other embeddings and parameters, achieves an even higher BLEU score of 45.98, and the results are stable.&lt;br /&gt;
* The quality of the English embedding at the encoder plays a pivotal role in this model.&lt;br /&gt;
* Preparation of big data. &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/01&lt;br /&gt;
|Aodong Li || 13:00 || 24:00 || 11 || &lt;br /&gt;
* Making only the English encoder's embedding constant -- 45.98&lt;br /&gt;
* Initializing the English encoder's embedding and then finetuning it -- 46.06&lt;br /&gt;
* Sharing the attention mechanism and directly adding the two attentions -- 46.20&lt;br /&gt;
* Run double-attention model on large data&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/02&lt;br /&gt;
|Aodong Li || 13:00 || 22:00 || 9 || &lt;br /&gt;
* The baseline BLEU on large data is 30.83 with a '''30000''' output vocab&lt;br /&gt;
* Our best result is 31.53 with a '''20000''' output vocab&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/03&lt;br /&gt;
|Aodong Li || 13:00 || 21:00 || 8 || &lt;br /&gt;
* Train the model with batch size 40 and with concat(attn_1, attn_2)&lt;br /&gt;
* the best result of the model with batch size 40 and add(attn_1, attn_2) is 30.52&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/05&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/06&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/07&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/08&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/09&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/12&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/13&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/14&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/15&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
* Read paper about MT involving grammar&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/16&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Prepare for APSIPA paper&lt;br /&gt;
* Read paper about MT involving grammar&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/19&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Completed APSIPA paper&lt;br /&gt;
* Took a new task in style translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/20&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Tried synonym substitution&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/21&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Tried post-editing such as synonym substitution, but it didn't work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/22&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Trained a GRU language model to identify similar words&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/06/23&lt;br /&gt;
|Shipan Ren || 10:00 || 21:00 || 11 || &lt;br /&gt;
* read neural machine translation paper &lt;br /&gt;
* read and run tf_translate code &lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Trained a GRU language model to identify similar words&lt;br /&gt;
* This didn't work because semantics are not captured&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/06/26&lt;br /&gt;
|Shipan Ren || 10:00 || 21:00 || 11 || &lt;br /&gt;
* read paper: LSTM Neural Networks for Language Modeling&lt;br /&gt;
* read and run ViVi_NMT code &lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Tried to figure out new ways to change the text style&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/06/27&lt;br /&gt;
|Shipan Ren || 10:00 || 20:00 || 10 || &lt;br /&gt;
* read the TensorFlow API&lt;br /&gt;
* debugged ViVi_NMT and tried to upgrade the code to TensorFlow 1.0&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Trained a seq2seq model to solve this problem&lt;br /&gt;
* Semantics are stored in a fixed-length vector by the encoder, and the decoder generates sequences from this vector&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/06/28&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* debugged ViVi_NMT and tried to upgrade the code to TensorFlow 1.0 (on the server)&lt;br /&gt;
* installed TensorFlow 0.1 and TensorFlow 1.0 on my PC and debugged ViVi_NMT&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Cross-domain seq2seq models without and with attention didn't work because of overfitting&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/06/29&lt;br /&gt;
|Shipan Ren || 10:00 || 20:00 || 10 || &lt;br /&gt;
* read the TensorFlow API&lt;br /&gt;
* debugged ViVi_NMT and tried to upgrade the code to TensorFlow 1.0 (on the server)&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Read style transfer papers&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/06/30&lt;br /&gt;
|Shipan Ren || 10:00 || 24:00 || 14 || &lt;br /&gt;
* debugged ViVi_NMT and tried to upgrade the code to TensorFlow 1.0 (on the server)&lt;br /&gt;
* accomplished this task &lt;br /&gt;
* found the new version saves time, has lower complexity, and achieves better BLEU than before&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 10:00 || 19:00 || 8 || &lt;br /&gt;
* Read style transfer papers&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
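A minimal sketch of the weighted double-attention combination logged above (final_attn = alpha*attn_1 + beta*attn_2, with alpha and beta in {1, 2/3, 4/3}); it assumes plain dot-product attention in numpy, and all names are illustrative rather than the actual tf_translate implementation:&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def attention_context(query, keys, values):&lt;br /&gt;
     # plain dot-product attention: softmax over source positions&lt;br /&gt;
     scores = keys @ query                      # (src_len,)&lt;br /&gt;
     weights = np.exp(scores - scores.max())&lt;br /&gt;
     weights /= weights.sum()&lt;br /&gt;
     return weights @ values                    # (hidden,)&lt;br /&gt;
 &lt;br /&gt;
 def combined_context(query, keys_zh, vals_zh, keys_en, vals_en, alpha=1.0, beta=1.0):&lt;br /&gt;
     # final_attn = alpha*attn_1 + beta*attn_2, as in the experiments above;&lt;br /&gt;
     # attn_1 attends over the Chinese encoder, attn_2 over the machine-translated English one&lt;br /&gt;
     attn_1 = attention_context(query, keys_zh, vals_zh)&lt;br /&gt;
     attn_2 = attention_context(query, keys_en, vals_en)&lt;br /&gt;
     return alpha * attn_1 + beta * attn_2&lt;br /&gt;
Changing alpha and beta only at decoding time corresponds to the &amp;quot;when decoding&amp;quot; rows logged on 2017/05/28, whereas the retrained variants bake the weights into training.&lt;br /&gt;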
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
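&lt;br /&gt;
The two embedding settings compared in the daily report above -- keeping the English encoder's embedding constant (45.98) versus initializing it from the trained embedding and then finetuning (46.06) -- differ only in whether the variable is trainable. A minimal TensorFlow 1.x sketch, assuming the pretrained matrix was exported to a .npy file; the file name and variable names are hypothetical:&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 import tensorflow as tf  # TF 1.x, matching the project's stack&lt;br /&gt;
 &lt;br /&gt;
 # hypothetical export of the 1st translator's trained English embedding matrix&lt;br /&gt;
 pretrained = np.load('en_embedding.npy')&lt;br /&gt;
 &lt;br /&gt;
 # constant setting (the 45.98 run): excluded from the trainable collection,&lt;br /&gt;
 # so the optimizer never updates it&lt;br /&gt;
 emb_const = tf.get_variable('en_emb_const', initializer=pretrained, trainable=False)&lt;br /&gt;
 &lt;br /&gt;
 # finetuned setting (the 46.06 run): initialized from the same matrix, then trained&lt;br /&gt;
 emb_tuned = tf.get_variable('en_emb_tuned', initializer=pretrained, trainable=True)&lt;br /&gt;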
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-26</id>
		<title>NLP Status Report 2017-6-26</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-26"/>
				<updated>2017-06-26T05:02:05Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/6/26&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
|| &lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
*GRE style-based translation:&lt;br /&gt;
  use direct replacement to do post-editing&lt;br /&gt;
  use RNNLM to distinguish similar words and then do replacement&lt;br /&gt;
*All these methods seem to fall short on part of speech and semantics&lt;br /&gt;
||&lt;br /&gt;
* figure out new ways to distinguish similar word pairs while taking semantics into account&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-26</id>
		<title>NLP Status Report 2017-6-26</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-26"/>
				<updated>2017-06-26T04:48:59Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：以“{| class=&amp;quot;wikitable&amp;quot; !Date !! People !! Last Week !! This Week |- | rowspan=&amp;quot;6&amp;quot;|2017/6/26 |Jiyuan Zhang || ||  |- |Aodong LI ||  ||  |- |Shiyue Zhang ||   ||  |- |Sh...”为内容创建页面&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/6/26&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
|| &lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2017-6-26</id>
		<title>2017-6-26</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2017-6-26"/>
				<updated>2017-06-26T04:31:24Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：以“NLP Status Report 2017-6-26  ASR Status Report 2017-6-26  FIN Status Report 2017-6-26”为内容创建页面&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[NLP Status Report 2017-6-26]]&lt;br /&gt;
&lt;br /&gt;
[[ASR Status Report 2017-6-26]]&lt;br /&gt;
&lt;br /&gt;
[[FIN Status Report 2017-6-26]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Status_report</id>
		<title>Status report</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Status_report"/>
				<updated>2017-06-26T04:28:58Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[2017-6-26]]&lt;br /&gt;
&lt;br /&gt;
[[2017-6-19]]&lt;br /&gt;
&lt;br /&gt;
[[2017-6-12]]&lt;br /&gt;
&lt;br /&gt;
[[2017-6-5]]&lt;br /&gt;
&lt;br /&gt;
[[2017-5-31]]&lt;br /&gt;
&lt;br /&gt;
[[2017-5-22]]&lt;br /&gt;
&lt;br /&gt;
[[2017-5-15]]&lt;br /&gt;
&lt;br /&gt;
[[2017-5-8]]&lt;br /&gt;
&lt;br /&gt;
[[2017-5-2]]&lt;br /&gt;
&lt;br /&gt;
[[2017-4-24]]&lt;br /&gt;
&lt;br /&gt;
[[2017-4-17]]&lt;br /&gt;
&lt;br /&gt;
[[2017-4-10]]&lt;br /&gt;
&lt;br /&gt;
[[2017-4-5]]&lt;br /&gt;
&lt;br /&gt;
[[2017-3-27]]&lt;br /&gt;
&lt;br /&gt;
[[2017-3-20]]&lt;br /&gt;
&lt;br /&gt;
[[2017-3-13]]&lt;br /&gt;
&lt;br /&gt;
[[2017-3-6]]&lt;br /&gt;
&lt;br /&gt;
[[2017-2-27]]&lt;br /&gt;
&lt;br /&gt;
[[2017-2-20]]&lt;br /&gt;
&lt;br /&gt;
[[2017-2-13]]&lt;br /&gt;
&lt;br /&gt;
[[2017-2-6]]&lt;br /&gt;
&lt;br /&gt;
[[2017-1-30]]&lt;br /&gt;
&lt;br /&gt;
[[2017-1-23]]&lt;br /&gt;
&lt;br /&gt;
[[2017-1-16]]&lt;br /&gt;
&lt;br /&gt;
[[2017-1-10]]&lt;br /&gt;
&lt;br /&gt;
[[2017-1-3]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-19</id>
		<title>NLP Status Report 2017-6-19</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-19"/>
				<updated>2017-06-19T04:46:22Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/6/19&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
|| &lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* finished the AP17 paper&lt;br /&gt;
||&lt;br /&gt;
* Start a new task&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
* finished the AP17 paper&lt;br /&gt;
||&lt;br /&gt;
* share Google's new paper&lt;br /&gt;
* deliver the NMT baseline code &lt;br /&gt;
* deliver the M-NMT code &lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-5</id>
		<title>NLP Status Report 2017-6-5</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-5"/>
				<updated>2017-06-05T05:45:52Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/6/5&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
|| &lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* Small data:&lt;br /&gt;
  Only make the English encoder's embedding constant -- 45.98&lt;br /&gt;
  Only initialize the English encoder's embedding and then finetune it -- 46.06&lt;br /&gt;
  Share the attention mechanism and then directly add them -- 46.20&lt;br /&gt;
* big data baseline BLEU = '''30.83'''&lt;br /&gt;
* Model with three fixed embeddings&lt;br /&gt;
  Shrank the output vocab from 30000 to 20000; the best result is 31.53&lt;br /&gt;
  Trained the model with batch size 40; the best result so far is 30.63&lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
* test more checkpoints of the model trained with batch size 40&lt;br /&gt;
* train the model with reversed output&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-5</id>
		<title>NLP Status Report 2017-6-5</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-6-5"/>
				<updated>2017-06-05T05:35:37Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/5/31&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
|| &lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* Small data:&lt;br /&gt;
  Only make the English encoder's embedding constant -- 45.98&lt;br /&gt;
  Only initialize the English encoder's embedding and then finetune it -- 46.06&lt;br /&gt;
  Share the attention mechanism and then directly add them -- 46.20&lt;br /&gt;
* big data baseline BLEU = '''30.83'''&lt;br /&gt;
* Model with three fixed embeddings&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! alpha&lt;br /&gt;
! beta&lt;br /&gt;
! result (bleu)&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 1&lt;br /&gt;
| 43.50&lt;br /&gt;
|-&lt;br /&gt;
| 4/3&lt;br /&gt;
| 2/3&lt;br /&gt;
| 43.58 (w/o retraining)&lt;br /&gt;
|-&lt;br /&gt;
| 2/3&lt;br /&gt;
| 4/3&lt;br /&gt;
| 41.22 (w/o retraining)&lt;br /&gt;
|-&lt;br /&gt;
| 2/3&lt;br /&gt;
| 4/3&lt;br /&gt;
| 42.36 (w/ retraining)&lt;br /&gt;
|}&lt;br /&gt;
* experiments with '''constant''' initialized embedding: &lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-06-04T07:37:25Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  Youku&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  Turing Robot&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney &lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person  !! start!! leave !! hours ||status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses '''random initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use different training data, with 22,000 and 22,017 examples respectively&lt;br /&gt;
  2nd translator uses '''random initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates, but to generate the 2nd translator's&lt;br /&gt;
                     training data we use the model at 3000 updates, whose BLEU is 34.96, to&lt;br /&gt;
                     prevent overfitting)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using either the same training data as the 1st translator or different&lt;br /&gt;
                     data won't influence the 2nd translator's performance; if anything, using the same&lt;br /&gt;
                     data may be better, at least from these results. But I have to take into account the&lt;br /&gt;
                     smaller training set compared to yesterday's model.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses '''constant untrainable embedding''' imported from 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model will definitely impair the final&lt;br /&gt;
                      performance, due to information loss as the information flows through the network from&lt;br /&gt;
                      end to end. The decoder's smaller vocabulary size compared to the encoder's demonstrates&lt;br /&gt;
                      this (9000+ -&amp;gt; 6000+).&lt;br /&gt;
  The intention of this experiment is to look for a map that solves meaning shift using the 2nd translator,&lt;br /&gt;
                      but whether the map is learned or not is obscured by the smaller-vocab-size&lt;br /&gt;
                      phenomenon.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM model and the seq2seq model &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data the concat(Chinese, machine translated English), &lt;br /&gt;
  2nd translator uses '''random initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM model and the GRU model&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size and will try it after NIPS&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* trained the double-decoder model on the small data set but encountered decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the dev set but badly on the test data; I want to figure out the reason.&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 || 18:30 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (500 in prior)&lt;br /&gt;
  emb_size = 510 (310 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data the concat(Chinese, machine translated English), &lt;br /&gt;
  2nd translator uses '''random initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  But only one checkpoint outperforms the baseline; the other results are mostly under 43.1&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/22&lt;br /&gt;
|Aodong Li || 14:00 || 22:00 || 8 || &lt;br /&gt;
*the double-decoder without joint loss generalizes very badly&lt;br /&gt;
*I'm trying the double-decoder model with joint loss&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/23&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*details about experiment 1: &lt;br /&gt;
  hidden_size = 700&lt;br /&gt;
  emb_size = 510&lt;br /&gt;
  learning_rate = 0.0005 (0.001 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data the concat(Chinese, machine translated English), &lt;br /&gt;
  2nd translator uses '''random initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.19'''&lt;br /&gt;
  Overfitting? Overall, the 2nd translator performs worse than the baseline&lt;br /&gt;
*details about experiment 2: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  double-decoder model with joint loss, i.e. the final loss = 1st decoder's loss + 2nd &lt;br /&gt;
  decoder's loss (see the sketch at the end of this entry)&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''39.04'''&lt;br /&gt;
  The 1st decoder's output is generally better than the 2nd decoder's. The reason may be that &lt;br /&gt;
  the second decoder learns only from the first decoder's hidden states, because their states are &lt;br /&gt;
  almost the same.&lt;br /&gt;
*DISCOVERY: &lt;br /&gt;
  The reason the double-decoder without joint loss generalizes so badly is the gap between&lt;br /&gt;
  teacher forcing (the training process) and beam search (the decoding process): the mismatch&lt;br /&gt;
  propagates and amplifies errors toward the output end, which breaks the model when decoding.&lt;br /&gt;
*next:&lt;br /&gt;
  Try to train double-decoder model without joint loss but with beam search on 1st decoder.&lt;br /&gt;
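*A minimal sketch of the joint loss above, assuming TF 1.x and illustrative tensor names (logits_1/logits_2 from the two decoders, shared targets and weights):&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  # per-decoder cross-entropy over the target sequence&lt;br /&gt;
  loss_1 = tf.contrib.seq2seq.sequence_loss(logits_1, targets, weights)&lt;br /&gt;
  loss_2 = tf.contrib.seq2seq.sequence_loss(logits_2, targets, weights)&lt;br /&gt;
  # final loss = 1st decoder's loss + 2nd decoder's loss&lt;br /&gt;
  joint_loss = loss_1 + loss_2&lt;br /&gt;
  train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(joint_loss)&lt;br /&gt;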
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/24&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*code double-attention one-decoder model&lt;br /&gt;
*code double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/24&lt;br /&gt;
|Shipan Ren || 10:00 || 20:00 || 10 || &lt;br /&gt;
*read neural machine translation paper &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/25&lt;br /&gt;
|Shipan Ren || 9:30 || 18:30 || 9 || &lt;br /&gt;
*write document of tf_translate project &lt;br /&gt;
*read neural machine translation paper &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:00 || 22:00 || 9 || &lt;br /&gt;
* code and debug double attention model&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/27&lt;br /&gt;
|Shipan Ren || 9:30 || 18:30 || 9 || &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
*write document of tf_translate project &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/28&lt;br /&gt;
|Aodong Li || 15:00 || 22:00 || 7 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data both Chinese and machine-translated English&lt;br /&gt;
  Chinese and English use different encoders and different attention mechanisms (merged as sketched below)&lt;br /&gt;
  '''final_attn = attn_1 + attn_2'''&lt;br /&gt;
  2nd translator uses '''random initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  when decoding:&lt;br /&gt;
    final_attn = attn_1 + attn_2 best result of our model: '''43.50'''&lt;br /&gt;
    final_attn = 2/3attn_1 + 4/3attn_2 best result of our model: '''41.22'''&lt;br /&gt;
    final_attn = 4/3attn_1 + 2/3attn_2 best result of our model: '''43.58'''&lt;br /&gt;
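*A one-line sketch of the weighted attention merge used above (alpha/beta are the mixing weights; attn_1 attends over the Chinese encoder states, attn_2 over the machine-translated English ones):&lt;br /&gt;
  def merge_attention(attn_1, attn_2, alpha=1.0, beta=1.0):&lt;br /&gt;
      # e.g. alpha=4/3, beta=2/3 gave 43.58; alpha=beta=1 gave 43.50&lt;br /&gt;
      return alpha * attn_1 + beta * attn_2&lt;br /&gt;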
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/30&lt;br /&gt;
|Aodong Li || 15:00 || 21:00 || 6 || &lt;br /&gt;
*details about experiment 1: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data both Chinese and machine-translated English&lt;br /&gt;
  Chinese and English use different encoders and different attention mechanisms&lt;br /&gt;
  '''final_attn = 2/3attn_1 + 4/3attn_2'''&lt;br /&gt;
  2nd translator uses '''random initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.36'''&lt;br /&gt;
* details about experiment 2: &lt;br /&gt;
  '''final_attn = 2/3attn_1 + 4/3attn_2'''&lt;br /&gt;
  2nd translator uses '''constant initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.32'''&lt;br /&gt;
* details about experiment 3: &lt;br /&gt;
  '''final_attn = attn_1 + attn_2'''&lt;br /&gt;
  2nd translator uses '''constant initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.41''' and it seems more stable&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/31&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*run and test tf_translate code &lt;br /&gt;
*write document of tf_translate project &lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:00 || 20:30 || 8.5 || &lt;br /&gt;
* details about experiment 1: &lt;br /&gt;
  '''final_attn = 4/3attn_1 + 2/3attn_2'''&lt;br /&gt;
  2nd translator uses '''constant initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.79'''&lt;br /&gt;
* Keeping only the encoder's English word embedding constant while training all other embeddings and parameters achieves an even higher BLEU score of 45.98, and the results are stable (sketched below).&lt;br /&gt;
* The quality of the English embedding at the encoder plays a pivotal role in this model.&lt;br /&gt;
* Prepared the large data set. &lt;br /&gt;
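*A minimal sketch of keeping an embedding constant, assuming TF 1.x and a pretrained_embedding numpy array exported from the 1st translator:&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  # the variable is created from the pretrained matrix and excluded from&lt;br /&gt;
  # training via trainable=False; everything else stays trainable&lt;br /&gt;
  en_embedding = tf.get_variable("en_embedding",&lt;br /&gt;
                                 initializer=pretrained_embedding,&lt;br /&gt;
                                 trainable=False)&lt;br /&gt;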
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/01&lt;br /&gt;
|Aodong Li || 13:00 || 24:00 || 11 || &lt;br /&gt;
* Making only the English encoder's embedding constant -- 45.98&lt;br /&gt;
* Initializing the English encoder's embedding from the pretrained one and then finetuning it -- 46.06&lt;br /&gt;
* Sharing the attention mechanism and directly adding the two attention vectors -- 46.20&lt;br /&gt;
* Run double-attention model on large data&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/06/02&lt;br /&gt;
|Aodong Li || 13:00 || 22:00 || 9 || &lt;br /&gt;
* Baseline BLEU on large data is 30.83 with a '''30000'''-word output vocab&lt;br /&gt;
* Our best result is 31.53 with a '''20000'''-word output vocab&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-5-31</id>
		<title>NLP Status Report 2017-5-31</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-5-31"/>
				<updated>2017-05-31T04:51:29Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/5/31&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
|| &lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* code double-attention model with '''final_attn = alpha * attn_ch + beta * attn_en'''&lt;br /&gt;
* baseline BLEU = '''43.87'''&lt;br /&gt;
* experiments with '''random''' initialized embedding: &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! alpha&lt;br /&gt;
! beta&lt;br /&gt;
! result (BLEU)&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 1&lt;br /&gt;
| 43.50&lt;br /&gt;
|-&lt;br /&gt;
| 4/3&lt;br /&gt;
| 2/3&lt;br /&gt;
| 43.58 (w/o retraining)&lt;br /&gt;
|-&lt;br /&gt;
| 2/3&lt;br /&gt;
| 4/3&lt;br /&gt;
| 41.22 (w/o retraining)&lt;br /&gt;
|-&lt;br /&gt;
| 2/3&lt;br /&gt;
| 4/3&lt;br /&gt;
| 42.36 (w/ retraining)&lt;br /&gt;
|}&lt;br /&gt;
* experiments with '''constant''' initialized embedding: &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! alpha&lt;br /&gt;
! beta&lt;br /&gt;
! result (BLEU)&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| 1&lt;br /&gt;
| '''45.41'''&lt;br /&gt;
|-&lt;br /&gt;
| 4/3&lt;br /&gt;
| 2/3&lt;br /&gt;
| '''45.79'''&lt;br /&gt;
|-&lt;br /&gt;
| 2/3&lt;br /&gt;
| 4/3&lt;br /&gt;
| '''45.32'''&lt;br /&gt;
|}&lt;br /&gt;
* 1.4~1.9 BLEU score improvement&lt;br /&gt;
* This model is similar to multi-source neural translation but uses fewer resources&lt;br /&gt;
||&lt;br /&gt;
* Test the model on big data&lt;br /&gt;
* Explore different attention merge strategies&lt;br /&gt;
* Explore hierarchical model&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
* found a dropout bug, fixed it, and reran the baselines: baseline 35.21, baseline(outproj=emb) 35.24 &lt;br /&gt;
* tried several embed-set models, but they failed&lt;br /&gt;
* embedded other words into the model's embedding space (trained on the training data, not the big data), then used them directly in baseline(outproj=emb) &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! 30000&lt;br /&gt;
! 50000&lt;br /&gt;
! 70000&lt;br /&gt;
! 90000&lt;br /&gt;
|-&lt;br /&gt;
| 35.24&lt;br /&gt;
| 34.52&lt;br /&gt;
| 33.73&lt;br /&gt;
| 33.16&lt;br /&gt;
|-&lt;br /&gt;
| 4564 (6666)&lt;br /&gt;
| 4535&lt;br /&gt;
| 4469&lt;br /&gt;
| 4426&lt;br /&gt;
|}&lt;br /&gt;
* m-nmt is running&lt;br /&gt;
||&lt;br /&gt;
* train word2vec on the big data and compare it with word2vec from the training data&lt;br /&gt;
* test the m-nmt model, increase the vocab size, and test again&lt;br /&gt;
* review zh-uy/uy-zh related work and start writing the paper&lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
&lt;br /&gt;
||&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-30T11:55:56Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  优酷&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  图灵机器人&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney &lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person  !! start!! leave !! hours ||status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
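*For reference, scores like these can be reproduced with NLTK's corpus BLEU (an assumption; the project may use its own scorer):&lt;br /&gt;
  from nltk.translate.bleu_score import corpus_bleu&lt;br /&gt;
  refs = [[['the', 'cat', 'sat', 'down']]]   # one list of tokenised references per hypothesis&lt;br /&gt;
  hyps = [['the', 'cat', 'sat', 'down']]&lt;br /&gt;
  print(corpus_bleu(refs, hyps) * 100)   # BLEU on the 0-100 scale reported here&lt;br /&gt;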
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use different training data, sized 22,000 and 22,017 respectively&lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to generate the 2nd translator's&lt;br /&gt;
                     training data we instead use the model at 3000 updates, whose BLEU is 34.96,&lt;br /&gt;
                     to prevent overfitting)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using either the same training data as the 1st translator or different&lt;br /&gt;
                     data won't influence the 2nd translator's performance; if anything, using the&lt;br /&gt;
                     same data may be better, at least from these results. But I have to take into&lt;br /&gt;
                     account the smaller training data compared to yesterday's model.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses '''constant untrainable embedding''' imported from 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model will definitely impair the final&lt;br /&gt;
                      performance, due to information loss as information flows through the network&lt;br /&gt;
                      from end to end. The decoder's smaller vocabulary size compared to the&lt;br /&gt;
                      encoder's demonstrates this (9000+ -&amp;gt; 6000+).&lt;br /&gt;
  The intention of this experiment was to look for a mapping that solves meaning shift using&lt;br /&gt;
                      the 2nd translator, but whether that mapping was learned is obscured by the&lt;br /&gt;
                      smaller-vocab-size phenomenon.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM model and the seq2seq model &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses the concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
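*A sketch of assembling that concatenated source side (file names hypothetical; not the actual pipeline):&lt;br /&gt;
  # pair each Chinese line with its machine-translated English line on one source line&lt;br /&gt;
  with open('train.zh') as f_zh, open('train.mt.en') as f_mt, open('train.concat', 'w') as out:&lt;br /&gt;
      for zh, mt in zip(f_zh, f_mt):&lt;br /&gt;
          out.write(zh.strip() + ' ' + mt.strip() + '\n')&lt;br /&gt;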
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM model and the GRU model (see the sketch below)&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
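*For reference, the main structural difference is the gating (a minimal Keras-style sketch, not project code):&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  lstm = tf.keras.layers.LSTM(256)   # input, forget and output gates plus a separate cell state&lt;br /&gt;
  gru = tf.keras.layers.GRU(256)   # update and reset gates only; no separate cell state&lt;br /&gt;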
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size; will try it after the NIPS deadline&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* trained double-decoder model on the small data set but encountered decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the development set but badly on the test data; I want to figure out why&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 || 18:30 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (500 in prior)&lt;br /&gt;
  emb_size = 510 (310 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses the concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  But only one checkpoint outperforms the baseline; the other results are commonly under 43.1&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/22&lt;br /&gt;
|Aodong Li || 14:00 || 22:00 || 8 || &lt;br /&gt;
*double-decoder without joint loss generalizes very badly&lt;br /&gt;
*I'm trying the double-decoder model with joint loss&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/23&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*details about experiment 1: &lt;br /&gt;
  hidden_size = 700&lt;br /&gt;
  emb_size = 510&lt;br /&gt;
  learning_rate = 0.0005 (0.001 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses the concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.19'''&lt;br /&gt;
  Overfitting? Overall, the 2nd translator performs worse than the baseline&lt;br /&gt;
*details about experiment 2: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  double-decoder model with joint loss, i.e. the final loss = 1st decoder's loss + 2nd &lt;br /&gt;
  decoder's loss (see the sketch at the end of this entry)&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''39.04'''&lt;br /&gt;
  The 1st decoder's output is generally better than 2nd decoder's output. The reason may be that &lt;br /&gt;
  the second decoder only learns from the first decoder's hidden states because their states are &lt;br /&gt;
  almost the same.&lt;br /&gt;
*DISCOVERY: &lt;br /&gt;
  The reason the double-decoder without joint loss generalizes so badly is that the gap between&lt;br /&gt;
  the teacher-forcing mechanism (training process) and the beam-search mechanism (decoding process)&lt;br /&gt;
  propagates and amplifies the error toward the output end, which breaks the model when decoding.&lt;br /&gt;
*next:&lt;br /&gt;
  Try to train double-decoder model without joint loss but with beam search on 1st decoder.&lt;br /&gt;
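*A minimal sketch of the joint loss (TF1-style, with stand-in losses; not the actual model code):&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  w = tf.Variable(1.0)&lt;br /&gt;
  loss_1 = tf.square(w - 2.0)   # stand-in for the 1st decoder's cross-entropy loss&lt;br /&gt;
  loss_2 = tf.square(w - 3.0)   # stand-in for the 2nd decoder's cross-entropy loss&lt;br /&gt;
  joint_loss = loss_1 + loss_2   # final loss = 1st decoder's loss + 2nd decoder's loss&lt;br /&gt;
  train_op = tf.train.AdamOptimizer(0.001).minimize(joint_loss)   # one update drives both decoders&lt;br /&gt;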
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/24&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*code double-attention one-decoder model&lt;br /&gt;
*code double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/24&lt;br /&gt;
|Shipan Ren || 10:00 || 20:00 || 10 || &lt;br /&gt;
*read neural machine translation paper &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/25&lt;br /&gt;
|Shipan Ren || 9:30 || 18:30 || 9 || &lt;br /&gt;
*write document of tf_translate project &lt;br /&gt;
*read neural machine translation paper &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:00 || 22:00 || 9 || &lt;br /&gt;
* code and debug double attention model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/28&lt;br /&gt;
|Aodong Li || 15:00 || 22:00 || 7 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data both Chinese and machine translated English&lt;br /&gt;
  Chinese and English use different encoders and different attention&lt;br /&gt;
  '''final_attn = attn_1 + attn_2'''&lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  when decoding:&lt;br /&gt;
    final_attn = attn_1 + attn_2 best result of our model: '''43.50'''&lt;br /&gt;
    final_attn = 2/3attn_1 + 4/3attn_2 best result of our model: '''41.22'''&lt;br /&gt;
    final_attn = 4/3attn_1 + 2/3attn_2 best result of our model: '''43.58'''&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/30&lt;br /&gt;
|Aodong Li || 15:00 || 21:00 || 6 || &lt;br /&gt;
*details about experiment 1: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses as training data both Chinese and machine translated English&lt;br /&gt;
  Chinese and English use different encoders and different attention&lt;br /&gt;
  '''final_attn = 2/3attn_1 + 4/3attn_2'''&lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.36'''&lt;br /&gt;
* details about experiment 2: &lt;br /&gt;
  '''final_attn = 2/3attn_1 + 4/3attn_2'''&lt;br /&gt;
  2nd translator uses '''constant initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.32'''&lt;br /&gt;
* details about experiment 3: &lt;br /&gt;
  '''final_attn = attn_1 + attn_2'''&lt;br /&gt;
  2nd translator uses '''constant initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.41''' and it seems more stable&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-28T06:47:21Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  优酷&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  图灵机器人&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney &lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person  !! start!! leave !! hours ||status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use different training data, sized 22,000 and 22,017 respectively&lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to generate the 2nd translator's&lt;br /&gt;
                     training data we instead use the model at 3000 updates, whose BLEU is 34.96,&lt;br /&gt;
                     to prevent overfitting)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using either the same training data as the 1st translator or different&lt;br /&gt;
                     data won't influence the 2nd translator's performance; if anything, using the&lt;br /&gt;
                     same data may be better, at least from these results. But I have to take into&lt;br /&gt;
                     account the smaller training data compared to yesterday's model.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses '''constant untrainable embedding''' imported from 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model will definitely impair the final&lt;br /&gt;
                      performance, due to information loss as information flows through the network&lt;br /&gt;
                      from end to end. The decoder's smaller vocabulary size compared to the&lt;br /&gt;
                      encoder's demonstrates this (9000+ -&amp;gt; 6000+).&lt;br /&gt;
  The intention of this experiment was to look for a mapping that solves meaning shift using&lt;br /&gt;
                      the 2nd translator, but whether that mapping was learned is obscured by the&lt;br /&gt;
                      smaller-vocab-size phenomenon.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM model and the seq2seq model &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses the concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM model and the GRU model&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size; will try it after the NIPS deadline&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* trained double-decoder model on the small data set but encountered decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the development set but badly on the test data; I want to figure out why&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 || 18:30 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (500 in prior)&lt;br /&gt;
  emb_size = 510 (310 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses the concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  But only one checkpoint outperforms the baseline; the other results are commonly under 43.1&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/22&lt;br /&gt;
|Aodong Li || 14:00 || 22:00 || 8 || &lt;br /&gt;
*double-decoder without joint loss generalizes very badly&lt;br /&gt;
*I'm trying the double-decoder model with joint loss&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/23&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*details about experiment 1: &lt;br /&gt;
  hidden_size = 700&lt;br /&gt;
  emb_size = 510&lt;br /&gt;
  learning_rate = 0.0005 (0.001 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses the concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.19'''&lt;br /&gt;
  Overfitting? Overall, the 2nd translator performs worse than the baseline&lt;br /&gt;
*details about experiment 2: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  double-decoder model with joint loss, i.e. the final loss = 1st decoder's loss + 2nd &lt;br /&gt;
  decoder's loss&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''39.04'''&lt;br /&gt;
  The 1st decoder's output is generally better than 2nd decoder's output. The reason may be that &lt;br /&gt;
  the second decoder only learns from the first decoder's hidden states because their states are &lt;br /&gt;
  almost the same.&lt;br /&gt;
*DISCOVERY: &lt;br /&gt;
  The reason the double-decoder without joint loss generalizes so badly is that the gap between&lt;br /&gt;
  the teacher-forcing mechanism (training process) and the beam-search mechanism (decoding process)&lt;br /&gt;
  propagates and amplifies the error toward the output end, which breaks the model when decoding.&lt;br /&gt;
*next:&lt;br /&gt;
  Try to train double-decoder model without joint loss but with beam search on 1st decoder.&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/24&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*code double-attention one-decoder model&lt;br /&gt;
*code double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/24&lt;br /&gt;
|Shipan Ren || 10:00 || 20:00 || 10 || &lt;br /&gt;
*read neural machine translation paper &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/25&lt;br /&gt;
|Shipan Ren || 9:30 || 18:30 || 9 || &lt;br /&gt;
*write document of tf_translate project &lt;br /&gt;
*read neural machine translation paper &lt;br /&gt;
*read tf_translate code &lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:00 || 22:00 || 9 || &lt;br /&gt;
* code and debug double attention model&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-24T12:32:07Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  优酷&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  图灵机器人&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney &lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person  !! start!! leave !! hours ||status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use different training data, sized 22,000 and 22,017 respectively&lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to generate the 2nd translator's&lt;br /&gt;
                     training data we instead use the model at 3000 updates, whose BLEU is 34.96,&lt;br /&gt;
                     to prevent overfitting)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using either the same training data as the 1st translator or different&lt;br /&gt;
                     data won't influence the 2nd translator's performance; if anything, using the&lt;br /&gt;
                     same data may be better, at least from these results. But I have to take into&lt;br /&gt;
                     account the smaller training data compared to yesterday's model.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses '''constant untrainable embedding''' imported from 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model will definitely impair the final&lt;br /&gt;
                      performance, due to information loss as information flows through the network&lt;br /&gt;
                      from end to end. The decoder's smaller vocabulary size compared to the&lt;br /&gt;
                      encoder's demonstrates this (9000+ -&amp;gt; 6000+).&lt;br /&gt;
  The intention of this experiment was to look for a mapping that solves meaning shift using&lt;br /&gt;
                      the 2nd translator, but whether that mapping was learned is obscured by the&lt;br /&gt;
                      smaller-vocab-size phenomenon.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM model and the seq2seq model &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses the concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM model and the GRU model&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size; will try it after the NIPS deadline&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* trained double-decoder model on the small data set but encountered decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the development set but badly on the test data; I want to figure out why&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 || 18:30 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (500 in prior)&lt;br /&gt;
  emb_size = 510 (310 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses the concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  But only one checkpoint outperforms the baseline; the other results are commonly under 43.1&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/22&lt;br /&gt;
|Aodong Li || 14:00 || 22:00 || 8 || &lt;br /&gt;
*double-decoder without joint loss generalizes very badly&lt;br /&gt;
*I'm trying the double-decoder model with joint loss&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/23&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*details about experiment 1: &lt;br /&gt;
  hidden_size = 700&lt;br /&gt;
  emb_size = 510&lt;br /&gt;
  learning_rate = 0.0005 (0.001 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses the concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.19'''&lt;br /&gt;
  Overfitting? Overall, the 2nd translator performs worse than the baseline&lt;br /&gt;
*details about experiment 2: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  double-decoder model with joint loss, i.e. the final loss = 1st decoder's loss + 2nd &lt;br /&gt;
  decoder's loss&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''39.04'''&lt;br /&gt;
  The 1st decoder's output is generally better than 2nd decoder's output. The reason may be that &lt;br /&gt;
  the second decoder only learns from the first decoder's hidden states because their states are &lt;br /&gt;
  almost the same.&lt;br /&gt;
*DISCOVERY: &lt;br /&gt;
  The reason the double-decoder without joint loss generalizes so badly is that the gap between&lt;br /&gt;
  the teacher-forcing mechanism (training process) and the beam-search mechanism (decoding process)&lt;br /&gt;
  propagates and amplifies the error toward the output end, which destroys the model when decoding.&lt;br /&gt;
*next:&lt;br /&gt;
  Try to train double-decoder model without joint loss but with beam search on 1st decoder.&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/24&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*code double-attention one-decoder model&lt;br /&gt;
*code double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-5-22</id>
		<title>NLP Status Report 2017-5-22</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-5-22"/>
				<updated>2017-05-24T06:46:47Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/5/22&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
|| &lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* BLEU of baseline = 43.87&lt;br /&gt;
* 2nd translator uses the concat(Chinese, machine-translated English) as training data:&lt;br /&gt;
  hidden_size, emb_size, lr = 500, 310, 0.001 bleu = 43.53 (best)&lt;br /&gt;
  hidden_size, emb_size, lr = 700, 510, 0.001 bleu = 45.21 (best) but most results are under 43.1&lt;br /&gt;
  hidden_size, emb_size, lr = 700, 510, 0.0005 bleu = 42.19 (best)&lt;br /&gt;
* double-decoder model with joint loss (final loss = 1st decoder's loss + 2nd decoder's loss):&lt;br /&gt;
  bleu = 40.11 (best)&lt;br /&gt;
  The 1st decoder's output is generally better than 2nd decoder's output.&lt;br /&gt;
* The training process of double-decoder model '''without''' joint loss is problematic.&lt;br /&gt;
||&lt;br /&gt;
* Replace the teacher-forcing mechanism in the training process with a beam-search mechanism.&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
* tried not training the embedding and using external word vectors instead&lt;br /&gt;
* most of my attempts gave bad results; only the 3-layer RNN + no-dropout model got 25.54 BLEU, which is about 2 points worse than the original baseline&lt;br /&gt;
* trained the original baseline on the new data (the data with the reversed-sentence problem fixed), got BLEU = 27.88; Moses BLEU = 32.47&lt;br /&gt;
||&lt;br /&gt;
* try more models to get results similar to the original baseline on the new data&lt;br /&gt;
* m-nmt model on new data &lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
* learn the implementation of the seq2seq model&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
||&lt;br /&gt;
* understand the main code&lt;br /&gt;
* start writing documents&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-5-22</id>
		<title>NLP Status Report 2017-5-22</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/NLP_Status_Report_2017-5-22"/>
				<updated>2017-05-24T06:15:03Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Date !! People !! Last Week !! This Week&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;6&amp;quot;|2017/5/22&lt;br /&gt;
|Jiyuan Zhang ||&lt;br /&gt;
|| &lt;br /&gt;
|-&lt;br /&gt;
|Aodong LI ||&lt;br /&gt;
* BLEU of baseline = 43.87&lt;br /&gt;
* 2nd translator uses the concat(Chinese, machine-translated English) as training data:&lt;br /&gt;
  hidden_size, emb_size, lr = 500, 310, 0.001 bleu = 43.53 (best)&lt;br /&gt;
  hidden_size, emb_size, lr = 700, 510, 0.001 bleu = 45.21 (best) but most results are under 43.1&lt;br /&gt;
  hidden_size, emb_size, lr = 700, 510, 0.0005 bleu = 42.19 (best)&lt;br /&gt;
* double-decoder model with joint loss (final loss = 1st decoder's loss + 2nd decoder's loss):&lt;br /&gt;
  bleu = 40.11 (best)&lt;br /&gt;
  The 1st decoder's output is generally better than 2nd decoder's output.&lt;br /&gt;
* The training process of double-decoder model '''without''' joint loss is problematic.&lt;br /&gt;
||&lt;br /&gt;
* Overfitting? Train the 2nd translator on large data&lt;br /&gt;
* Replace the teacher-forcing mechanism in the training process with a beam-search mechanism.&lt;br /&gt;
|-&lt;br /&gt;
|Shiyue Zhang || &lt;br /&gt;
* tried not training the embedding and using external word vectors instead&lt;br /&gt;
* most of my attempts gave bad results; only the 3-layer RNN + no-dropout model got 25.54 BLEU, which is about 2 points worse than the original baseline&lt;br /&gt;
* trained the original baseline on the new data (the data with the reversed-sentence problem fixed), got BLEU = 27.88; Moses BLEU = 32.47&lt;br /&gt;
||&lt;br /&gt;
* try more models to get results similar to the original baseline on the new data&lt;br /&gt;
* m-nmt model on new data &lt;br /&gt;
|-&lt;br /&gt;
|Shipan Ren ||&lt;br /&gt;
* learn the implementation of the seq2seq model&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
||&lt;br /&gt;
* understand the main code&lt;br /&gt;
* start writing documents&lt;br /&gt;
|-&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-23T12:34:00Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang (张记袁)&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  Youku (优酷)&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  Turing Robot (图灵机器人)&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney &lt;br /&gt;
* '''Shiyao Li (李诗瑶)'''   :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person !! Start !! Leave !! Hours !! Status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU; scoring sketch below): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
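* For reference, a minimal BLEU check of the kind reported above, assuming tokenized files ref.txt, baseline.out and model.out (hypothetical names; the numbers in this log come from the project's own scorer, not this sketch):&lt;br /&gt;
  from nltk.translate.bleu_score import corpus_bleu&lt;br /&gt;
  def read_tokens(path):&lt;br /&gt;
      with open(path, encoding="utf-8") as f:&lt;br /&gt;
          return [line.split() for line in f]&lt;br /&gt;
  refs = [[r] for r in read_tokens("ref.txt")]  # one reference per sentence&lt;br /&gt;
  for name in ("baseline.out", "model.out"):&lt;br /&gt;
      hyps = read_tokens(name)&lt;br /&gt;
      print(name, round(100 * corpus_bleu(refs, hyps), 2))&lt;br /&gt;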
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use different training data, of sizes 22,000 and 22,017 respectively,&lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (the model at 4,750 updates; to avoid overfitting we instead use the model at&lt;br /&gt;
                     3,000 updates, for which the BLEU is 34.96, to generate the 2nd translator's&lt;br /&gt;
                     training data)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using the same training data as the 1st translator, or different data, does&lt;br /&gt;
                     not strongly influence the 2nd translator's performance; if anything, using the same&lt;br /&gt;
                     data may be better, judging from these results. But the smaller training data size&lt;br /&gt;
                     compared with yesterday's model has to be taken into account.&lt;br /&gt;
*code the 2nd translator with constant embeddings&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''constant, untrainable embeddings''' imported from the 1st translator's decoder (sketch at the end of this entry)&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model impairs the final performance,&lt;br /&gt;
                      owing to information loss as information flows through the network from end&lt;br /&gt;
                      to end. The decoder's smaller vocabulary size compared with the encoder's&lt;br /&gt;
                      demonstrates this (9000+ -&amp;gt; 6000+).&lt;br /&gt;
  The intention of this experiment was to look for a mapping that resolves meaning shift using the&lt;br /&gt;
                      2nd translator, but whether that mapping is learned is obscured by the smaller&lt;br /&gt;
                      vocabulary effect.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
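* A minimal sketch of that constant-embedding setup, in TensorFlow 2 style for illustration only; the real matrix would be exported from the 1st translator's decoder checkpoint, faked here with random values:&lt;br /&gt;
  import numpy as np&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  VOCAB, EMB = 6000, 310&lt;br /&gt;
  pretrained = np.random.rand(VOCAB, EMB).astype("float32")  # stand-in for the imported matrix&lt;br /&gt;
  dec_embedding = tf.Variable(pretrained, trainable=False)  # constant: excluded from gradient updates&lt;br /&gt;
  ids = tf.constant([[1, 2, 3]])&lt;br /&gt;
  vectors = tf.nn.embedding_lookup(dec_embedding, ids)  # shape (1, 3, EMB)&lt;br /&gt;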
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read a machine translation paper &lt;br /&gt;
*studied the LSTM model and the seq2seq model &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data (construction sketch below), &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: the 2nd translator uses '''trained constant embeddings'''&lt;br /&gt;
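* A sketch of how such a concatenated source side can be built; the separator token is an assumption, since the log does not specify one:&lt;br /&gt;
  def make_second_pass_source(zh_line, mt_en_line, sep="_SEP_"):&lt;br /&gt;
      # concat(Chinese, machine-translated English) as one source sequence&lt;br /&gt;
      return zh_line.split() + [sep] + mt_en_line.split()&lt;br /&gt;
  src = make_second_pass_source("今天 天气 很 好", "the weather is nice today")&lt;br /&gt;
  print(" ".join(src))&lt;br /&gt;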
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM and GRU models (parameter-count sketch below)&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
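* One way to see the LSTM/GRU difference concretely (Keras layers used purely as an illustration): the LSTM carries four gate blocks, the GRU three, so the GRU has fewer parameters at the same width:&lt;br /&gt;
  import numpy as np&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  x = np.zeros((1, 7, 4), dtype="float32")  # dummy batch: 7 steps, 4 features&lt;br /&gt;
  lstm = tf.keras.layers.LSTM(8)&lt;br /&gt;
  gru = tf.keras.layers.GRU(8)&lt;br /&gt;
  lstm(x)  # calling once builds the weights&lt;br /&gt;
  gru(x)&lt;br /&gt;
  print("LSTM params:", lstm.count_params())  # 4 gate blocks&lt;br /&gt;
  print("GRU params:", gru.count_params())  # 3 gate blocks: cheaper&lt;br /&gt;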
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size; will try it after NIPS&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* trained the double-decoder model on the small data set but encountered decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the dev set but badly on the test set; I want to figure out why.&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 || 18:30 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (previously 500)&lt;br /&gt;
  emb_size = 510 (previously 310)&lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  But only one checkpoint outperforms the baseline; the other results are mostly under 43.1&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/22&lt;br /&gt;
|Aodong Li || 14:00 || 22:00 || 8 || &lt;br /&gt;
*the double-decoder model without joint loss generalizes very badly&lt;br /&gt;
*I'm now trying the double-decoder model with joint loss&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/23&lt;br /&gt;
|Aodong Li || 13:00 || 21:30 || 8 || &lt;br /&gt;
*details about experiment 1: &lt;br /&gt;
  hidden_size = 700&lt;br /&gt;
  emb_size = 510&lt;br /&gt;
  learning_rate = 0.0005 (previously 0.001)&lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.19'''&lt;br /&gt;
  Possible overfitting? Overall, the 2nd translator performs worse than the baseline&lt;br /&gt;
*details about experiment 2: &lt;br /&gt;
  hidden_size = 500&lt;br /&gt;
  emb_size = 310&lt;br /&gt;
  learning_rate = 0.001&lt;br /&gt;
  small data, &lt;br /&gt;
  double-decoder model with joint loss, i.e. final loss = 1st decoder's loss + 2nd decoder's loss&lt;br /&gt;
  (toy illustration at the end of this entry)&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''39.04'''&lt;br /&gt;
  The 1st decoder's output is generally better than the 2nd decoder's. The reason may be that&lt;br /&gt;
  the second decoder learns only from the first decoder's hidden states, as their states end up&lt;br /&gt;
  almost the same.&lt;br /&gt;
*DISCOVERY: &lt;br /&gt;
  The reason the double-decoder model without joint loss generalizes so badly is that the gap between&lt;br /&gt;
  the teacher-forcing mechanism (training) and the beam-search mechanism (decoding)&lt;br /&gt;
  propagates and amplifies errors toward the output end, which breaks the model at decoding time.&lt;br /&gt;
*next:&lt;br /&gt;
  Try to train the double-decoder model without joint loss, but with beam search on the 1st decoder.&lt;br /&gt;
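* A toy, numpy-only illustration of the joint loss above (the real model is the tf_translate double-decoder; sizes are arbitrary):&lt;br /&gt;
  import numpy as np&lt;br /&gt;
  rng = np.random.default_rng(0)&lt;br /&gt;
  def xent(logits, target):&lt;br /&gt;
      # cross-entropy of one decoding step against an integer target&lt;br /&gt;
      p = np.exp(logits - logits.max())&lt;br /&gt;
      p = p / p.sum()&lt;br /&gt;
      return -np.log(p[target])&lt;br /&gt;
  enc_state = rng.normal(size=8)  # shared encoder state&lt;br /&gt;
  W1 = rng.normal(size=(5, 8))  # 1st decoder's output projection&lt;br /&gt;
  W2 = rng.normal(size=(5, 8))  # 2nd decoder's output projection&lt;br /&gt;
  target = 3&lt;br /&gt;
  loss_dec1 = xent(W1.dot(enc_state), target)&lt;br /&gt;
  loss_dec2 = xent(W2.dot(enc_state), target)&lt;br /&gt;
  final_loss = loss_dec1 + loss_dec2  # the joint loss of experiment 2&lt;br /&gt;
  print(final_loss)&lt;br /&gt;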
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-23T08:58:27Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang (张记袁)&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  Youku (优酷)&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  Turing Robot (图灵机器人)&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney &lt;br /&gt;
* '''Shiyao Li (李诗瑶)'''   :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person !! Start !! Leave !! Hours !! Status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
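(For context, a minimal sketch of computing a corpus BLEU like the scores above; the actual scoring script used here is not recorded, so NLTK is only an assumed stand-in.)&lt;br /&gt;
  # pip install nltk; corpus_bleu takes one list of references per hypothesis.&lt;br /&gt;
  from nltk.translate.bleu_score import corpus_bleu&lt;br /&gt;
  &lt;br /&gt;
  refs = [[["the", "cat", "sat", "on", "the", "mat"]]]&lt;br /&gt;
  hyps = [["the", "cat", "sat", "on", "the", "mat"]]&lt;br /&gt;
  print(round(100 * corpus_bleu(refs, hyps), 2))   # 100.0 on the 0-100 scale&lt;br /&gt;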
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use different training data, of sizes 22,000 and 22,017 respectively, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to generate the 2nd translator's training&lt;br /&gt;
  data we use the model at 3000 updates to avoid overfitting, and its BLEU is 34.96)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using the same training data as the 1st translator or different data&lt;br /&gt;
  does not change the 2nd translator's performance much; if anything, the results favor using&lt;br /&gt;
  the same data. But the smaller training set compared to yesterday's model also has to be&lt;br /&gt;
  taken into account.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''constant untrainable embeddings''' imported from the 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model impairs the final performance,&lt;br /&gt;
  because information is lost as it flows through the network from end to end. The decoder's&lt;br /&gt;
  smaller vocabulary compared to the encoder's (9000+ -&amp;gt; 6000+) demonstrates this.&lt;br /&gt;
  The intention of this experiment was to find a mapping that fixes meaning shift via the 2nd&lt;br /&gt;
  translator, but whether that mapping is learned is obscured by the shrinking-vocabulary&lt;br /&gt;
  effect.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
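(A sketch of the frozen-embedding setup above, assuming a TensorFlow 1.x graph; the variable and file names are hypothetical, not tf_translate's real ones.)&lt;br /&gt;
  import numpy as np&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  &lt;br /&gt;
  # Embedding table dumped from the 1st translator's decoder (hypothetical file).&lt;br /&gt;
  first_dec_emb = np.load("first_translator_decoder_emb.npy").astype(np.float32)&lt;br /&gt;
  &lt;br /&gt;
  # trainable=False keeps the table constant: gradients never update it.&lt;br /&gt;
  emb = tf.get_variable("decoder_embedding",&lt;br /&gt;
                        initializer=tf.constant(first_dec_emb),&lt;br /&gt;
                        trainable=False)&lt;br /&gt;
  ids = tf.placeholder(tf.int32, [None, None])    # [batch, time] target ids&lt;br /&gt;
  dec_inputs = tf.nn.embedding_lookup(emb, ids)   # fed to the 2nd decoder&lt;br /&gt;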
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM and seq2seq models&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
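(A sketch of the data prep implied by concat(Chinese, machine-translated English); the file names and separator token are assumptions.)&lt;br /&gt;
  # Build the 2nd translator's source side: the Chinese sentence followed by the&lt;br /&gt;
  # 1st translator's English output, joined by a marker token.&lt;br /&gt;
  with open("train.zh") as zh_f, open("train.mt.en") as mt_f, \&lt;br /&gt;
       open("train.concat", "w") as out_f:&lt;br /&gt;
      for zh, mt in zip(zh_f, mt_f):&lt;br /&gt;
          out_f.write(zh.strip() + " @@@ " + mt.strip() + "\n")&lt;br /&gt;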
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM and GRU models&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size; will try it after NIPS&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* trained the double-decoder model on the small data set but encountered decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the development set but badly on the test set; I want to figure out why.&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 || 18:30 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (500 in prior)&lt;br /&gt;
  emb_size = 510 (310 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  But only one checkpoint outperforms the baseline; the other results are mostly under 43.1&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/22&lt;br /&gt;
|Aodong Li || 14:00 || 22:00 || 8 || &lt;br /&gt;
*the double-decoder without a joint loss generalizes very badly&lt;br /&gt;
*I'm trying the double-decoder model with a joint loss (sketched below)&lt;br /&gt;
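(A sketch of what the joint loss means here; the interpolation weight and names are assumptions, not the project's code.)&lt;br /&gt;
  # Supervise both decoders instead of only the second, so the first decoder's&lt;br /&gt;
  # outputs stay grounded during training.&lt;br /&gt;
  def joint_loss(loss_first, loss_second, alpha=0.5):&lt;br /&gt;
      return alpha * loss_first + (1.0 - alpha) * loss_second&lt;br /&gt;
  &lt;br /&gt;
  print(joint_loss(1.2, 0.9))   # e.g. batch cross-entropies of the two decoders&lt;br /&gt;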
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/23&lt;br /&gt;
|Aodong Li || 13:00 || 22:00 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700&lt;br /&gt;
  emb_size = 510&lt;br /&gt;
  learning_rate = 0.0005 (0.001 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''42.19'''&lt;br /&gt;
  Overfitting? Overall, the 2nd translator performs worse than the baseline&lt;br /&gt;
*DISCOVERY: &lt;br /&gt;
  The reason the double-decoder without a joint loss generalizes so badly is the gap between&lt;br /&gt;
  the teacher-forcing mechanism (training) and the beam-search mechanism (decoding):&lt;br /&gt;
  errors propagate and grow toward the output end, which breaks the model at decoding time.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-22T12:58:49Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  优酷&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  图灵机器人&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney&lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person !! Start !! Leave !! Hours !! Status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use different training data, of sizes 22,000 and 22,017 respectively, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to generate the 2nd translator's training&lt;br /&gt;
  data we use the model at 3000 updates to avoid overfitting, and its BLEU is 34.96)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using the same training data as the 1st translator or different data&lt;br /&gt;
  does not change the 2nd translator's performance much; if anything, the results favor using&lt;br /&gt;
  the same data. But the smaller training set compared to yesterday's model also has to be&lt;br /&gt;
  taken into account.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''constant untrainable embeddings''' imported from the 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model impairs the final performance,&lt;br /&gt;
  because information is lost as it flows through the network from end to end. The decoder's&lt;br /&gt;
  smaller vocabulary compared to the encoder's (9000+ -&amp;gt; 6000+) demonstrates this.&lt;br /&gt;
  The intention of this experiment was to find a mapping that fixes meaning shift via the 2nd&lt;br /&gt;
  translator, but whether that mapping is learned is obscured by the shrinking-vocabulary&lt;br /&gt;
  effect.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM and seq2seq models&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM and GRU models&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size; will try it after NIPS&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* trained the double-decoder model on the small data set but encountered decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the development set but badly on the test set; I want to figure out why.&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 || 18:30 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (500 in prior)&lt;br /&gt;
  emb_size = 510 (310 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  Our model outperforms the baseline for the first time&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/22&lt;br /&gt;
|Aodong Li || 14:00 || 22:00 || 8 || &lt;br /&gt;
*the double-decoder without a joint loss generalizes very badly&lt;br /&gt;
*I'm trying the double-decoder model with a joint loss&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-22T04:56:19Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  优酷&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  图灵机器人&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney&lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person !! Start !! Leave !! Hours !! Status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use different training data, of sizes 22,000 and 22,017 respectively, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to generate the 2nd translator's training&lt;br /&gt;
  data we use the model at 3000 updates to avoid overfitting, and its BLEU is 34.96)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using the same training data as the 1st translator or different data&lt;br /&gt;
  does not change the 2nd translator's performance much; if anything, the results favor using&lt;br /&gt;
  the same data. But the smaller training set compared to yesterday's model also has to be&lt;br /&gt;
  taken into account.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''constant untrainable embeddings''' imported from the 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model impairs the final performance,&lt;br /&gt;
  because information is lost as it flows through the network from end to end. The decoder's&lt;br /&gt;
  smaller vocabulary compared to the encoder's (9000+ -&amp;gt; 6000+) demonstrates this.&lt;br /&gt;
  The intention of this experiment was to find a mapping that fixes meaning shift via the 2nd&lt;br /&gt;
  translator, but whether that mapping is learned is obscured by the shrinking-vocabulary&lt;br /&gt;
  effect.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM and seq2seq models&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM and GRU models&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size; will try it after NIPS&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* trained the double-decoder model on the small data set but encountered decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the development set but badly on the test set; I want to figure out why.&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 || 18:30 || 8 || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (500 in prior)&lt;br /&gt;
  emb_size = 510 (310 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  Our model outperforms the baseline for the first time&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-21T02:36:55Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  优酷&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  图灵机器人&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney&lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person !! Start !! Leave !! Hours !! Status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use different training data, of sizes 22,000 and 22,017 respectively, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to generate the 2nd translator's training&lt;br /&gt;
  data we use the model at 3000 updates to avoid overfitting, and its BLEU is 34.96)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using the same training data as the 1st translator or different data&lt;br /&gt;
  does not change the 2nd translator's performance much; if anything, the results favor using&lt;br /&gt;
  the same data. But the smaller training set compared to yesterday's model also has to be&lt;br /&gt;
  taken into account.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  the 1st and 2nd translators use the same training data, &lt;br /&gt;
  the 2nd translator uses '''constant untrainable embeddings''' imported from the 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model impairs the final performance,&lt;br /&gt;
  because information is lost as it flows through the network from end to end. The decoder's&lt;br /&gt;
  smaller vocabulary compared to the encoder's (9000+ -&amp;gt; 6000+) demonstrates this.&lt;br /&gt;
  The intention of this experiment was to find a mapping that fixes meaning shift via the 2nd&lt;br /&gt;
  translator, but whether that mapping is learned is obscured by the shrinking-vocabulary&lt;br /&gt;
  effect.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM and seq2seq models&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM and GRU models&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size; will try it after NIPS&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* trained the double-decoder model on the small data set but encountered decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the development set but badly on the test set; I want to figure out why.&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/21&lt;br /&gt;
|Aodong Li || 10:30 ||  ||  || &lt;br /&gt;
*details about experiment: &lt;br /&gt;
  hidden_size = 700 (500 in prior)&lt;br /&gt;
  emb_size = 510 (310 in prior)&lt;br /&gt;
  small data, &lt;br /&gt;
  the 2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  the 2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: '''45.21'''&lt;br /&gt;
  Our model outperforms the baseline for the first time&lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-19T11:28:34Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  优酷&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  图灵机器人&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney&lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person !! Start !! Leave !! Hours !! Status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data,&lt;br /&gt;
  2nd translator uses '''randomly initialized embeddings''' (see the embedding-initialization sketch below the table)&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use different training data, with 22,000 and 22,017 sentence pairs respectively,&lt;br /&gt;
  2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4,750 updates; to avoid overfitting we instead use the model at 3,000 updates, whose BLEU is 34.96, to generate the 2nd translator's training data; see the data-generation sketch below the table)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using the same training data as the 1st translator or different data makes little difference to the 2nd translator's performance; if anything, using the same data may be better, at least judging from these results. But I also have to account for the smaller training set compared to yesterday's model.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/11&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data,&lt;br /&gt;
  2nd translator uses '''constant, untrainable embeddings''' imported from the 1st translator's decoder (see the embedding-initialization sketch below the table)&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model will inevitably impair the final performance, because information is lost as it flows through the network from end to end. The decoder's smaller vocabulary compared to the encoder's (9,000+ reduced to 6,000+) demonstrates this.&lt;br /&gt;
  The intention of this experiment was to find a mapping that resolves the meaning shift using the 2nd translator, but whether that mapping is learned is obscured by the smaller-vocabulary effect.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learn the LSTM model and the seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double-decoding model and run experiments&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses concat(Chinese, machine-translated English) as its training data (see the concatenated-source sketch below the table),&lt;br /&gt;
  2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''pretrained, constant embeddings'''&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understand the difference between the LSTM and GRU models&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* alter the 2017/05/14 model's size; will try it after the NIPS deadline&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/18&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 12:30 || 21:00 || 8 || &lt;br /&gt;
* train the double-decoder model on the small data set, but encounter decoding bugs&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/19&lt;br /&gt;
|Aodong Li || 12:30 || 20:30 || 8 || &lt;br /&gt;
* debug double-decoder model&lt;br /&gt;
* the model performs well on the development set but badly on the test set; I want to figure out why&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
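&lt;br /&gt;
The 05/09 and 05/11 settings above differ only in how the 2nd translator's embedding table is initialized. The embedding-initialization sketch below shows both options in TensorFlow 1.x; the vocabulary size, embedding width, and exported-array file name are hypothetical and not taken from tf_translate.&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 import tensorflow as tf&lt;br /&gt;
 &lt;br /&gt;
 VOCAB, DIM = 9000, 512  # hypothetical vocabulary size and embedding width&lt;br /&gt;
 &lt;br /&gt;
 # 05/09 setting: randomly initialized, trainable embeddings.&lt;br /&gt;
 rand_emb = tf.get_variable('rand_emb', shape=[VOCAB, DIM],&lt;br /&gt;
                            initializer=tf.random_uniform_initializer(-0.1, 0.1))&lt;br /&gt;
 &lt;br /&gt;
 # 05/11 setting: constant, untrainable embeddings imported from the&lt;br /&gt;
 # 1st translator's decoder (assumed exported beforehand with np.save).&lt;br /&gt;
 decoder_emb = np.load('translator1_decoder_embedding.npy')  # hypothetical file&lt;br /&gt;
 const_emb = tf.get_variable('const_emb', initializer=decoder_emb,&lt;br /&gt;
                             trainable=False)  # excluded from gradient updates&lt;br /&gt;
 &lt;br /&gt;
 ids = tf.placeholder(tf.int32, [None, None])      # [batch, time] token ids&lt;br /&gt;
 vectors = tf.nn.embedding_lookup(const_emb, ids)  # [batch, time, DIM]&lt;br /&gt;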
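&lt;br /&gt;
The 05/10 entry generates the 2nd translator's training data by decoding the training set with the 1st translator's 3,000-update checkpoint. The data-generation sketch below pairs that output with the original references; the file names are hypothetical, and translate() is a stand-in for the real tf_translate decode step.&lt;br /&gt;
&lt;br /&gt;
 def translate(sentence):&lt;br /&gt;
     # Stand-in: the real pipeline decodes with the 1st translator's&lt;br /&gt;
     # 3,000-update checkpoint (less overfit than the 4,750-update one).&lt;br /&gt;
     return sentence&lt;br /&gt;
 &lt;br /&gt;
 # Pair the 1st translator's output with the original references to build&lt;br /&gt;
 # the 2nd translator's parallel corpus. File names are hypothetical.&lt;br /&gt;
 with open('train.zh') as src, open('train.en') as ref:&lt;br /&gt;
     with open('train.mt.en', 'w') as hyp_out, open('train.tgt.en', 'w') as ref_out:&lt;br /&gt;
         for zh, en in zip(src, ref):&lt;br /&gt;
             hyp_out.write(translate(zh.strip()) + '\n')  # 2nd translator's source side&lt;br /&gt;
             ref_out.write(en)                            # 2nd translator's target side&lt;br /&gt;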
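&lt;br /&gt;
The 05/14 entry instead feeds the 2nd translator concat(Chinese, machine-translated English) as its source side. The concatenated-source sketch below joins the two files line by line; the file names and the separator token are hypothetical.&lt;br /&gt;
&lt;br /&gt;
 SEP = ' ||| '  # hypothetical separator between the two segments&lt;br /&gt;
 &lt;br /&gt;
 with open('train.zh') as zh_f, open('train.mt.en') as mt_f:&lt;br /&gt;
     with open('train.concat', 'w') as out:&lt;br /&gt;
         for zh, mt in zip(zh_f, mt_f):&lt;br /&gt;
             out.write(zh.strip() + SEP + mt.strip() + '\n')&lt;br /&gt;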
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-18T07:32:48Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  优酷&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  图灵机器人&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney&lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person  !! start!! leave !! hours ||status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data,&lt;br /&gt;
  2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use different training data, of 22,000 and 22,017 sentences respectively,&lt;br /&gt;
  2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to avoid overfitting, we instead use the&lt;br /&gt;
                     model at 3000 updates, whose BLEU is 34.96, to generate the 2nd translator's&lt;br /&gt;
                     training data)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using either the same training data as the 1st translator or different&lt;br /&gt;
                     data won't influence the 2nd translator's performance; if anything, using the same&lt;br /&gt;
                      data may be better, at least from these results. But the smaller training set&lt;br /&gt;
                      compared to yesterday's model must also be taken into account.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses a '''constant untrainable embedding''' imported from the 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model will impair the final performance&lt;br /&gt;
                      due to information loss as the information flows through the network from&lt;br /&gt;
                      end to end. The decoder's smaller vocabulary size compared to the encoder's&lt;br /&gt;
                      (9000+ -&amp;gt; 6000+) demonstrates this.&lt;br /&gt;
  The intention of this experiment is to look for a mapping that resolves meaning shift using the&lt;br /&gt;
                      2nd translator, but whether that mapping is learned is obscured by the smaller&lt;br /&gt;
                      vocabulary phenomenon.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
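*the constant-embedding setup above can be sketched as follows (TensorFlow 1.x style; the export file name is hypothetical) — the 2nd translator's embedding is initialized from the 1st translator's decoder embedding and frozen:&lt;br /&gt;
  import numpy as np&lt;br /&gt;
  import tensorflow as tf&lt;br /&gt;
  &lt;br /&gt;
  # [vocab, dim] matrix exported from the 1st translator's decoder (assumed file).&lt;br /&gt;
  pretrained = np.load('translator1_decoder_embedding.npy')&lt;br /&gt;
  # trainable=False keeps the embedding constant while the 2nd translator trains.&lt;br /&gt;
  embedding = tf.get_variable('embedding', initializer=pretrained, trainable=False)&lt;br /&gt;
  token_ids = tf.placeholder(tf.int32, [None, None])   # [batch, time] token ids&lt;br /&gt;
  embedded = tf.nn.embedding_lookup(embedding, token_ids)&lt;br /&gt;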
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM model and the seq2seq model &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''random initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
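*the concatenated inputs above can be built as in this sketch (hypothetical file names and separator token):&lt;br /&gt;
  SEP = ' @@@ '   # assumed separator between the two source segments&lt;br /&gt;
  zh_f = open('train.zh')        # Chinese source (assumed file)&lt;br /&gt;
  mt_f = open('train.mt.en')     # 1st translator's English output (assumed file)&lt;br /&gt;
  out = open('train2.src', 'w')  # 2nd translator's concatenated input&lt;br /&gt;
  for zh, mt in zip(zh_f, mt_f):&lt;br /&gt;
      out.write(zh.rstrip('\n') + SEP + mt)&lt;br /&gt;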
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/15&lt;br /&gt;
|Shipan Ren || 9:30 || 19:00 || 9.5 || &lt;br /&gt;
* understood the difference between the LSTM model and the GRU model (see the sketch below)&lt;br /&gt;
* read the implementation code of the seq2seq model&lt;br /&gt;
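*for reference, a compact numpy sketch of the single-step updates: a GRU has two gates and one hidden state, while an LSTM has three gates plus a separate cell state:&lt;br /&gt;
  import numpy as np&lt;br /&gt;
  &lt;br /&gt;
  def sigmoid(x):&lt;br /&gt;
      return 1.0 / (1.0 + np.exp(-x))&lt;br /&gt;
  &lt;br /&gt;
  def gru_step(x, h, Wz, Wr, Wh):&lt;br /&gt;
      xh = np.concatenate([x, h])&lt;br /&gt;
      z = sigmoid(xh @ Wz)                   # update gate&lt;br /&gt;
      r = sigmoid(xh @ Wr)                   # reset gate&lt;br /&gt;
      h_new = np.tanh(np.concatenate([x, r * h]) @ Wh)&lt;br /&gt;
      return (1 - z) * h + z * h_new         # single hidden state&lt;br /&gt;
  &lt;br /&gt;
  def lstm_step(x, h, c, Wi, Wf, Wo, Wc):&lt;br /&gt;
      xh = np.concatenate([x, h])&lt;br /&gt;
      i, f, o = sigmoid(xh @ Wi), sigmoid(xh @ Wf), sigmoid(xh @ Wo)&lt;br /&gt;
      c_new = f * c + i * np.tanh(xh @ Wc)   # separate cell state&lt;br /&gt;
      return o * np.tanh(c_new), c_new&lt;br /&gt;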
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/17&lt;br /&gt;
|Shipan Ren || 9:30 || 19:30 || 10 || &lt;br /&gt;
* read neural machine translation paper&lt;br /&gt;
* read tf_translate code&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 24:00 || 9|| &lt;br /&gt;
* code and debug double-decoder model&lt;br /&gt;
* altered the 2017/05/14 model's size; will try it after NIPS&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Schedule</id>
		<title>Schedule</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Schedule"/>
				<updated>2017-05-15T05:48:01Z</updated>
		
		<summary type="html">&lt;p&gt;Intern：/* Daily Report */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=NLP Schedule=&lt;br /&gt;
&lt;br /&gt;
==Members==&lt;br /&gt;
&lt;br /&gt;
===Current Members===&lt;br /&gt;
&lt;br /&gt;
* Yang Feng (冯洋)&lt;br /&gt;
* Jiyuan Zhang （张记袁）&lt;br /&gt;
* Aodong Li (李傲冬)&lt;br /&gt;
* Andi Zhang (张安迪)&lt;br /&gt;
* Shiyue Zhang (张诗悦)&lt;br /&gt;
* Li Gu (古丽)&lt;br /&gt;
* Peilun Xiao (肖培伦)&lt;br /&gt;
* Shipan Ren (任师攀)&lt;br /&gt;
&lt;br /&gt;
===Former Members===&lt;br /&gt;
* '''Chao Xing (邢超)'''     :  FreeNeb&lt;br /&gt;
* '''Rong Liu (刘荣)'''      :  优酷&lt;br /&gt;
* '''Xiaoxi Wang (王晓曦)''' :  图灵机器人&lt;br /&gt;
* '''Xi Ma (马习)'''         :  graduate student at Tsinghua University&lt;br /&gt;
* '''Tianyi Luo (骆天一)'''  :  PhD candidate at the University of California, Santa Cruz&lt;br /&gt;
* '''Qixin Wang (王琪鑫)'''  :  MA candidate at the University of California&lt;br /&gt;
* '''DongXu Zhang (张东旭)''': --&lt;br /&gt;
* '''Yiqiao Pan (潘一桥)'''  :  MA candidate at the University of Sydney &lt;br /&gt;
* '''Shiyao Li （李诗瑶）''' :  BUPT&lt;br /&gt;
* '''Aiting Liu (刘艾婷)'''  :  BUPT&lt;br /&gt;
&lt;br /&gt;
==Work Progress==&lt;br /&gt;
===Daily Report===&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Person !! Start !! Leave !! Hours !! Status&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/02&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/03&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/04&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/05&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/06&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/07&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/08&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/09&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/10&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/11&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/12&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/13&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/14&lt;br /&gt;
|Andy Zhang||9:30 ||18:30 ||8 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/04/15&lt;br /&gt;
|Andy Zhang||9:00 ||15:00 ||6 || &lt;br /&gt;
*preparing EMNLP&lt;br /&gt;
|-&lt;br /&gt;
|Peilun Xiao || || || ||&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/18&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Pick up new task in news generation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/19&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/20&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/21&lt;br /&gt;
|Aodong Li||12:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/24&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Adjust literature review focus&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/25&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/26&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/27&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Try to reproduce sc-lstm work&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/28&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Transfer to new task in machine translation and do literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/04/30&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/01&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/02&lt;br /&gt;
|Aodong Li||11:00 ||20:00 ||8 || &lt;br /&gt;
*Literature review and code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/06&lt;br /&gt;
|Aodong Li||14:20 ||17:20||3 || &lt;br /&gt;
*Code review&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/07&lt;br /&gt;
|Aodong Li||13:30 ||22:00||8 || &lt;br /&gt;
*Code review and experiment started, but version discrepancy encountered&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/08&lt;br /&gt;
|Aodong Li||11:30 ||21:00 ||8 || &lt;br /&gt;
*Code review and version discrepancy solved&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/09&lt;br /&gt;
|Aodong Li||13:00 ||22:00 ||9 || &lt;br /&gt;
*Code review and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 42.56&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/10&lt;br /&gt;
|Shipan Ren || 9:00 || 20:00 || 11 || &lt;br /&gt;
*Entry procedures&lt;br /&gt;
*Machine Translation paper reading&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:30 || 22:00 || 8 || &lt;br /&gt;
*experiment setting: &lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use different training data, of 22,000 and 22,017 sentences respectively,&lt;br /&gt;
  2nd translator uses '''randomly initialized embeddings'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 36.67 (36.67 is the model at 4750 updates; to avoid overfitting, we instead use the&lt;br /&gt;
                     model at 3000 updates, whose BLEU is 34.96, to generate the 2nd translator's&lt;br /&gt;
                     training data)&lt;br /&gt;
  best result of our model: 29.81&lt;br /&gt;
  This may suggest that using either the same training data as the 1st translator or different&lt;br /&gt;
                     data won't influence the 2nd translator's performance; if anything, using the same&lt;br /&gt;
                      data may be better, at least from these results. But the smaller training set&lt;br /&gt;
                      compared to yesterday's model must also be taken into account.&lt;br /&gt;
*code 2nd translator with constant embedding&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;|2017/05/11&lt;br /&gt;
|Shipan Ren || 10:00 || 19:30 || 9.5 || &lt;br /&gt;
*Configure environment &lt;br /&gt;
*Run tf_translate code&lt;br /&gt;
*Read Machine Translation paper&lt;br /&gt;
|-&lt;br /&gt;
|Aodong Li || 13:00 ||  21:00|| 8 || &lt;br /&gt;
*experiment setting:&lt;br /&gt;
  small data, &lt;br /&gt;
  1st and 2nd translators use the same training data, &lt;br /&gt;
  2nd translator uses a '''constant untrainable embedding''' imported from the 1st translator's decoder&lt;br /&gt;
*results (BLEU):&lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.48&lt;br /&gt;
  Experiments show that this kind of series or cascade model will impair the final performance&lt;br /&gt;
                      due to information loss as the information flows through the network from&lt;br /&gt;
                      end to end. The decoder's smaller vocabulary size compared to the encoder's&lt;br /&gt;
                      (9000+ -&amp;gt; 6000+) demonstrates this.&lt;br /&gt;
  The intention of this experiment is to look for a mapping that resolves meaning shift using the&lt;br /&gt;
                      2nd translator, but whether that mapping is learned is obscured by the smaller&lt;br /&gt;
                      vocabulary phenomenon.&lt;br /&gt;
*literature review on hierarchical machine translation&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/12&lt;br /&gt;
|Aodong Li||13:00 ||21:00 ||8 || &lt;br /&gt;
*Code double decoding model and read multilingual MT paper&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/13&lt;br /&gt;
|Shipan Ren || 10:00 || 19:00 || 9 || &lt;br /&gt;
*read machine translation paper &lt;br /&gt;
*learned the LSTM model and the seq2seq model &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;1&amp;quot;|2017/05/14&lt;br /&gt;
|Aodong Li || 10:00 || 20:00 || 9 || &lt;br /&gt;
*Code double decoding model and experiment&lt;br /&gt;
*details about experiment: &lt;br /&gt;
  small data, &lt;br /&gt;
  2nd translator uses concat(Chinese, machine-translated English) as training data, &lt;br /&gt;
  2nd translator uses '''random initialized embedding'''&lt;br /&gt;
*results (BLEU): &lt;br /&gt;
  BASELINE: 43.87&lt;br /&gt;
  best result of our model: 43.53&lt;br /&gt;
*NEXT: 2nd translator uses '''trained constant embedding'''&lt;br /&gt;
|-&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Time Off Table===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Date !! Yang Feng !! Jiyuan Zhang &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Past progress==&lt;br /&gt;
[[nlp-progress 2017/03]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/02]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2017/01]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/12]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/11]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/10]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/09]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/08]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/05/01 -- 08/16 | nlp-progress 2016/05-07]]&lt;br /&gt;
&lt;br /&gt;
[[nlp-progress 2016/04]]&lt;/div&gt;</summary>
		<author><name>Intern</name></author>	</entry>

	</feed>