Difference between revisions of "NLP Status Report 2017-8-21"

Latest revision as of 07:44, 28 August 2017 (Monday)

{|
! Date !! People !! Last Week !! This Week
|-
| 2017/8/14
| Jiyuan Zhang
|
* did some code refactoring for the poem system
|
* plan to complete the code refactoring for the poem system
|-
|
| Aodong LI
|
|
|-
|
| Shiyue Zhang
|
|
|-
|
| Shipan Ren
|
* organized all the experimental results (our baseline system, Moses, THUMT) [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/89/Nmt_baseline.xlsx]
* trained and tested translation models (toolkit: THUMT)
* compared them with our system
|
* prepare to release the baseline system (TensorFlow 1.0 version)
|-
|
| Jiayu Guo
|
* processed data, ran the model, and tested the results
* checkpoint-100000 translation model BLEU: 11.11
* source: 在秦者名错，与张仪争论,於是惠王使错将伐蜀，遂拔，因而守之。
* target: 在秦国的名叫司马错，曾与张仪发生争论，秦惠王采纳了他的意见，于是司马错率军攻蜀国，攻取后，又让他做了蜀地郡守。
* trans: 当时秦国的人都很欣赏他的建议，与张仪一起商议，所以吴王派使者率军攻打蜀地，一举攻，接着又下令守城。
* source: 神大用则竭，形大劳则敝，形神离则死。
* target: 精神过度使用就会衰竭，形体过度劳累就会疲惫，神形分离就会死亡。
* trans: 精神过度就可衰竭,身体过度劳累就会疲惫，地形也就会死。
* source: 今天子接千岁之统，封泰山，而余不得从行，是命也夫，命也夫！
* target: 现天子继承汉朝千年一统的大业，在泰山举行封禅典礼而我不能随行，这是命啊，是命啊！
* trans: 现在天子可以继承帝位的成就爵位，爵位至泰山，而我却未能执行先帝的命运。
* 1. Using Zizhitongjian data only (6,000 pairs), BLEU reaches 6 at most.
* 2. Using Zizhitongjian data only (12,000 pairs), BLEU reaches 7 at most.
* 3. Using Shiji and Zizhitongjian data (43,0000 pairs), BLEU reaches about 9.
* 4. Using Shiji and Zizhitongjian data (43,0000 pairs) and splitting the ancient-language text character by character, BLEU reaches 11.11 at most.
* The main limiting factor now is the data: the sentence pairs and their quality (the modern-language text includes context information).
|
* plan to read the source code of the seq2seq model and learn TensorFlow
* plan to read the paper "Automatic Long Sentence Segmentation for NMT"
|}
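
Jiayu Guo's notes mention splitting the ancient-language text one character at a time before training. The report does not show how this was done; the following is only a minimal sketch of such a preprocessing step, assuming the translation toolkit expects space-separated tokens. The function name `split_into_characters` is hypothetical and does not come from the report.

```python
def split_into_characters(sentence: str) -> str:
    """Return the sentence with every non-whitespace character as its own token.

    Assumption: tokens are separated by single ASCII spaces, and punctuation
    marks are kept as tokens of their own.
    """
    return " ".join(ch for ch in sentence if not ch.isspace())


if __name__ == "__main__":
    # One of the source sentences quoted in the report.
    example = "神大用则竭，形大劳则敝，形神离则死。"
    print(split_into_characters(example))
```

With character-level tokens the source vocabulary is limited to the set of distinct characters, which may be part of why this setting helped on a small parallel corpus; the report itself does not state the reason.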

@@ Line 19: / Line 19: @@
 |-
 |Shipan Ren ||
-* organized all the experimental results(our baseline system,Moses,THUMT)
+* organized all the experimental results(our baseline system,Moses,THUMT) [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/89/Nmt_baseline.xlsx]
 * trained and tested translation models（Toolkit:THUMT ）
 * compared with our system
@@ Line 45: / Line 45: @@
 *3.data used Shiji and Zizhitongjian(43,0000 pairs), we can get BLEU about 9.
 *4.data used Shiji and Zizhitongjian(43,0000 pairs), and split the ancient language text one character by one, we can get BLEU 11.11 at most.
-*The main factors now is the data(including pairs of sentence、the quality——cause the modern language text include context information.
+*The main factors now is the data(pairs of sentence/the quality——the modern language text includes context information).
 ||
-*plan to read source code of seq2seq model;
+*plan to read source code of seq2seq model and learn tensorflow;
 *plan to read a paper named Automatic Long Sentence Segmentation for NMT
 |-
 |}
