Difference between revisions of “NLP Status Report 2017-8-21”
From cslt Wiki
(11 intermediate revisions by 3 users not shown)
Line 1: | Line 1:
− | | class="wikitable" | + | {| class="wikitable" |
− | !Date!!People !! Last Week !! This Week | + | !Date !! People !! Last Week !! This Week |
|- | |- | ||
− | | rowspan=" | + | | rowspan="6"|2017/8/14 |
+ | |Jiyuan Zhang || | ||
+ | *did some code refactoring for the poem system | ||
+ | || | ||
+ | *plan to complete the code refactoring for the poem system | ||
+ | |- | ||
+ | |Aodong LI || | ||
+ | |||
+ | || | ||
|- | |- | ||
− | | | + | |Shiyue Zhang || |
− | || | + | |
− | * | + | || |
+ | |||
+ | |- | ||
+ | |Shipan Ren || | ||
+ | * organized all experimental results (our baseline system, Moses, THUMT) [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/89/Nmt_baseline.xlsx] | ||
+ | * trained and tested translation models (toolkit: THUMT) | ||
+ | * compared the results with our system | ||
+ | || | ||
+ | * prepare to release the baseline system (TensorFlow 1.0 version) | ||
+ | |- | ||
+ | |||
+ | |Jiayu Guo|| | ||
+ | * processed the data and ran the model; | ||
+ | * tested the results: | ||
+ | checkpoint-100000 translation model | ||
+ | BLEU: 11.11 | ||
+ | |||
+ | *source:在秦者名错,与张仪争论,於是惠王使错将伐蜀,遂拔,因而守之。 | ||
+ | *target:在秦国的名叫司马错,曾与张仪发生争论,秦惠王采纳了他的意见,于是司马错率军攻蜀国,攻取后,又让他做了蜀地郡守。 | ||
+ | *trans:当时秦国的人都很欣赏他的建议,与张仪一起商议,所以吴王派使者率军攻打蜀地,一举攻,接着又下令守城 。 | ||
+ | *source:神大用则竭,形大劳则敝,形神离则死 。 | ||
+ | *target:精神过度使用就会衰竭,形体过度劳累就会疲惫,神形分离就会死亡。 | ||
+ | *trans: 精神过度就可衰竭,身体过度劳累就会疲惫,地形也就会死。 | ||
+ | *source:今天子接千岁之统,封泰山,而余不得从行,是命也夫,命也夫! | ||
+ | *target:现天子继承汉朝千年一统的大业,在泰山举行封禅典礼而我不能随行,这是命啊,是命啊! | ||
+ | *trans: 现在天子可以继承帝位的成就爵位,爵位至泰山,而我却未能执行先帝的命运。 | ||
+ | *1. Zizhitongjian data only (6,000 pairs): BLEU at most 6. | ||
+ | *2. Zizhitongjian data only (12,000 pairs): BLEU at most 7. | ||
+ | *3. Shiji and Zizhitongjian data (43,0000 pairs): BLEU about 9. | ||
+ | *4. Shiji and Zizhitongjian data (43,0000 pairs), with the ancient-language source text split character by character: BLEU at most 11.11. | ||
+ | *The main factor now is the data: the number of sentence pairs and their quality (the modern-language text includes context information). | ||
|| | || | ||
− | * read | + | *plan to read the source code of the seq2seq model and learn TensorFlow; | ||
+ | *plan to read the paper "Automatic Long Sentence Segmentation for NMT" | ||
|- | |- | ||
+ | |} |
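Item 4 of Jiayu Guo's report splits the ancient-language source text into one token per character before training. The following is a minimal Python sketch of that preprocessing step. It assumes UTF-8 files with one sentence per line, treats punctuation as ordinary tokens, and drops any existing spaces. The function and file names are hypothetical and are not taken from the baseline system.

```python
def split_chars(sentence: str) -> str:
    """Return the sentence with every non-space character separated by one space."""
    # Existing whitespace is dropped so stray spaces do not become empty tokens.
    return " ".join(ch for ch in sentence if not ch.isspace())


def split_file(src_path: str, dst_path: str) -> None:
    """Apply split_chars line by line (hypothetical helper, one sentence per line)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(split_chars(line.rstrip("\n")) + "\n")


if __name__ == "__main__":
    # Source sentence taken from the examples in the report.
    print(split_chars("神大用则竭,形大劳则敝,形神离则死 。"))
```

For example, `split_chars("神大用则竭,形大劳则敝")` gives `神 大 用 则 竭 , 形 大 劳 则 敝`. The report only says that the source side was split. It does not state whether the modern-language target side was also segmented.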
Latest revision as of 07:44, 28 August 2017 (Monday)
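Date | People | Last Week | This Week |
---|---|---|---|
2017/8/14 | Jiyuan Zhang | did some code refactoring for the poem system | plan to complete the code refactoring for the poem system |
 | Aodong LI | | |
 | Shiyue Zhang | | |
 | Shipan Ren | organized all experimental results (our baseline system, Moses, THUMT; spreadsheet: http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/89/Nmt_baseline.xlsx); trained and tested translation models (toolkit: THUMT); compared the results with our system | prepare to release the baseline system (TensorFlow 1.0 version) |
 | Jiayu Guo | processed the data and ran the model; tested the results: checkpoint-100000 translation model, BLEU: 11.11; BLEU at most 6 (Zizhitongjian only, 6,000 pairs), at most 7 (Zizhitongjian only, 12,000 pairs), about 9 (Shiji and Zizhitongjian, 43,0000 pairs), at most 11.11 (same data, source split character by character); the main factor now is the data (number and quality of sentence pairs) | plan to read the source code of the seq2seq model and learn TensorFlow; plan to read the paper "Automatic Long Sentence Segmentation for NMT" |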
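The report does not name the script that produced the BLEU scores in Jiayu Guo's entry (6, 7, about 9, 11.11). As a point of reference, here is a minimal sketch of standard corpus-level BLEU. It uses clipped n-gram precision for n = 1 to 4, a geometric mean, a brevity penalty, one reference per sentence, and no smoothing. Scoring character tokens is an assumption, so this sketch's numbers need not match the report's.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # some order has no matches, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references in total.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, `corpus_bleu([list("精神过度使用就会衰竭")], [list("精神过度使用就会衰竭")])` returns `100.0`. Because statistics are pooled over the whole corpus before the geometric mean is taken, a single sentence with no 4-gram match does not zero the corpus score.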