Difference between revisions of "NLP Status Report 2017-8-21"
From cslt Wiki
(5 intermediate revisions by 3 users not shown) | |||
Line 4: | Line 4: | ||
| rowspan="6"|2017/8/14 | | rowspan="6"|2017/8/14 | ||
|Jiyuan Zhang || | |Jiyuan Zhang || | ||
− | *done work about code refactoring for poem system | + | *done some work about code refactoring for poem system |
|| | || | ||
*plan to complete code refactoring for poem system | *plan to complete code refactoring for poem system | ||
Line 19: | Line 19: | ||
|- | |- | ||
|Shipan Ren || | |Shipan Ren || | ||
− | * organized all the experimental results(our baseline system,Moses,THUMT) | + | * organized all the experimental results(our baseline system,Moses,THUMT) [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/89/Nmt_baseline.xlsx] |
− | * | + | * trained and tested translation models(Toolkit:THUMT ) |
− | * | + | * compared with our system |
− | + | ||
|| | || | ||
* prepare to release the baseline system(tensorflow1.0 version) | * prepare to release the baseline system(tensorflow1.0 version) | ||
Line 42: | Line 41: | ||
*target:现天子继承汉朝千年一统的大业,在泰山举行封禅典礼而我不能随行,这是命啊,是命啊! | *target:现天子继承汉朝千年一统的大业,在泰山举行封禅典礼而我不能随行,这是命啊,是命啊! | ||
*trans: 现在天子可以继承帝位的成就爵位,爵位至泰山,而我却未能执行先帝的命运。 | *trans: 现在天子可以继承帝位的成就爵位,爵位至泰山,而我却未能执行先帝的命运。 | ||
− | |||
− | |||
− | |||
*1. Using Zizhitongjian data only (6,000 pairs), we get a BLEU of at most 6. | *1. Using Zizhitongjian data only (6,000 pairs), we get a BLEU of at most 6. | ||
*2. Using Zizhitongjian data only (12,000 pairs), we get a BLEU of at most 7. | *2. Using Zizhitongjian data only (12,000 pairs), we get a BLEU of at most 7. | ||
*3. Using Shiji and Zizhitongjian data (430,000 pairs), we get a BLEU of about 9. | *3. Using Shiji and Zizhitongjian data (430,000 pairs), we get a BLEU of about 9. | ||
*4. Using Shiji and Zizhitongjian data (430,000 pairs) and splitting the ancient-language text character by character, we get a BLEU of at most 11.11. | *4. Using Shiji and Zizhitongjian data (430,000 pairs) and splitting the ancient-language text character by character, we get a BLEU of at most 11.11. | ||
− | *The main factor now is the data( | + | *The main factor now is the data (the sentence pairs and their quality: the modern-language text includes context information). |
+ | || | ||
+ | *plan to read source code of seq2seq model and learn tensorflow; | ||
+ | *plan to read a paper named Automatic Long Sentence Segmentation for NMT | ||
|- | |- | ||
|} | |} |
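Item 4 above reports the best BLEU when the ancient-language text is split one character at a time. A minimal sketch of that preprocessing step follows, assuming the source side is plain text with one sentence per string. The function name `split_chars` and the sample input are illustrative only and are not taken from the report's code.

```python
def split_chars(sentence: str) -> str:
    """Separate every non-whitespace character with a single space.

    Punctuation is kept as its own token; existing whitespace is dropped
    so that the output uses exactly one space between characters.
    """
    return " ".join(ch for ch in sentence if not ch.isspace())


if __name__ == "__main__":
    # Illustrative input only; any ancient-language sentence works the same way.
    print(split_chars("是命也夫"))
```

Each character then becomes one token for the translation model, which avoids relying on a word segmenter for the ancient-language side.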
Latest revision as of 07:44, 28 August 2017 (Mon)
Date | People | Last Week | This Week |
---|---|---|---|
2017/8/14 | Jiyuan Zhang | | |
| Aodong LI | | |
| Shiyue Zhang | | |
| Shipan Ren | | |
| Jiayu Guo | checkpoint-100000 translation model BLEU: 11.11 | |