“NLP Status Report 2017-8-21”: Difference between revisions
From cslt Wiki
(8 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
{| class="wikitable" | {| class="wikitable" | ||
− | !Date!!People !! Last Week !! This Week | + | !Date !! People !! Last Week !! This Week |
|- | |- | ||
− | | rowspan=" | + | | rowspan="6"|2017/8/14 |
+ | |Jiyuan Zhang || | ||
+ | *did some code refactoring for the poem system | ||
+ | || | ||
+ | *plan to complete the code refactoring for the poem system | ||
+ | |- | ||
+ | |Aodong LI || | ||
+ | |||
+ | || | ||
|- | |- | ||
− | | | + | |Shiyue Zhang || |
− | || | + | |
+ | || | ||
+ | |||
+ | |- | ||
+ | |Shipan Ren || | ||
+ | * organized all the experimental results (our baseline system, Moses, THUMT) [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/89/Nmt_baseline.xlsx] | ||
+ | * trained and tested translation models (toolkit: THUMT) | ||
+ | * compared them with our system | ||
+ | || | ||
+ | * prepare to release the baseline system (TensorFlow 1.0 version) | ||
+ | |- | ||
+ | |||
+ | |Jiayu Guo|| | ||
* process data and run model; | * process data and run model; | ||
* test results. | * test results. | ||
Line 21: | Line 41: | ||
*target:现天子继承汉朝千年一统的大业,在泰山举行封禅典礼而我不能随行,这是命啊,是命啊! | *target:现天子继承汉朝千年一统的大业,在泰山举行封禅典礼而我不能随行,这是命啊,是命啊! | ||
*trans: 现在天子可以继承帝位的成就爵位,爵位至泰山,而我却未能执行先帝的命运。 | *trans: 现在天子可以继承帝位的成就爵位,爵位至泰山,而我却未能执行先帝的命运。 | ||
− | |||
− | |||
− | |||
*1. Using only Zizhitongjian data (6,000 pairs), we get a BLEU of at most 6. | *1. Using only Zizhitongjian data (6,000 pairs), we get a BLEU of at most 6. | ||
*2. Using only Zizhitongjian data (12,000 pairs), we get a BLEU of at most 7. | *2. Using only Zizhitongjian data (12,000 pairs), we get a BLEU of at most 7. | ||
*3. Using Shiji and Zizhitongjian data (43,0000 pairs), we get a BLEU of about 9. | *3. Using Shiji and Zizhitongjian data (43,0000 pairs), we get a BLEU of about 9. | ||
*4. Using Shiji and Zizhitongjian data (43,0000 pairs) and splitting the ancient-language text character by character, we get a BLEU of at most 11.11. | *4. Using Shiji and Zizhitongjian data (43,0000 pairs) and splitting the ancient-language text character by character, we get a BLEU of at most 11.11. | ||
− | *The main factor now is the data( | + | *The main factor now is the data (the sentence pairs and their quality: the modern-language text includes context information). |
+ | || | ||
+ | *plan to read the source code of the seq2seq model and learn TensorFlow; | ||
+ | *plan to read the paper "Automatic Long Sentence Segmentation for NMT" | ||
|- | |- | ||
− | |||
|} | |} |
Latest revision as of 07:44, 28 August 2017 (Monday)
Date | People | Last Week | This Week |
---|---|---|---|
2017/8/14 | Jiyuan Zhang | | |
 | Aodong LI | | |
 | Shiyue Zhang | | |
 | Shipan Ren | | |
 | Jiayu Guo | checkpoint-100000 translation model BLEU: 11.11 | |
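
The report mentions splitting the ancient-language text one character at a time before training. Below is a minimal sketch of what such a preprocessing step might look like. The function name `split_chars` and the sample string are illustrative assumptions, not taken from the actual system.

```python
def split_chars(sentence: str) -> str:
    """Insert a single space between all non-whitespace characters.

    Hypothetical helper: it treats every character (including
    punctuation) as its own token, which is one plausible reading of
    "split the ancient language text one character by one".
    """
    return " ".join(ch for ch in sentence if not ch.isspace())


if __name__ == "__main__":
    # Illustrative sample only; not drawn from the training data.
    print(split_chars("是命也夫"))
```

Treating each character as a token keeps the vocabulary small and avoids relying on a word segmenter for the ancient text. Whether punctuation should also be separated is a design choice the report does not specify.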