Difference between revisions of "NLP Status Report 2017-8-21"

From cslt Wiki
(Undo revision 28325 by Guojiayu (talk))
 
(12 intermediate revisions by 3 users not shown)
{| class="wikitable"
!Date !! People !! Last Week !! This Week
|-
| rowspan="6"|2017/8/14
|Jiyuan Zhang ||
* did some code refactoring for the poem system
||
* plan to complete the code refactoring for the poem system
|-
|Aodong LI ||
||
|-
|Shiyue Zhang ||
||
|-
|Shipan Ren ||
* organized all the experimental results (our baseline system, Moses, THUMT) [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/89/Nmt_baseline.xlsx]
* trained and tested translation models (toolkit: THUMT)
* compared them with our system
||
* prepare to release the baseline system (TensorFlow 1.0 version)
|-
|Jiayu Guo||
* processed data and ran the model;
* tested the results.
checkpoint-100000 translation model
BLEU: 11.11
*source:在秦者名错,与张仪争论,於是惠王使错将伐蜀,遂拔,因而守之。
*target:在秦国的名叫司马错,曾与张仪发生争论,秦惠王采纳了他的意见,于是司马错率军攻蜀国,攻取后,又让他做了蜀地郡守。
*trans:当时秦国的人都很欣赏他的建议,与张仪一起商议,所以吴王派使者率军攻打蜀地,一举攻,接着又下令守城 。
*source:神大用则竭,形大劳则敝,形神离则死 。
*target:精神过度使用就会衰竭,形体过度劳累就会疲惫,神形分离就会死亡。
*trans: 精神过度就可衰竭,身体过度劳累就会疲惫,地形也就会死。
*source:今天子接千岁之统,封泰山,而余不得从行,是命也夫,命也夫!
*target:现天子继承汉朝千年一统的大业,在泰山举行封禅典礼而我不能随行,这是命啊,是命啊!
*trans: 现在天子可以继承帝位的成就爵位,爵位至泰山,而我却未能执行先帝的命运。
*1. Using only Zizhitongjian data (6,000 pairs), BLEU reaches 6 at most.
*2. Using only Zizhitongjian data (12,000 pairs), BLEU reaches 7 at most.
*3. Using Shiji and Zizhitongjian data (43,0000 pairs), BLEU reaches about 9.
*4. Using Shiji and Zizhitongjian data (43,0000 pairs) with the ancient-language text split character by character, BLEU reaches 11.11 at most.
*The main limiting factor now is the data: the number of sentence pairs and their quality (the modern-language text includes context information).
||
*plan to read the source code of the seq2seq model and learn TensorFlow;
*plan to read the paper "Automatic Long Sentence Segmentation for NMT"
|-
|}
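
The character-by-character split mentioned in item 4 of Jiayu Guo's entry could look roughly like the sketch below. It is only an illustration, not the script used for the experiments: the function name split_into_characters is hypothetical, and dropping existing whitespace before splitting is an assumption about the preprocessing.

```python
def split_into_characters(sentence: str) -> str:
    """Return the sentence with its characters separated by single spaces.

    Assumption: whitespace already present in the raw text is dropped,
    so that stray spaces do not turn into tokens of their own.
    """
    chars = [ch for ch in sentence if not ch.isspace()]
    return " ".join(chars)


if __name__ == "__main__":
    # One of the source sentences quoted in the report.
    source = "神大用则竭,形大劳则敝,形神离则死 。"
    print(split_into_characters(source))
```

Each character, punctuation included, becomes one token, so a whitespace-tokenizing seq2seq pipeline would then operate at the character level on the ancient-language side.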

Latest revision as of 07:44, 28 August 2017 (Monday)