NLP Status Report 2017-7-31
From cslt Wiki

Latest revision as of 00:51, 21 August 2017 (Mon)

Date: 2017/7/31

Jiyuan Zhang
  Last week:
  • made the poster for ACL (http://cslt.riit.tsinghua.edu.cn/mediawiki/images/9/95/Acl2017-poster.pdf)
  • attempted to fix the repeated-word problem, but failed
  • did some work on an n-gram model of the couplet
  This week:
  • generate a stream according to a couplet
  • complete the task of filling in the blanks of a couplet
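The n-gram work mentioned above can be sketched as a minimal character-level bigram model with add-alpha smoothing; the training lines, marker tokens, and smoothing constant here are illustrative assumptions, not details from the report:

```python
import math
from collections import Counter, defaultdict

def train_bigram(lines):
    """Count character bigrams over couplet lines, with start/end markers."""
    counts = defaultdict(Counter)
    for line in lines:
        chars = ["<s>"] + list(line) + ["</s>"]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts

def score(counts, line, alpha=1.0):
    """Add-alpha smoothed log-probability of a line under the bigram counts."""
    vocab = {c for following in counts.values() for c in following} | {"</s>"}
    chars = ["<s>"] + list(line) + ["</s>"]
    logprob = 0.0
    for a, b in zip(chars, chars[1:]):
        following = counts[a]
        logprob += math.log((following[b] + alpha) /
                            (sum(following.values()) + alpha * len(vocab)))
    return logprob
```

A model like this can rank candidate lines for a couplet: lines whose character transitions match the training data score higher than scrambled ones.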
Aodong LI
  Last week:
  • got 55,000+ English poems and 260,000+ lines after preprocessing
  • added phrase separators as the style indicator; every line has at least one separator
  • training loss did not decrease enough, falling only from 440 to 50
  • translation quality deteriorated when the language model was added
  This week:
  • try a larger language model to decrease the training loss
  • try character-based MT for English-Chinese translation
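The separator preprocessing described above might look roughly like the sketch below; the `<sep>` token name and the punctuation/midpoint splitting rule are assumptions, since the report does not specify how the separators were placed:

```python
import re

SEP = "<sep>"  # hypothetical separator token

def add_phrase_separators(line):
    """Insert a separator token at punctuation-marked phrase boundaries.
    If a multi-word line has no internal punctuation, fall back to a
    midpoint split so the line still carries at least one separator."""
    parts = [p.strip() for p in re.split(r"[,;:]", line) if p.strip()]
    if len(parts) < 2:
        words = line.split()
        mid = max(len(words) // 2, 1)
        parts = [" ".join(words[:mid]), " ".join(words[mid:])]
    return f" {SEP} ".join(p for p in parts if p)
```

For example, `add_phrase_separators("Tyger Tyger, burning bright")` yields `"Tyger Tyger <sep> burning bright"`.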
Shiyue Zhang
Shipan Ren
  Last week:
  • looked for the performance (BLEU scores) of other models on the WMT2014 dataset in published papers, but found none
  • installed and built Moses on the server
  • read papers on memory-augmented NMT (including memory-augmented Chinese-Uyghur neural machine translation)
  This week:
  • train a statistical machine translation model (toolkit: Moses; data sets: WMT2014 en-de and en-fr) and test it
  • collate the experimental results and compare our baseline model with Moses
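Since the planned comparison against Moses is in terms of BLEU, here is a minimal sentence-level BLEU sketch in pure Python (add-one smoothed n-gram precisions plus a brevity penalty). This is only an illustration of the metric; a real evaluation would use Moses's multi-bleu.perl or a standard tool such as sacreBLEU:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU with add-one smoothing on each n-gram precision."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # clip each hypothesis n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        log_precision += math.log((clipped + 1) / (total + 1)) / max_n
    # brevity penalty for hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_precision)
```

An identical hypothesis and reference score 1.0; any shortened or altered hypothesis scores below that.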
Jiayu Guo
  Last week:
  • processed the documents: so far, Shiji has been split into 24,000 sentence pairs and Zizhitongjian into 16,000 pairs
  This week:
  • adjust the jieba source code to make its word segmentation more accurate for Classical Chinese
  • read the model source code
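The jieba adjustment above is essentially about lexicon coverage for Classical Chinese. The idea can be illustrated with a simple forward-maximum-matching segmenter over a user lexicon; the lexicon entries below are hypothetical, and jieba itself uses a prefix dictionary plus an HMM and is normally extended via `jieba.load_userdict` rather than patched like this:

```python
def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position, take the longest
    word found in the lexicon; fall back to a single character."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in lexicon:
                out.append(text[i:i + n])
                i += n
                break
    return out
```

With a lexicon containing Classical Chinese terms, multi-character words are kept whole instead of being split character by character.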