Difference between revisions of "NLP Status Report 2017-7-17"

From cslt Wiki
(Created page with "{| class="wikitable" !Date !! People !! Last Week !! This Week |- | rowspan="6"|2017/7/3 |Jiyuan Zhang || *reproduced the couplet model using moses || *continue to...")
 
Line 4: Line 4:
 
| rowspan="6"|2017/7/3
 
| rowspan="6"|2017/7/3
 
|Jiyuan Zhang ||
 
|Jiyuan Zhang ||
*reproduced the couplet model using moses
+
*
 
||  
 
||  
*continue to modify the couplet
+
*
 
|-
 
|-
 
|Aodong LI ||
 
|Aodong LI ||
* Tried a seq2seq with style code model but it didn't work.
+
*
* Coded attention-based seq2seq NMT in shallow fusion with a language model.
+
 
 
||
 
||
* Complete coding and have a try.
+
*
* Find more monolingual corpus and upgrade the model.
+
 
|-
 
|-
 
|Shiyue Zhang ||  
 
|Shiyue Zhang ||  
Line 21: Line 20:
 
|-
|Shipan Ren ||
− * ran two versions of the code on small (Chinese-English) data sets and tested their checkpoints;
+ * found ways to tokenize the WMT2014 data:
−   found that version 1.0 saves about 0.03 s per step,
+   rewrote prepare_data.py from moses-smt
−   and that the two versions have similar complexity and BLEU scores
+   used the tokenizer of moses-smt
− * ran two versions of the code on big (Chinese-English) data sets
+ * trained two versions of the code on the WMT2014 en-de and en-fr data sets;
− * downloaded the WMT2014 data set, used the English-French data to run the code, and
+   tested the checkpoints on the en-de data set
−   found that the translations were poor (reason: improper word segmentation)
||
− * do word segmentation on the WMT2014 data set
+ * tested the checkpoints on the en-fr data set
− * run two versions of the code on the WMT2014 data set
* record the results and do analysis
− * learn and train Moses (using big Chinese-English data sets)
+ * read papers about memory-augmented NMT
|-
|}
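The entry above on shallow fusion refers to combining the NMT decoder with a separately trained language model at decoding time, so that each candidate token is scored by the translation model plus a weighted language-model term. The sketch below only illustrates that scoring rule and is not the code mentioned in the report; the nmt and lm objects and their log_probs methods are hypothetical placeholders.

# A rough illustration of shallow fusion at decoding time, not the code from the report.
# Assumes two already-trained models with hypothetical interfaces:
#   nmt.log_probs(src, prefix) -> numpy array of next-token log-probs, shape (vocab,)
#   lm.log_probs(prefix)       -> numpy array of next-token log-probs, shape (vocab,)
import numpy as np

def greedy_decode_shallow_fusion(nmt, lm, src, bos_id, eos_id, beta=0.3, max_len=100):
    """Pick each target token by log p_nmt(y_t | y_<t, x) + beta * log p_lm(y_t | y_<t)."""
    prefix = [bos_id]
    for _ in range(max_len):
        fused = nmt.log_probs(src, prefix) + beta * lm.log_probs(prefix)
        next_id = int(np.argmax(fused))
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix[1:]  # drop the BOS marker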

Revision as of 14:45, 17 July 2017

Date: 2017/7/3
People: Jiyuan Zhang, Aodong LI, Shiyue Zhang, Shipan Ren (only Shipan Ren has entries in this revision)

Shipan Ren (Last Week):
* found ways to tokenize the WMT2014 data (see the tokenizer sketch below):
  rewrote prepare_data.py from moses-smt
  used the tokenizer of moses-smt
* trained two versions of the code on the WMT2014 en-de and en-fr data sets;
  tested the checkpoints on the en-de data set (see the BLEU sketch below)

Shipan Ren (This Week):
* tested the checkpoints on the en-fr data set
* record the results and do analysis
* read papers about memory-augmented NMT
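For "used the tokenizer of moses-smt": the Moses toolkit ships a tokenizer.perl script, and a prepare_data.py-style wrapper could invoke it roughly as sketched below. This is an illustrative sketch only; the checkout path and corpus file names are assumptions, not the setup actually used in the report.

# A sketch of tokenizing WMT2014 text with the moses-smt tokenizer.perl script.
# The checkout path and corpus file names below are placeholders, not the report's setup.
import subprocess

MOSES_TOKENIZER = "mosesdecoder/scripts/tokenizer/tokenizer.perl"  # assumed local checkout

def moses_tokenize(in_path, out_path, lang):
    """Run tokenizer.perl -l <lang> over one plain-text corpus file."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        subprocess.run(["perl", MOSES_TOKENIZER, "-l", lang],
                       stdin=src, stdout=dst, check=True)

if __name__ == "__main__":
    # e.g. the two sides of a WMT2014 en-de training corpus
    moses_tokenize("train.en", "train.tok.en", "en")
    moses_tokenize("train.de", "train.tok.de", "de")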
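For "tested the checkpoints ... record the results and do analysis": one simple way to record a corpus-level BLEU score per checkpoint is sketched below using NLTK. The report does not say which scoring tool was actually used, and the checkpoint and file names here are hypothetical.

# A sketch of recording a corpus-level BLEU score for each checkpoint's translations.
# NLTK is used here only for illustration; the checkpoint and file names are hypothetical.
from nltk.translate.bleu_score import corpus_bleu

def bleu_from_files(ref_path, hyp_path):
    """Tokenized reference and hypothesis files, one sentence per line."""
    with open(ref_path, encoding="utf-8") as f:
        refs = [[line.split()] for line in f]   # one reference list per sentence
    with open(hyp_path, encoding="utf-8") as f:
        hyps = [line.split() for line in f]
    return corpus_bleu(refs, hyps)

for ckpt in ["ckpt-10000", "ckpt-20000", "ckpt-30000"]:
    score = bleu_from_files("newstest2014.tok.de", "%s.out.de" % ckpt)
    print(ckpt, round(100 * score, 2))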