"NLP Status Report 2017-7-10": Difference between revisions

From cslt Wiki
 
(2 intermediate revisions by the same user not shown)

Latest revision as of 00:31, 21 August 2017 (Mon)

Date People Last Week This Week
2017/7/10 Jiyuan Zhang
  Last week:
  • Reproduced the couplet model using Moses.
  This week:
  • Continue to modify the couplet model.
Aodong LI
  Last week:
  • Tried a seq2seq model with a style code, but it did not work.
  • Coded an attention-based seq2seq NMT model in shallow fusion with a language model.
  This week:
  • Complete the coding and try it out.
  • Find more monolingual corpora and upgrade the model.
Shiyue Zhang
Shipan Ren
  Last week:
  • Ran two versions of the code on small data sets (Chinese-English) and tested the resulting checkpoints.
    Found that version 1.0 saves about 0.03 s per step,
          and that the two versions have similar complexity and BLEU values.
    Found that BLEU is still good when the model is overfitting.
          (Reason: the test set and the training set of the small data set are similar in content and style.)
  • Ran two versions of the code on big data sets (Chinese-English).
    An OOM (Out Of Memory) error occurred when version 0.1 was trained on the large data set, but version 1.0 worked.
         Reason: improper resource allocation by the TensorFlow 0.1 framework exhausts memory.
    Tried 4 times (just entering the same command), and version 0.1 worked.
         Found that version 1.0 saves about 0.06 s per step, and that the two versions have similar complexity and BLEU values.
  • Downloaded the WMT2014 data set and ran the code on its English-French data;
    the translation was poor (reason: improper word segmentation).
  This week:
  • Do word segmentation on the WMT2014 data set.
  • Run two versions of the code on the WMT2014 data set.
  • Record the results and analyze them.
  • Learn and train Moses (using the big Chinese-English data sets).
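
The shallow fusion mentioned in Aodong LI's entry is commonly described as adding a weighted language-model log-probability to the NMT decoder's log-probability at each decoding step. The report does not describe the actual implementation, so the following is only a minimal sketch under that assumption. The function names, the toy vocabulary, the scores, and the weight of 0.3 are all hypothetical and chosen for illustration.

```python
import math


def log_softmax(scores):
    """Turn raw scores into log-probabilities in a numerically stable way."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def shallow_fusion_step(nmt_scores, lm_scores, lm_weight=0.3):
    """Fuse NMT and LM scores for one decoding step.

    Assumed scoring rule: log p_nmt(y) + lm_weight * log p_lm(y).
    Returns the fused scores and the index of the best token.
    """
    nmt_logp = log_softmax(nmt_scores)
    lm_logp = log_softmax(lm_scores)
    fused = [n + lm_weight * l for n, l in zip(nmt_logp, lm_logp)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best


# Toy vocabulary and made-up scores (hypothetical, for illustration only).
vocab = ["the", "cat", "dog"]
nmt_scores = [1.0, 2.0, 1.9]   # the NMT decoder slightly prefers "cat"
lm_scores = [0.5, 0.1, 3.0]    # the language model strongly prefers "dog"

_, best_without_lm = shallow_fusion_step(nmt_scores, lm_scores, lm_weight=0.0)
_, best_with_lm = shallow_fusion_step(nmt_scores, lm_scores, lm_weight=0.3)
print(vocab[best_without_lm], vocab[best_with_lm])
```

With the weight set to 0 the choice is the NMT decoder's own. A positive weight lets the language model, which can be trained on monolingual data alone, shift the choice between closely scored candidates.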