Difference between revisions of "Xingchao work"

From cslt Wiki
Revision as of 07:19, 5 October 2014 (Sunday)

Paper Recommendation

Pre-Trained Multi-View Word Embedding.[1]

Learning Word Representation Considering Proximity and Ambiguity.[2]

Continuous Distributed Representations of Words as Input of LSTM Network Language Model.[3]

WikiRelate! Computing Semantic Relatedness Using Wikipedia.[4]

Japanese-Spanish Thesaurus Construction Using English as a Pivot.[5]

Chaos Work

SSA Model

  Build 2-dimension SSA-Model.
     Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result is : 
        27.83%   46.53%     2  classify
  Test 25- and 50-dimension SSA-Model for transform.
     Start at : 2014-10-02 <--> End at : 2014-10-03 <--> Result is : 
        11.96%   27.43%     50 classify
  Test All-Belong SSA model for transform
     Start at : 2014-10-02

SEMPRE Research

Work Schedule

  Download SEMPRE toolkit.
     Start at : 2014-09-30

Paper related

  Semantic Parsing via Paraphrasing [6]

Knowledge Vector

  Pre-process corpus.
     Start at : 2014-09-30.
        Use the Wikipedia_Extractor toolkit [7]; waiting for it to finish.
     End at : 2014-10-03  Result : 
        The original corpus is about 47 GB; after preprocessing it is almost 17.8 GB.
  Analyze the corpus and train word2vec on Wikipedia.
     Start at : 2014-10-03.
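
A minimal sketch of how the extracted Wikipedia text could be turned into token lists before word2vec training. It assumes the extractor wraps each article in `<doc ...>` and `</doc>` lines; the function name `iter_token_lines` and the lowercase, word-character tokenization are illustrative assumptions, not the pipeline used in this log.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: each article is delimited by <doc ...> and </doc> lines.
DOC_TAG = re.compile(r"^</?doc\b[^>]*>\s*$")
# Assumption: a token is a run of word characters (hypothetical choice).
TOKEN = re.compile(r"\w+")


def iter_token_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one lowercased token list per non-empty text line,
    skipping the <doc> wrapper lines."""
    for line in lines:
        line = line.strip()
        if not line or DOC_TAG.match(line):
            continue
        tokens = TOKEN.findall(line.lower())
        if tokens:
            yield tokens
```

Because the function is a generator, a corpus of this size can be streamed from disk line by line instead of being loaded into memory.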

Moses translation model

  Pre-process corpus: remove sentences that contain rarely seen words.
      Start at : 2014-09-30 <--> End at : 2014-10-02  <--> Result : 
      The original corpus has 8973724 lines; the clean corpus (after removing sentences that contain words seen fewer than 10 times) has 6033397 lines.
  Train Model.
      Start at : 2014-10-02 <--> End at : 2014-10-05
  Tuning Model.
      Start at : 2014-10-05
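
The cleaning step above can be sketched as follows. This is an illustration under assumptions: tokens are split on whitespace, word counts are taken separately for each side of the parallel corpus, and a sentence pair is dropped if either side contains a word seen fewer than `min_count` times. The helper names are hypothetical and this is not the script used for the run in this log.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_words(sentences: Iterable[str]) -> Counter:
    """Count whitespace-separated tokens over all sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts


def all_frequent(sentence: str, counts: Counter, min_count: int) -> bool:
    """True if the sentence is non-empty and every token is frequent enough."""
    tokens = sentence.split()
    return bool(tokens) and all(counts[t] >= min_count for t in tokens)


def clean_parallel(src: List[str], tgt: List[str],
                   min_count: int = 10) -> List[Tuple[str, str]]:
    """Keep aligned pairs in which neither side contains a rare word."""
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of lines")
    src_counts = count_words(src)
    tgt_counts = count_words(tgt)
    return [
        (s, t) for s, t in zip(src, tgt)
        if all_frequent(s, src_counts, min_count)
        and all_frequent(t, tgt_counts, min_count)
    ]
```

Filtering both sides together keeps the corpus aligned line by line, which Moses training requires. With the line counts reported above, roughly 67% of the original lines survive the filter.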
  

Non Linear Transform Testing

Work Schedule

  Re-train for the best MSE on the test data.
      Start at : 2014-10-01 <-->  End at : 2014-10-02 <--> Result : 
      Performance is inconsistent with expectations. Best result for Non-Linear is 1e-2.
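
For reference, a minimal sketch of the mean squared error between transformed vectors and their targets. The log does not say how the error is averaged; this version assumes an average over every component of every vector, and the function name is hypothetical.

```python
from typing import Sequence


def mean_squared_error(predicted: Sequence[Sequence[float]],
                       target: Sequence[Sequence[float]]) -> float:
    """Average squared difference over all components of all vectors."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    n = 0
    for p_vec, t_vec in zip(predicted, target):
        if len(p_vec) != len(t_vec):
            raise ValueError("vectors must have the same dimension")
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            n += 1
    if n == 0:
        raise ValueError("vectors must not be empty")
    return total / n
```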