==Paper Recommendation==

Pre-Trained Multi-View Word Embedding. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/3c/Pre-Trained_Multi-View_Word_Embedding.pdf]

Learning Word Representation Considering Proximity and Ambiguity. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b0/Learning_Word_Representation_Considering_Proximity_and_Ambiguity.pdf]

Continuous Distributed Representations of Words as Input of LSTM Network Language Model. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/5/5a/Continuous_Distributed_Representations_of_Words.pdf]

WikiRelate! Computing Semantic Relatedness Using Wikipedia. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/cb/WikiRelate%21_Computing_Semantic_Relatedness_Using_Wikipedia.pdf]

Japanese-Spanish Thesaurus Construction Using English as a Pivot. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/e/e8/Japanese-Spanish_Thesaurus_Construction.pdf]

==Chaos Work==

===SSA Model===

  Build 2-dimension SSA-Model.
      Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result is :
        27.83%  46.53%    2 classes
  Test 25- and 50-dimension SSA-Model for transform
      Start at : 2014-10-02 <--> End at : 2014-10-03 <--> Result is :
        27.9%   46.6%     1 class
        27.83%  46.53%    2 classes
        27.43%  46.53%    3 classes
        25.52%  45.83%    4 classes
        25.62%  45.83%    5 classes
        22.81%  42.51%    6 classes
        11.96%  27.43%    50 classes
      Explanation : Some test points do not belong to any class seen in the training data, so they are not assigned the correct transform matrix. The planned fix is to first cluster the training data and then test the performance (see the sketch at the end of this section).
  Simple clustering into 2 classes.
        23.51%  43.21%    2 classes
  Training set used as test set
      Start at : 2014-10-06 <--> End at : 2014-10-08 <--> Result is :
        63.98%  77.57%    Simple 2-class
        58.81%  73.91%    Total 3-class

  Test All-Belong SSA model for transform
      Start at : 2014-10-02
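
The clustering fix described above can be prototyped roughly as follows. This is a minimal Python sketch, assuming the SSA transform is a per-cluster linear map between a source and a target vector space fitted by least squares; the array names (src_train, tgt_train, src_test), the use of scikit-learn KMeans, and the least-squares fit are illustrative assumptions, not the actual SSA implementation.

<pre>
# Minimal sketch: cluster the training data, fit one linear transform per
# cluster, and route each test vector through its nearest cluster's transform.
import numpy as np
from sklearn.cluster import KMeans

def fit_cluster_transforms(src_train, tgt_train, n_clusters=2, seed=0):
    """Cluster source vectors, then fit one linear transform per cluster."""
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(src_train)
    transforms = {}
    for c in range(n_clusters):
        mask = km.labels_ == c
        # Least-squares solution of src @ W ~= tgt, using this cluster only.
        W, *_ = np.linalg.lstsq(src_train[mask], tgt_train[mask], rcond=None)
        transforms[c] = W
    return km, transforms

def apply_cluster_transforms(km, transforms, src_test):
    """Assign each test vector to its nearest cluster and apply that transform."""
    labels = km.predict(src_test)
    return np.vstack([src_test[i] @ transforms[c] for i, c in enumerate(labels)])
</pre>

With n_clusters=2 this mirrors the "simple clustering into 2 classes" experiment; increasing n_clusters corresponds to the multi-class rows in the table above.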

===SEMPRE Research===
====Work Schedule====
  Download SEMPRE toolkit.
      Start at : 2014-09-30

====Paper related====
  Semantic Parsing via Paraphrasing [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/85/Semantic_Parsing_via_Paraphrasing.pdf]

===Knowledge Vector===

  Pre-process corpus.
      Start at : 2014-09-30.
        Use the Wikipedia_Extractor toolkit [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor].
      End at : 2014-10-03 <--> Result :
        The original corpus is about 47 GB; after preprocessing it is almost 17.8 GB.
  Analyze the corpus and train word2vec on Wikipedia (see the sketch at the end of this section).
      Start at : 2014-10-03.
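
For reference, a minimal word2vec training sketch in Python using gensim is shown below. It assumes the WikiExtractor output has already been merged into a single plain-text file with one sentence per line; the file names and hyper-parameters are illustrative assumptions, not the settings actually used here.

<pre>
# Minimal sketch: train word2vec on the extracted Wikipedia text with gensim.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

sentences = LineSentence("wiki_clean.txt")   # placeholder path; streams one sentence per line
model = Word2Vec(
    sentences,
    vector_size=100,   # embedding dimensionality (the parameter is `size` in gensim < 4.0)
    window=5,          # context window size
    min_count=10,      # ignore words seen fewer than 10 times
    workers=8,         # parallel training threads
)
model.save("wiki_word2vec.model")
print(model.wv.most_similar("king", topn=5))
</pre>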

===Moses translation model===

  Pre-process corpus: remove sentences that contain rarely seen words (see the sketch at the end of this section).
      Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result :
        The original corpus has 8,973,724 lines; after removing sentences containing words that occur fewer than 10 times, the clean corpus has 6,033,397 lines.
  Train Model.
      Start at : 2014-10-02 <--> End at : 2014-10-05
  Tuning Model.
      Start at : 2014-10-05
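
The rare-word filter in the pre-processing step can be sketched as a simple two-pass script. This is a minimal Python sketch, assuming a plain-text corpus with one sentence per line and whitespace tokenization; the file names and the helper name are placeholders, not the pipeline actually used.

<pre>
# Minimal sketch: drop every sentence containing a word seen fewer than
# min_count times in the whole corpus (two passes over the file).
from collections import Counter

def filter_rare_word_sentences(in_path, out_path, min_count=10):
    # First pass: count word frequencies over the whole corpus.
    counts = Counter()
    with open(in_path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.split())

    # Second pass: keep only sentences whose words all reach the threshold.
    kept = 0
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            if all(counts[w] >= min_count for w in line.split()):
                fout.write(line)
                kept += 1
    return kept

# Example: filter_rare_word_sentences("corpus.en", "corpus.clean.en")
</pre>

For a parallel corpus, the same keep/drop decision would have to be applied to both language sides so that the line alignment is preserved.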

===Non Linear Transform Testing===
====Work Schedule====
  Re-train the best-MSE model for the test data (see the sketch at the end of this section).
      Start at : 2014-10-01 <--> End at : 2014-10-02 <--> Result :
        Performance is inconsistent with expectations. The best result for the non-linear transform is 1e-2.
        Hidden Layer    1 incorrect number    5 incorrect number    total number
         400             840                   705                   995
         600             796                   636                   995
         800             763                   601                   995
        1200             804                   646                   995
        1400             825                   676                   995
      Result : According to these results, I will test hidden layer sizes 800, 1200, 1400, and 1600.
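
A minimal sketch of this hidden-layer sweep is given below, assuming the non-linear transform is a single-hidden-layer network trained with an MSE objective to map source vectors X to target vectors Y. The data arrays, the tanh activation, and the use of scikit-learn's MLPRegressor are illustrative assumptions, not the original training code.

<pre>
# Minimal sketch: train one single-hidden-layer MSE regressor per hidden size
# and compare test error across sizes.
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

def sweep_hidden_sizes(X_train, Y_train, X_test, Y_test,
                       sizes=(400, 600, 800, 1200, 1400, 1600)):
    results = {}
    for h in sizes:
        net = MLPRegressor(hidden_layer_sizes=(h,), activation="tanh",
                           max_iter=500, random_state=0)
        net.fit(X_train, Y_train)
        mse = mean_squared_error(Y_test, net.predict(X_test))
        results[h] = mse
        print("hidden=%5d  test MSE=%.4f" % (h, mse))
    return results
</pre>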