Xingchao work
Paper Recommendation
Pre-Trained Multi-View Word Embedding.[1]
Learning Word Representation Considering Proximity and Ambiguity.[2]
Continuous Distributed Representations of Words as Input of LSTM Network Language Model.[3]
WikiRelate! Computing Semantic Relatedness Using Wikipedia.[4]
Japanese-Spanish Thesaurus Construction Using English as a Pivot.[5]
Chaos Work
SSA Model
Build a 2-dimensional SSA model.
Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result is :
27.83% 46.53% 2 classes
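The two percentages on each result line in this section are, by my reading, top-1 and top-5 accuracy of nearest-neighbor retrieval in the target space after applying the transform; that is an assumption, not stated in the log. A minimal sketch of that evaluation in Python (all names are mine):

import numpy as np

def topk_accuracy(pred, gold, k=5):
    """Fraction of test words whose gold target vector is within the
    top-k cosine nearest neighbors of the predicted vector."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    g = gold / np.linalg.norm(gold, axis=1, keepdims=True)
    sim = p @ g.T                              # (n, n) similarity matrix
    # rank 0 means the gold vector is the single nearest neighbor
    rank = (sim > np.diag(sim)[:, None]).sum(axis=1)
    return float((rank < k).mean())

# acc1 = topk_accuracy(pred, gold, k=1)   # would give the first column
# acc5 = topk_accuracy(pred, gold, k=5)   # would give the second column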
Test the 25- and 50-dimensional SSA models for the transform.
Start at : 2014-10-02 <--> End at : 2014-10-03 <--> Result is :
27.9%  46.6%  1 class
27.83% 46.53% 2 classes
27.43% 46.53% 3 classes
25.52% 45.83% 4 classes
25.62% 45.83% 5 classes
22.81% 42.51% 6 classes
11.96% 27.43% 50 classes
Explanation: some test points fall into classes that no training data belongs to, so those points are not assigned a correctly trained transform matrix. The planned update is to cluster only the training data and then test the performance; a sketch of this follows the result below.
Simple clustering into 2 classes.
23.51% 43.21% 2 classes
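A minimal sketch of that update, assuming the source-side training vectors are clustered with k-means and a separate least-squares linear map is fitted per cluster; since test points are assigned to the trained clusters, every point gets a transform learned from matching training data. Function and variable names are mine, not the actual pipeline:

import numpy as np
from sklearn.cluster import KMeans

def fit_cluster_transforms(src_train, tgt_train, n_clusters=2):
    """Cluster source vectors; fit one linear map W_c per cluster so
    that src @ W_c approximates tgt on that cluster's training pairs."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(src_train)
    maps = {}
    for c in range(n_clusters):
        m = km.labels_ == c
        # least squares: minimize ||src[m] @ W - tgt[m]||^2 over W
        maps[c], *_ = np.linalg.lstsq(src_train[m], tgt_train[m], rcond=None)
    return km, maps

def apply_transform(src_test, km, maps):
    # every test point is routed through its own cluster's matrix
    labels = km.predict(src_test)
    return np.vstack([src_test[i] @ maps[c] for i, c in enumerate(labels)])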
Training set used as the test set
Start at : 2014-10-06 <--> End at : 2014-10-08 <--> Result is :
63.98% 77.57% Simple, 2 classes
58.81% 73.91% Total, 3 classes
Test All-Belong SSA model for transform
Start at : 2014-10-02
SEMPRE Research
Work Schedule
Download the SEMPRE toolkit. Start at : 2014-09-30
Semantic Parsing via Paraphrasing.[6]
Knowledge Vector
Pre-process corpus.
Start at : 2014-09-30.
Use the Wikipedia_Extractor toolkit [7] (status: waiting)
End at : 2014-10-03 Result :
The original corpus is about 47 GB; after preprocessing it is about 17.8 GB.
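For reference, a small post-processing pass over the extractor's output, assuming it emits plain text wrapped in <doc ...> ... </doc> tags (the usual Wikipedia_Extractor format); the cleaning rules and filenames here are placeholders, not the exact ones used:

import re
import sys

doc_tag = re.compile(r'</?doc[^>]*>')

# strip <doc> wrappers, lowercase, keep alphabetic tokens only,
# writing one cleaned line per input line for word2vec
with open(sys.argv[1]) as fin, open(sys.argv[2], 'w') as fout:
    for line in fin:
        tokens = re.findall(r'[a-z]+', doc_tag.sub('', line).lower())
        if tokens:
            fout.write(' '.join(tokens) + '\n')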
Analyze the corpus and train word2vec on Wikipedia.
Start at : 2014-10-03.
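A minimal training sketch, assuming gensim's word2vec over the cleaned one-sentence-per-line file; the filename and all hyperparameters below are my placeholders (older gensim versions name vector_size as size):

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# each line of the cleaned corpus is one whitespace-tokenized sentence
sentences = LineSentence('wiki_clean.txt')     # hypothetical filename
model = Word2Vec(sentences,
                 vector_size=100,              # embedding dimension (assumed)
                 window=5,
                 min_count=10,                 # drop rare words (assumed cutoff)
                 workers=4)
model.save('wiki_w2v.model')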
Moses translation model
Pre-process the corpus: remove sentences that contain rarely seen words.
Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result :
The original corpus has 8,973,724 lines; after removing sentences containing words that occur fewer than 10 times, 6,033,397 lines remain.
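A two-pass sketch of that filter, assuming "rarely seen" means a corpus-wide frequency below 10; filenames are placeholders:

from collections import Counter

# pass 1: corpus-wide word frequencies
freq = Counter()
with open('corpus.txt') as f:                  # hypothetical filename
    for line in f:
        freq.update(line.split())

# pass 2: keep only sentences in which every word occurs >= 10 times
kept = 0
with open('corpus.txt') as fin, open('corpus.clean.txt', 'w') as fout:
    for line in fin:
        if all(freq[w] >= 10 for w in line.split()):
            fout.write(line)
            kept += 1
print('kept', kept, 'lines')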
Train the model.
Start at : 2014-10-02 <--> End at : 2014-10-05
Tune the model.
Start at : 2014-10-05
Non-Linear Transform Testing
Work Schedule
Re-train to find the best MSE on the test data.
Start at : 2014-10-01 <--> End at : 2014-10-02 <--> Result :
Performance is inconsistent with expectations. The best MSE for the non-linear transform is 1e-2.
Hidden layer | Incorrect@1 | Incorrect@5 | Total
400          | 840         | 705         | 995
600          | 796         | 636         | 995
800          | 763         | 601         | 995
1200         | 804         | 646         | 995
1400         | 825         | 676         | 995
Conclusion: based on these results, I will test hidden-layer sizes of 800, 1200, 1400, and 1600.
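For context, a sketch of the kind of sweep being run here: a one-hidden-layer non-linear regressor from source to target embeddings, scored by incorrect@1/@5 counts as in the evaluation sketch above. scikit-learn's MLPRegressor is my stand-in for whatever toolkit was actually used, and the random data merely makes the sketch runnable:

import numpy as np
from sklearn.neural_network import MLPRegressor

def incorrect_counts(pred, gold, ks=(1, 5)):
    """Count test words whose gold vector is NOT in the top-k neighbors."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    g = gold / np.linalg.norm(gold, axis=1, keepdims=True)
    sim = p @ g.T
    rank = (sim > np.diag(sim)[:, None]).sum(axis=1)
    return {k: int((rank >= k).sum()) for k in ks}

# placeholder embedding pairs; 995 test words as in the table above
rng = np.random.default_rng(0)
src_train, tgt_train = rng.normal(size=(4000, 100)), rng.normal(size=(4000, 100))
src_test, tgt_test = rng.normal(size=(995, 100)), rng.normal(size=(995, 100))

for hidden in (800, 1200, 1400, 1600):         # sizes named in the plan above
    net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=200)
    net.fit(src_train, tgt_train)
    errs = incorrect_counts(net.predict(src_test), tgt_test)
    print(hidden, errs[1], errs[5], len(src_test))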