Revision as of 02:40, 5 May 2016 (Thu)

Text Processing Team Schedule

Members

Former Members

Rong Liu (刘荣) : Youku
Xiaoxi Wang (王晓曦) : Turing Robot
Xi Ma (马习) : graduate student, Tsinghua University
DongXu Zhang (张东旭) : --

Current Members

Tianyi Luo (骆天一)
Chao Xing (邢超)
Qixin Wang (王琪鑫)
Yiqiao Pan (潘一桥)
Aodong Li (李傲冬)
Ziwei Bai (白子薇)
Aiting Liu (刘艾婷)

Work Process

Question & Answering (Aiting Liu)

2016-04-24 : write my biweekly report

2016-04-23 : read Fader's paper (2011)

2016-04-20 : read Fader's paper (2013)

2016-04-15 : learn DSSM and sent2vec

2016-04-16 : try to figure out how the PARALEX dataset is constructed

2016-04-17 : download the PARALEX dataset and try to convert it into the format we need
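
The 2016-04-15 entry mentions learning DSSM. For reference, the original DSSM paper scores a query-document pair by the cosine similarity of their semantic vectors, turns the scores over a candidate set into a posterior with a softmax smoothed by a factor gamma, and trains by minimizing the negative log-likelihood of the clicked document. The following is a minimal pure-Python sketch of that scoring step, assuming the semantic vectors have already been computed by the network; the toy vectors and gamma = 10 are illustrative choices, not values from this log.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def posterior(query_vec, doc_vecs, gamma=10.0):
    """P(D | Q) for each candidate D: softmax over gamma-scaled cosine scores."""
    scores = [gamma * cosine(query_vec, doc) for doc in doc_vecs]
    top = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dssm_loss(query_vec, pos_vec, neg_vecs, gamma=10.0):
    """Negative log-likelihood of the positive (clicked) document among the candidates."""
    probs = posterior(query_vec, [pos_vec] + list(neg_vecs), gamma)
    return -math.log(probs[0])

# Hypothetical toy semantic vectors; in DSSM they come from the network's top layer.
q = [1.0, 0.0]
pos = [0.9, 0.1]
negs = [[0.0, 1.0], [-1.0, 0.0]]
print(posterior(q, [pos] + negs))
print(dssm_loss(q, pos, negs))
```

A larger gamma sharpens the softmax, so the loss concentrates on ranking the positive document clearly above the negatives.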

@@ Line 18: / Line 18: @@
 ==Work Process==
-===Similar-question sentence vector model training with RNN/LSTM and attention-based RNN/LSTM chatbot model training (Tianyi Luo)===
---------------------2016-04-22
-* Speed up testing of the Theano version of RNN-based similar-question vector generation.
---------------------2016-04-21
-* Finish helping Teacher Wang prepare the text group's presentation (Tang poetry and Song Ci generation, and the intelligent QA system) for Tsinghua University's 105th anniversary.
-* Submit our IJCAI paper to arXiv (solved a major problem with submitting a paper that contains Chinese characters: [http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/How_to_submit_the_latex_files_including_Chinese_characters_to_arxiv Solution]).
-* Optimize the Theano version of RNN-based similar-question vector generation.
---------------------2016-04-20
-* Finish submitting the camera-ready version of the IJCAI 2016 paper.
-* Update the technical report on Chinese Song iambics generation.
---------------------2016-04-19
-* Optimize the Theano version of RNN-based similar-question vector generation.
---------------------2016-04-18
-* Optimize the Theano version of RNN-based similar-question vector generation.
-* Finish implementing the Theano version of LSTM max-margin vector training.
-===Reproduce DSSM Baseline (Chao Xing)===
-: 2016-04-28 : Give a talk to the text team on some recent papers:
-               Knowledge Base Completion via Search-Based Question Answering : [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b1/Knowledge_Base_Completion_via_Search-Based_Question_Answering_-_Report.pdf pdf]
-               Open Domain Question Answering via Semantic Enrichment  : [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/15/Open_Domain_Question_Answering_via_Semantic_Enrichment_-_Report.pdf pdf]
-               A Neural Conversational Model : [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/15/A_Neural_Conversational_Model_-_Report.pdf pdf]
-               Also report preliminary CNN-DSSM results in Huilan's weekly report.
-: 2016-04-27 : Code a multi-layer CNN; hit a GPU out-of-memory error in TensorFlow.
-               So I run this test on the CPU instead, which is slower.
-: 2016-04-26 : Finish coding the trick and analyze it.
-: 2016-04-25 : Find a trick, suggested by Tianyi, to improve accuracy.
-             : Code this trick.
-: 2016-04-23 : Set up a series of experiments:
-. Try a deep CNN-DSSM: the current model follows the original proposal and contains only one convolution layer; the number of layers should be a tunable parameter.
-. Test whether mixture data is effective for the current model and for the deep CDSSM.
-. Code a recurrent CNN-DSSM (new approach).
-: 2016-04-22 : Find a problem: one iteration takes 1537 seconds on the lab's GPU machine (970), but only 7 seconds on Huilan's server.
-               Achieve reasonable results when applying the max-margin method to the CNN-DSSM model.
-: 2016-04-21 : The DSSM model does not work well; analysis:
-. This is not an exact reproduction of DSSM: the original model is for English, and I adapted it to Chinese after word segmentation,
-                   so the input is word trigrams rather than letter trigrams.
-. Our dataset is far from rich, and because we do not use pre-trained word vectors for initialization, we can hardly achieve good performance.
-             : Requirements:
-. As we have rich pre-trained word vectors, CDSSM or RDSSM may be better suited to our task.
-. Sequences of different lengths must be mapped to fixed-dimension vectors; only a CNN or RNN can do this, because a DNN requires a
-                  fixed-length input of word vectors.
-             : Finish coding CDSSM and test its performance.
-                One problem: when you install TensorFlow 0.8.0 via pip and want to use the conv2d function on the GPU, make sure you have already
-                             installed cuDNN 4.0, not the latest 5.0.
-: 2016-04-20 : Find and fix a bug in the reproduced DSSM model.
-: 2016-04-19 : Finish coding the mixture data model with lower memory usage. Test its performance.
-: 2016-04-18 : Code the mixture data model.
-: 2016-04-16 : Code the mixture data model, but hit a memory error. Dr. Wang helped me fix it.
-: 2016-04-15 : Share papers. Investigate a series of DSSM papers for future work, and show our intern students how to do research.
-             : Original DSSM model : Learning Deep Structured Semantic Models for Web Search using Clickthrough Data [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/4/45/2013_-_Learning_Deep_Structured_Semantic_Models_for_Web_Search_using_Clickthrough_Data_-_Report.pdf pdf]
-             : CNN based DSSM model : A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b7/2014_-_A_Latent_Semantic_Model_with_Convolutional-Pooling_Structure_for_Information_Retrieval_-_Report.pdf pdf]
-             : Use DSSM model for a new area : Modeling Interestingness with Deep Neural Networks [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/1/1f/2014_-_Modeling_Interestingness_with_Deep_Neural_Networks_-_Report.pdf pdf]
-             : Latest approach for LSTM + RNN DSSM model : SEMANTIC MODELLING WITH LONG-SHORT-TERM MEMORY FOR INFORMATION RETRIEVAL [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/2/24/2015_-_SEMANTIC_MODELLING_WITH_LONG-SHORT-TERM_MEMORY_FOR_INFORMATION_RETRIEVAL_-_Report.pdf pdf]
-: 2016-04-14 : Test the DSSM-DNN model; code the DSSM-CNN model.
-               Continue investigating deep neural question answering systems.
-: 2016-04-13 : Test the DSSM model; investigate deep neural question answering systems.
-             : Share theano ppt [http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:Theano-RBM.pptx theano]
-             : Share tensorflow ppt [http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tensorflow.pptx tensorflow]
-: 2016-04-12 : Finish writing the TensorFlow version of DSSM.
-: 2016-04-11 : Write TensorFlow toolkit slides for intern students.
-: 2016-04-10 : Learn the TensorFlow toolkit.
-: 2016-04-09 : Learn the TensorFlow toolkit.
-: 2016-04-08 : Finish the Theano version.
-===Deep Poem Processing With Image (Ziwei Bai)===
-: 2016-04-20 : Combine my program with Qixin Wang's.
-: 2016-04-10 : Write a web spider to crawl a thousand images.
-: 2016-04-13 : 1. Download Theano for Python 2.7. 2. Debug cnn.py.
-: 2016-04-15 : Use the web spider to crawl 30,000 images and store them in a matrix.
-: 2016-04-16 : Modify the CNN and spider code.
-: 2016-04-17 : Train the convolutional neural network.
-===RNN Piano Processing (Jiyuan Zhang)===
-: 2016-04-12 : Select appropriate MIDI files and run the RNN-RBM model.
-: 2016-04-13 : Review the RNN-RBM model's code.
-: 2016-04-14~15 : Write code to select MIDI files in 4/4 time.
-: 2016-04-17~22 : Run on the data; it failed several times, so modify the code and review the RNN-RBM model's code again.
-: 2016-04-25~29 : Replace RNN-RBM with LSTM-RBM, then run the LSTM-RBM model.
 ===Question & Answering (Aiting Liu)===
 : 2016-04-24 : write my biweekly report
@@ Line 105: / Line 25: @@
 : 2016-04-16 : try to figure out how the PARALEX dataset is constructed
 : 2016-04-17 : download the PARALEX dataset and try to convert it into the format we need
-===Generation Model (Aodong Li)===
-: 2016-05-05 : check in
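
The 2016-04-21 DSSM note contrasts letter trigrams with word trigrams. In the original DSSM, word hashing breaks each English word into letter trigrams with boundary markers (for example, good becomes #go, goo, ood, od#), which keeps the input vocabulary small. The sketch below contrasts that with trigrams over a segmented Chinese token sequence, which is one reading of the adaptation described in the log; the token list is a hypothetical segmenter output, and no particular segmenter is assumed.

```python
from collections import Counter

def letter_trigrams(word):
    """Letter trigrams of one word, with '#' marking the word boundaries."""
    marked = "#" + word + "#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]

def bag_of_letter_trigrams(text):
    """Count letter trigrams over whitespace-separated English words (DSSM-style input)."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(letter_trigrams(word))
    return counts

def word_trigrams(tokens):
    """Trigrams over an already-segmented token sequence (assumed Chinese adaptation)."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

print(letter_trigrams("good"))  # ['#go', 'goo', 'ood', 'od#']
# Hypothetical segmenter output for 明天北京天气怎么样 ("what's the weather in Beijing tomorrow").
print(word_trigrams(["明天", "北京", "天气", "怎么样"]))
```

Letter trigrams of Chinese characters would add little, since most words are only one to three characters long; word trigrams instead grow the vocabulary with the corpus, which fits the log's concern about data sparsity.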

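The 2016-04-21 note also argues that only a CNN or RNN can turn variable-length sequences into fixed-dimension vectors. Below is a minimal sketch of the convolution-plus-max-pooling idea behind CDSSM, assuming each word is already a d-dimensional vector: every filter slides over windows of word vectors and keeps its maximum tanh activation, so the output has one value per filter whatever the sentence length. The random filters and vectors are hypothetical placeholders, and the full CDSSM also has a word-hashing input layer and a semantic layer on top, both omitted here.

```python
import math
import random

def conv_max_pool(seq, filters, window=3):
    """Map a non-empty, variable-length sequence of word vectors to a fixed-size vector.

    seq: list of word vectors, each a list of d floats.
    filters: list of (weights, bias); weights holds window * d floats.
    Returns one max-pooled tanh activation per filter, so the output length
    equals len(filters) no matter how long seq is.
    """
    d = len(seq[0])
    # Zero-pad short sequences so at least one full window exists.
    padded = seq + [[0.0] * d] * max(0, window - len(seq))
    out = []
    for weights, bias in filters:
        best = -math.inf
        for i in range(len(padded) - window + 1):
            # Concatenate the window's word vectors into one flat input.
            x = [v for vec in padded[i:i + window] for v in vec]
            act = math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
            best = max(best, act)  # max-pooling over positions
        out.append(best)
    return out

# Hypothetical random filters and word vectors, for illustration only.
rng = random.Random(0)
d, window, n_filters = 4, 3, 5
filters = [([rng.uniform(-1, 1) for _ in range(window * d)], 0.0) for _ in range(n_filters)]
short = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(2)]
long_ = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(9)]
print(len(conv_max_pool(short, filters, window)), len(conv_max_pool(long_, filters, window)))  # 5 5
```

A plain DNN, by contrast, would need its input concatenated from a fixed number of word vectors, which is the limitation the log points out.
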