Schedule
Text Processing Team Schedule
Members
Former Members
- Rong Liu (刘荣) : Youku
- Xiaoxi Wang (王晓曦) : Turing Robot
- Xi Ma (马习) : Graduate student, Tsinghua University
- DongXu Zhang (张东旭) : --
Current Members
- Tianyi Luo (骆天一)
- Chao Xing (邢超)
- Qixin Wang (王琪鑫)
- Yiqiao Pan (潘一桥)
- Aodong Li (李傲冬)
- Ziwei Bai (白子薇)
- Aiting Liu (刘艾婷)
Work Process
Research Task
Binary Word Embedding (Aiting)
Ordered Word Embedding (Aodong)
Matrix Factorization (Ziwei)
Question Answering System
Chao Xing
2016-05-18 :
1. Modify the model for the crawled data.
2016-05-17 :
1. Code and test the HRNN model.
2016-05-16 :
1. Finish the work on the CDSSM model.
2016-05-15 :
1. Test the packaged version of the CDSSM model.
2016-05-13 :
1. Finish coding the packaged version of the CDSSM model; waiting for testing.
2016-05-12 :
1. Begin packaging the CDSSM model for huilan.
2016-05-11 :
1. Prepare for paper sharing.
2. Finish the CDSSM model in the chatting process.
3. Start setting up the model and experiments in the dialogue system.
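In a chatting setting, a DSSM-family model such as CDSSM is typically used to rank candidate responses: the query and each candidate are mapped to semantic vectors, scored by cosine similarity, and the scores are turned into a posterior with a softmax scaled by a smoothing factor γ. The sketch below covers only that scoring step and assumes the vectors have already been produced by the trained model; `rank_responses`, the toy vectors, and `gamma=10.0` are hypothetical illustrations, not the team's packaged code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_responses(query_vec, candidates, gamma=10.0):
    """Rank (text, vector) candidates by softmax(gamma * cosine) against the query.

    gamma is a hypothetical smoothing value; it controls how peaked the
    resulting distribution is.
    """
    scores = [gamma * cosine(query_vec, vec) for _, vec in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    ranked = [(text, e / total) for (text, _), e in zip(candidates, exps)]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

In the original DSSM work γ is set empirically on held-out data, so the default here is only a placeholder.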
2016-05-10 :
1. Finish testing the CDSSM model in chatting; the original data has some problems.
2. Read papers:
- A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion
- A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
- Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
- Neural Responding Machine for Short-Text Conversation
2016-05-09 :
1. Test the CDSSM model in the chatting model.
2. Read papers:
- Learning from Real Users: Rating Dialogue Success with Neural Networks for Reinforcement Learning in Spoken Dialogue Systems
- SimpleDS: A Simple Deep Reinforcement Learning Dialogue System
3. Implement an RNN from scratch in TensorFlow.
2016-05-08 :
Fix some problems for the dialogue system team, and continue reading papers on dialogue systems.
2016-05-07 :
Read some papers on dialogue systems.
2016-05-06 :
Try to fix the RNN-DSSM model in TensorFlow. Failed.
2016-05-05 :
Code RNN-DSSM in TensorFlow. When running the RNN-DSSM model on CPU, memory usage keeps increasing. Huilan's TensorFlow version is 0.7.0, installed via pip; this causes an error when creating the GPU graph. One possible solution is to build TensorFlow from source.
Aiting Liu
2016-05-18:
Fetch American TV subtitles (1. Sex and the City, 2. Gossip Girl, 3. Desperate Housewives, 4. The IT Crowd, 5. Empire, 6. Silicon Valley)
2016-05-16: Process the data collected from the interview site, interview books, and American TV subtitles (38.2M + 23.2M)
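The log does not say which subtitle format was fetched; assuming the common SubRip (.srt) layout (an index line, a timing line, then one or more text lines, with blocks separated by blank lines), turning subtitles into plain utterances for the chat corpus could look like this minimal sketch. `srt_to_utterances` is a hypothetical helper, not the processing script that was actually used.

```python
import re

# "00:00:01,000 --> 00:00:02,500" style timing line
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")
# Inline markup such as <i>...</i> tags or {\an8} positioning codes
MARKUP = re.compile(r"<[^>]+>|\{[^}]*\}")


def srt_to_utterances(srt_text):
    """Return one plain-text utterance per SRT subtitle block."""
    text = srt_text.replace("\r\n", "\n").lstrip("\ufeff")  # normalize newlines, drop BOM
    utterances = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # Drop the block index and timing lines; a text line made only of
        # digits would be dropped too.
        kept = [line for line in lines
                if not line.isdigit() and not TIMESTAMP.match(line)]
        cleaned = [MARKUP.sub("", line).strip() for line in kept]
        utterance = " ".join(part for part in cleaned if part)
        if utterance:
            utterances.append(utterance)
    return utterances
```

Each subtitle block becomes one utterance here; pairing consecutive utterances into query/response examples would be a separate step.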
2016-05-11:
Fetch American TV subtitles (1. Friends, 2. The Big Bang Theory, 3. Descendants of the Sun, 4. Modern Family, 5. House M.D., 6. Grey's Anatomy)
2016-05-08: Fetch data from 'http://news.ifeng.com/' and 'http://www.xinhuanet.com/' (13.4M)
2016-05-07: Fetch data from 'http://fangtan.china.com.cn/' and interview books (10M)
2016-05-04: Establish the overall framework of our chatbot, and continue building the database
Ziwei Bai
2016-05-21:
1. Study the second half of the paper 'A Neural Conversational Model'.
2016-05-18:
1. Crawl QA pairs from http://www.chinalife.com.cn/publish/zhuzhan/index.html and http://www.pingan.com/
2. Find the paper 'A Neural Conversational Model' on Google Scholar and study the first half of it.
2016-05-16:
1. Find the datasets used in the paper 'Neural Responding Machine for Short-Text Conversation'.
2. Restructure 15 scripts into our expected format.
2016-05-15:
1. Find 130 scripts.
2. Restructure 11 scripts into our expected format.
Problem: in many files, dialogue and scene descriptions cannot be distinguished programmatically.
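One common heuristic for this problem, assuming a script follows the standard screenplay layout (an all-caps speaker cue above each speech, INT./EXT. scene headings, blank lines between blocks), is sketched below. The regular expressions and `split_script` are hypothetical, not the team's converter, and files that break this layout would still need manual handling.

```python
import re

# Scene headings such as "INT. KITCHEN - DAY" or "EXT. STREET - NIGHT"
SCENE_HEADING = re.compile(r"^(INT\.|EXT\.|INT/EXT\.)")
# Speaker cue: an all-caps name, optionally followed by an extension like "(V.O.)"
SPEAKER_CUE = re.compile(r"^[A-Z][A-Z .'\-]*(\s*\([A-Z. ]+\))?$")


def split_script(text):
    """Split screenplay text into (speaker, speech) pairs and description paragraphs.

    Heuristic: a paragraph whose first line is an all-caps speaker cue and that
    has at least one more line is dialogue; every other paragraph is description.
    """
    dialogue, description = [], []
    for para in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        if not lines:
            continue
        first = lines[0]
        if len(lines) > 1 and not SCENE_HEADING.match(first) and SPEAKER_CUE.match(first):
            speaker = re.sub(r"\s*\(.*\)$", "", first)  # strip "(V.O.)"-style extensions
            # Skip parenthetical directions such as "(quietly)" inside the speech
            speech = " ".join(line for line in lines[1:]
                              if not (line.startswith("(") and line.endswith(")")))
            dialogue.append((speaker, speech))
        else:
            description.append(" ".join(lines))
    return dialogue, description
```

An all-caps action paragraph spanning several lines would be misread as dialogue, so the output is best spot-checked per file.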
2016-05-11:
1. Read the paper 'Movie-DiC: a Movie Dialogue Corpus for Research and Development'.
2. Restructure a new film script into our expected format.
2016-05-08: Convert the PDFs we found yesterday into TXT, and restructure the data into our expected format
2016-05-07: Find 9 drama scripts and 20 film scripts
2016-05-04: Find and process data for the QA system
Generation Model (Aodong Li)
- 2016-05-20 :
Optimize my code for speed and train the models on GPU. However, they do not converge :(
- 2016-05-19 : Code a simple version of the keywords-to-sequence model and train it
- 2016-05-18 : Debug the keywords-to-sequence model and train it
- 2016-05-17 : Clarify the technical details and code the keywords-to-sequence model
- 2016-05-16 : Denoise and segment more lyrics and prepare for the keywords-to-sequence model
- 2016-05-15 : Train several different models and analyze their performance: song to song, paragraph to paragraph, etc.
- 2016-05-12 : Complete the sequence-to-sequence model's prediction process, finishing v0.0 of the standard LSTM-based sequence-to-sequence model
- 2016-05-11 : Complete the sequence-to-sequence model's training process in Theano
- 2016-05-10 : Complete the LSTM-based sequence-to-sequence model in Theano
- 2016-05-09 : Try to code the sequence-to-sequence model
- 2016-05-08 :
Denoise Lijun Deng's lyrics (110+ pieces) and train word vectors on them; decide to use a plain sequence-to-sequence model
- 2016-05-07 :
Study attention-based models; learn some details of the poem generation model; shift my focus to the lyrics generation model
- 2016-05-06 : Read the paper on poem generation and learn about LSTMs
- 2016-05-05 : Check in and get an overview of the generation model
Jiyuan Zhang
- 2016-05-01~06 : Modify the input format and run the lstmrbm model (16-beat, 32-beat, bar)
- 2016-05-09~13:
Modify model parameters and run the model; the results are not yet satisfactory. Following Teacher Wang's suggestion, replace random generation with maximum-probability generation in the generation stage.
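The suggested change is the difference between sampling each next symbol from the model's output distribution and always taking the most probable symbol (greedy decoding). The minimal sketch below assumes each step's output can be read as a categorical distribution over the next symbol, which is a simplification (an RBM output layer is not necessarily categorical); the 3-symbol `TRANSITIONS` table is a hypothetical stand-in for the lstmrbm model.

```python
import random

# Hypothetical toy model: row i is the next-symbol distribution after symbol i.
TRANSITIONS = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.7, 0.2, 0.1],
]


def sample_next(probs, rng):
    """Random generation: draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def argmax_next(probs):
    """Maximum-probability generation: always take the most likely index."""
    return max(range(len(probs)), key=lambda i: probs[i])


def generate(start, steps, greedy, seed=0):
    """Extend [start] by `steps` symbols using the toy TRANSITIONS table."""
    rng = random.Random(seed)
    seq = [start]
    for _ in range(steps):
        probs = TRANSITIONS[seq[-1]]
        seq.append(argmax_next(probs) if greedy else sample_next(probs, rng))
    return seq


print(generate(0, 6, greedy=True))   # [0, 1, 0, 1, 0, 1, 0]
print(generate(0, 6, greedy=False))  # varies with the seed
```

Greedy generation is deterministic, so even this toy table settles into the repeating cycle 0, 1, 0, 1, ...; whether that trade-off helps the music output has to be judged from the model's actual results.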