<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://index.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=ASR%3A2016-01-04</id>
		<title>ASR:2016-01-04 - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=ASR%3A2016-01-04"/>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=ASR:2016-01-04&amp;action=history"/>
		<updated>2026-04-14T09:05:53Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php?title=ASR:2016-01-04&amp;diff=18576&amp;oldid=prev</id>
		<title>Zxw: Created page with “==Speech Processing ==  === AM development ===  ==== Environment ====  ==== End-to-End ==== *monophone ASR --Zhiyuan :* MPE :* CTC/nnet3/Kaldi :* more refered ruselt...”</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=ASR:2016-01-04&amp;diff=18576&amp;oldid=prev"/>
				<updated>2016-01-09T01:32:54Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with “==Speech Processing ==  === AM development ===  ==== Environment ====  ==== End-to-End ==== *monophone ASR --Zhiyuan :* MPE :* CTC/nnet3/Kaldi :* more refered ruselt...”&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;==Speech Processing ==&lt;br /&gt;
&lt;br /&gt;
=== AM development ===&lt;br /&gt;
&lt;br /&gt;
==== Environment ====&lt;br /&gt;
&lt;br /&gt;
==== End-to-End ====&lt;br /&gt;
*monophone ASR --Zhiyuan&lt;br /&gt;
:* MPE&lt;br /&gt;
:* CTC/nnet3/Kaldi&lt;br /&gt;
:* more results of Kaldi/CTC on 1400h Chinese plus 100h English reported&lt;br /&gt;
:* Further tested nnet3-ctc training on 4000h 8k &amp;amp; 10000h 16k datasets. Tuned decoding configurations (phone-lm-weight, acwt, blank-scale), but CTC is still about 5 percent worse than standard nnet3&lt;br /&gt;
:* launched MPE-after-CTC experiments on WSJ; code revised&lt;br /&gt;
:* experiments of &amp;quot;chain&amp;quot; on WSJ&lt;br /&gt;
&lt;br /&gt;
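The blank-scale and acwt knobs above amount to rescaling the CTC log-posteriors before decoding. A minimal sketch, assuming a per-frame list of log-posteriors with the blank at index 0; the function name and default values are illustrative, not actual nnet3-ctc options:&lt;br /&gt;

```python
import math

# Hedged sketch of the decode-time scalings mentioned above: a global
# acoustic weight (acwt) on all log-posteriors plus a separate
# blank-scale that down-weights the CTC blank symbol. The defaults
# and the blank_id=0 convention are assumptions for illustration.
def scale_ctc_logposts(logposts, acwt=0.9, blank_scale=0.8, blank_id=0):
    scaled = []
    for frame in logposts:
        row = [acwt * lp for lp in frame]
        # Adding log(blank_scale) multiplies the blank posterior,
        # keeping blank from dominating the decoding graph.
        row[blank_id] = row[blank_id] + math.log(blank_scale)
        scaled.append(row)
    return scaled
```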
====Adaptive learning rate method====&lt;br /&gt;
* sequence training -Xiangyu&lt;br /&gt;
:* write a technical report &lt;br /&gt;
&lt;br /&gt;
==== Mic-Array ====&lt;br /&gt;
* hold &lt;br /&gt;
* compute EER with kaldi&lt;br /&gt;
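The EER computation can be sketched in the style of Kaldi's compute-eer binary: sweep a threshold over the target scores and report the operating point where the miss rate and false-alarm rate are closest to equal. The score-list interface below is an assumption, not the group's actual scoring setup:&lt;br /&gt;

```python
from bisect import bisect_left

# Hedged sketch of an equal-error-rate computation: for each
# candidate threshold, the miss rate is the fraction of target
# scores below it and the false-alarm rate is the fraction of
# nontarget scores at or above it; the EER is where they cross.
def compute_eer(target_scores, nontarget_scores):
    tgt = sorted(target_scores)
    non = sorted(nontarget_scores)
    candidates = []
    for thr in tgt:
        miss = bisect_left(tgt, thr) / len(tgt)             # targets below thr
        fa = (len(non) - bisect_left(non, thr)) / len(non)  # nontargets at/above thr
        candidates.append((abs(miss - fa), (miss + fa) / 2.0))
    return min(candidates)[1]
```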
&lt;br /&gt;
====Data selection unsupervised learning====&lt;br /&gt;
* hold&lt;br /&gt;
* acoustic-feature-based submodular selection using the Pingan dataset --zhiyong&lt;br /&gt;
* write code to speed up --zhiyong&lt;br /&gt;
* curriculum learning --zhiyong&lt;br /&gt;
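The submodular selection over acoustic features can be illustrated with a greedy facility-location objective; the dot-product similarity and dict-of-features interface are assumptions for this sketch, not the group's implementation:&lt;br /&gt;

```python
# Hedged sketch of submodular data selection: greedily add the
# utterance with the largest marginal gain under a facility-location
# objective f(S) = sum over u of the best sim(u, c) for c in S.
# The dot-product similarity is an illustrative assumption.
def greedy_select(utt_feats, budget):
    def sim(a, b):
        return sum(x * y for x, y in zip(a, b))

    names = list(utt_feats)
    selected = []
    coverage = {u: 0.0 for u in names}  # best similarity to the selected set
    while len(selected) != budget and len(selected) != len(names):
        def gain(c):
            # Marginal improvement in total coverage if c were added.
            return sum(max(coverage[u], sim(utt_feats[u], utt_feats[c])) - coverage[u]
                       for u in names)
        best = max((u for u in names if u not in selected), key=gain)
        for u in names:
            coverage[u] = max(coverage[u], sim(utt_feats[u], utt_feats[best]))
        selected.append(best)
    return selected
```

Greedy maximization of a monotone submodular objective like this carries the usual (1 - 1/e) approximation guarantee, which is why it is a common backbone for data-selection work.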
&lt;br /&gt;
====RNN-DAE (RNN-based Deep Auto-Encoder)====&lt;br /&gt;
* hold&lt;br /&gt;
* RNN-DAE performs worse than DNN-DAE because the training dataset is small &lt;br /&gt;
* extract real room impulse responses to generate reverberated WSJ data, then train the RNN-DAE  &lt;br /&gt;
&lt;br /&gt;
===Speaker recognition=== &lt;br /&gt;
* DNN-ivector framework&lt;br /&gt;
* SUSR &lt;br /&gt;
* AutoEncoder + metric learning&lt;br /&gt;
* binary ivector&lt;br /&gt;
* Deep speaker embedding tasks&lt;br /&gt;
&lt;br /&gt;
===language vector===&lt;br /&gt;
* write a paper--zhiyuan&lt;br /&gt;
:*hold&lt;br /&gt;
* language vector added to multiple hidden layers --zhiyuan &lt;br /&gt;
:* write code done&lt;br /&gt;
:* check code &lt;br /&gt;
:*http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&amp;amp;step=view_request&amp;amp;cvssid=480&lt;br /&gt;
* RNN language vector&lt;br /&gt;
:*hold&lt;br /&gt;
* language vector into multiple layers --Zhiyuan &lt;br /&gt;
:* a Chinese paper&lt;br /&gt;
* speech rate into multiple layers --Zhiyuan&lt;br /&gt;
:*verify the code for extra input(s) into DNN&lt;br /&gt;
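The language-vector and speech-rate ideas above reduce to concatenating a fixed side vector onto the network input (or hidden activations). A frame-level sketch, assuming plain list-of-lists features; this is an illustration, not the group's nnet code:&lt;br /&gt;

```python
# Hedged sketch of the extra-input idea: append a fixed language
# (or speech-rate) vector to every acoustic frame so the DNN can
# condition on it. Injecting into hidden layers would splice the
# same vector into the layer activations instead of the input.
def append_side_vector(frames, side_vec):
    return [list(frame) + list(side_vec) for frame in frames]
```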
&lt;br /&gt;
===multi-GPU===&lt;br /&gt;
* multi-stream training --Sheng Su&lt;br /&gt;
:* write a technical report &lt;br /&gt;
* kaldi-nnet3 --Xuewei&lt;br /&gt;
:* 7*2048 8k 1400h tdnn training Xent done&lt;br /&gt;
:* nnet3 mpe code is under investigation&lt;br /&gt;
:*http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&amp;amp;step=view_request&amp;amp;cvssid=472&lt;br /&gt;
:*Analysed the MPE divergence problem when the context is 10; the cause may be over-fitting, which a larger dataset may mitigate.&lt;br /&gt;
*RNN AM training on big dataset --mengyuan&lt;br /&gt;
:* fix decode bug&lt;br /&gt;
:* nnet3 LSTM &amp;amp; BLSTM training on sinovoice 120h dataset, using Kaldi's default config, but the result is worse than TDNN&lt;br /&gt;
:* Test nnet3-MPE code from Xuewei on sinovoice 120h 16k dataset. Didn't observe performance improvement. There are still some bugs in the code.&lt;br /&gt;
:* Run nnet3-ctc training on sinovoice 120h 16k dataset. The result looks OK, but worse than the normal model.&lt;br /&gt;
:* Start nnet3-ctc training on sinovoice 4000h 8k dataset.&lt;br /&gt;
* train mpe  --Zhiyong,Xuewei&lt;br /&gt;
:*train nnet3 mpe using data from Jietong--Xuewei&lt;br /&gt;
:* modify code to print stats --Xuewei&lt;br /&gt;
:* The MPE does not work when the context is 10; needs further investigation --zhiyong&lt;br /&gt;
:* The nnet1 MPE we tested is also based on context 5; a larger context may be an inherent problem --zhiyong&lt;br /&gt;
:*modify code to reduce memory&lt;br /&gt;
&lt;br /&gt;
===multi-task===&lt;br /&gt;
* test according to self-information neural structure learning --mengyuan&lt;br /&gt;
:* hold&lt;br /&gt;
:* write code done&lt;br /&gt;
:* no significant performance improvement observed &lt;br /&gt;
* speech rate learning --xiangyu&lt;br /&gt;
:* hold&lt;br /&gt;
:* no significant performance improvement observed&lt;br /&gt;
:*http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&amp;amp;step=view_request&amp;amp;cvssid=483&lt;br /&gt;
:* get results with extra input of speech rate info --Zhiyuan&lt;br /&gt;
&lt;br /&gt;
===30 Chinese dataset===&lt;br /&gt;
* revise syllable text &lt;br /&gt;
* add some words to the lexicon, which are applied to training and graph building. &lt;br /&gt;
* train and decode the 30 Chinese data again&lt;br /&gt;
* revise the technical report&lt;br /&gt;
*prepare data&lt;br /&gt;
*kaldi recipe&lt;br /&gt;
&lt;br /&gt;
==Text Processing==&lt;br /&gt;
===Work===&lt;br /&gt;
====RNN Poem Process====&lt;br /&gt;
* Combine additional rhyme.&lt;br /&gt;
* Investigate new methods.&lt;br /&gt;
====Document Representation====&lt;br /&gt;
* Code done; waiting for experiment results.&lt;br /&gt;
====Seq to Seq====&lt;br /&gt;
* Work on some tasks.&lt;br /&gt;
====Order representation ====&lt;br /&gt;
* Code up some ideas.&lt;br /&gt;
====Balance Representation====&lt;br /&gt;
* Investigate some papers.&lt;br /&gt;
* Current solution: use knowledge bases or similar pairs from a large corpus.&lt;br /&gt;
&lt;br /&gt;
===Hold===&lt;br /&gt;
====Neural Based Document Classification====&lt;br /&gt;
====RNN Rank Task====&lt;br /&gt;
====Graph RNN====&lt;br /&gt;
:* Entity paths embedded into entities.&lt;br /&gt;
*(hold)&lt;br /&gt;
====RNN Word Segment====&lt;br /&gt;
:* Set boundaries for word segmentation. &lt;br /&gt;
* (hold)&lt;br /&gt;
====Recommendation====&lt;br /&gt;
* Reproduce baseline.&lt;br /&gt;
:*LDA matrix decomposition.&lt;br /&gt;
:* LDA (Text classification &amp;amp; Recommendation System) --&amp;gt; AAAI&lt;br /&gt;
====RNN based QA====&lt;br /&gt;
*Read Source Code.&lt;br /&gt;
*Attention based QA.&lt;br /&gt;
*Coding.&lt;br /&gt;
&lt;br /&gt;
===Text Group Intern Project===&lt;br /&gt;
====Buddhist Process====&lt;br /&gt;
:*(hold)&lt;br /&gt;
&lt;br /&gt;
====RNN Poem Process====&lt;br /&gt;
*Done by Haichao Yu &amp;amp; Chaoyuan Zuo; mentor: Tianyi Luo.&lt;br /&gt;
&lt;br /&gt;
====RNN Document Vector====&lt;br /&gt;
:*(hold)&lt;br /&gt;
&lt;br /&gt;
====Image Baseline====&lt;br /&gt;
:*Demo Release.&lt;br /&gt;
:*Paper Report.&lt;br /&gt;
*Read CNN Paper.&lt;br /&gt;
&lt;br /&gt;
===Text Intuitive Idea===&lt;br /&gt;
====Trace Learning====&lt;br /&gt;
* (Hold)&lt;br /&gt;
====Match RNN ====&lt;br /&gt;
* (Hold)&lt;br /&gt;
&lt;br /&gt;
=financial group=&lt;br /&gt;
==model research==&lt;br /&gt;
* RNN&lt;br /&gt;
:* online model, updated every day&lt;br /&gt;
:* modify cost function and learning method&lt;br /&gt;
:* add more features&lt;br /&gt;
==rule combination==&lt;br /&gt;
* GA method to optimize the model&lt;br /&gt;
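A minimal sketch of the GA idea, evolving a rule-weight vector against a user-supplied fitness function; the population size, mutation scale, and fitness interface are assumptions, not the group's actual setup:&lt;br /&gt;

```python
import random

# Hedged sketch of GA-based rule combination: evolve a weight vector
# for the trading rules. fitness() is a stand-in for a backtest
# score; all hyperparameters here are illustrative.
def ga_optimize(fitness, dim, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # best individuals first
        survivors = pop[: pop_size // 2]         # elitist selection
        children = []
        while len(children) != pop_size - len(survivors):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(dim)             # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(dim)               # Gaussian mutation on one gene
            child[i] = child[i] + rng.gauss(0.0, 0.1)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

Elitism keeps the best rule combination found so far, so fitness is monotone non-decreasing across generations.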
&lt;br /&gt;
==basic rule==&lt;br /&gt;
* classical tenth model&lt;br /&gt;
==multiple-factor==&lt;br /&gt;
* add more factors&lt;br /&gt;
* use sparse model &lt;br /&gt;
==display==&lt;br /&gt;
* bug fixed&lt;br /&gt;
:* buy rule fixed&lt;br /&gt;
==data==&lt;br /&gt;
* data api&lt;br /&gt;
:* download futures data and factor data&lt;/div&gt;</summary>
		<author><name>Zxw</name></author>	</entry>

	</feed>