ASR:2016-01-18
From cslt Wiki
Speech Processing
AM development
Environment
End-to-End
- monophone ASR --Zhiyuan
- MPE
- CTC/nnet3/Kaldi
- more reference results of Kaldi/CTC on 1400h Chinese plus 100h English
- Further test nnet3-ctc training on the 4000h 8k & 10000h 16k datasets. Tuned decoding configurations (phone-lm-weight, acwt, blank-scale), but CTC is still about 5 percent worse than standard nnet3 (see the decoding-scale sketch after this list)
- launched experiments of MPE after CTC on WSJ; code revised
- experiments of "chain" on WSJ
- nnet3-ctc
- Reproduce TZY's experiment on 1400h+100h dataset--mengyuan
- Compare performance of nnet3-xEnt and nnet3-ctc on 100h dataset--mengyuan
- Tested pre-training-based CTC training; a little better than CTC-from-scratch but still worse than xEnt--mengyuan
- Finished nnet3-ctc alignment coding implementation--zhiyong
- ctc-mpe-1: code that considers CctcTransition is done; waiting for results
- ctc-mpe-2: results using Transition instead of CctcTransition show improvement after 1 iteration
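The decoding-scale tuning mentioned above (phone-lm-weight, acwt, blank-scale) can be read from a minimal sketch. This is not the actual Kaldi nnet3-ctc decoder; it only illustrates, under assumed names, how the three scales typically enter a per-frame CTC decoding score: the acoustic weight scales the network log-posteriors, the phone-LM weight scales the phone-LM log-probability, and the blank scale rescales the blank symbol's posterior.

```python
import numpy as np

def combine_ctc_scores(log_post, phone_lm_logprob,
                       acwt=0.9, phone_lm_weight=1.0,
                       blank_scale=1.0, blank_id=0):
    """Sketch of how acwt, phone-lm-weight and blank-scale enter a CTC decoding score.

    log_post:         per-frame log-posteriors over CTC labels (blank = blank_id)
    phone_lm_logprob: per-label phone-LM log-probabilities
    """
    scaled = log_post.copy()
    scaled[blank_id] += np.log(blank_scale)           # rescale the blank posterior
    return acwt * scaled + phone_lm_weight * phone_lm_logprob

# Hypothetical usage: sweep blank_scale with acwt fixed and compare WER per setting.
```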
Adaptive learning rate method
- sequence training -Xiangyu
- write a technical report
Mic-Array
- hold
- compute EER with kaldi
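Kaldi ships its own EER tool; as a cross-check of the item above, here is a minimal, self-contained sketch of the equal error rate computed from target and nontarget score lists (the inputs and names are illustrative).

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """EER: the operating point where false-reject rate equals false-accept rate."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    order = np.argsort(scores)                 # ascending: lowest scores rejected first
    labels = labels[order]
    frr = np.cumsum(labels) / labels.sum()                    # rejected targets
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()    # accepted nontargets
    i = int(np.argmin(np.abs(frr - far)))
    return (frr[i] + far[i]) / 2.0

# e.g. compute_eer(np.array([2.0, 1.5, 0.3]), np.array([0.5, -0.2, -1.0])) -> 1/3
```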
Data selection / unsupervised learning
- hold
- acoustic-feature-based submodular data selection using the Pingan dataset --zhiyong (see the greedy-selection sketch after this list)
- write code to speed up --zhiyong
- curriculum learning --zhiyong
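A minimal sketch of the acoustic-feature-based submodular selection referenced above, assuming a facility-location objective over utterance-level feature similarities. The objective, similarity measure, and budget are assumptions, not the exact code used.

```python
import numpy as np

def greedy_facility_location(sim, budget):
    """Greedily select `budget` utterances maximizing a facility-location objective.

    sim: (n, n) nonnegative similarity matrix between utterance-level acoustic features.
    Returns the indices of the selected utterances.
    """
    n = sim.shape[0]
    selected = []
    cover = np.zeros(n)                  # best similarity to any selected utterance
    for _ in range(min(budget, n)):
        # marginal gain of each candidate = improvement in total coverage
        gains = np.maximum(sim, cover[None, :]).sum(axis=1) - cover.sum()
        gains[selected] = -np.inf        # never re-pick an utterance
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, sim[j])
    return selected
```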
RNN-DAE (RNN-based Deep Auto-Encoder)
- hold
- RNN-DAE has worse performance than DNN-DAE because the training dataset is small
- extract real room impulse responses to generate WSJ reverberation data, then train the RNN-DAE
Speaker recognition
- DNN-ivector framework
- SUSR
- AutoEncoder + metric learning
- binary ivector
- Deep speaker embedding tasks
- For the max-margin metric learning task, make some additional experiments (a loss sketch follows this list)
- there was something wrong with the NIST-SRE05 labels (wav.scp, spk2utt, utt2spk), and SRE05 has been re-labelled
- Switchboard-1-LDC2001S13 and Switchboard-Cell-P2-LDC2004S07 have been used as the new dev set.
- Two tricks: i-vector shuffle and speaker selection
- Pair-wise speaker vector training --> Data preparation
- Deep speaker embedding --> Data preparation
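For the max-margin metric learning item above, a minimal sketch of a cosine-similarity hinge (triplet) loss over i-vectors. The margin value and the triplet form are assumptions; the "i-vector shuffle" and "speaker selection" tricks are not reproduced here.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def max_margin_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss: a same-speaker pair should score higher than a different-speaker
    pair by at least `margin` (cosine similarity on i-vectors)."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))
```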
language vector
- write a paper--zhiyuan
- hold
- language vector is added to multiple hidden layers--zhiyuan (see the augmentation sketch after this list)
- write code done
- check code
- http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=480
- RNN language vector
- hold
- language vector into multiple layers --Zhiyuan
- a Chinese paper
- speech rate into multiple layers --Zhiyuan
- verify the code for extra input(s) into DNN
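A minimal sketch of what "language vector added to multiple hidden layers" (and, analogously, speech rate as an extra input) could look like in a plain feed-forward network: the fixed vector is concatenated to the input of selected layers. The layer sizes, the set of augmented layers, and the ReLU nonlinearity are assumptions.

```python
import numpy as np

def forward_with_extra_vector(x, extra_vec, weights, biases, aug_layers=(0, 1)):
    """Feed-forward pass where `extra_vec` (e.g. a language vector) is appended
    to the input of the layers listed in `aug_layers`.

    weights[i] must be shaped for the concatenated input at augmented layers.
    """
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        if i in aug_layers:
            h = np.concatenate([h, extra_vec])    # inject the extra vector
        h = np.maximum(0.0, W @ h + b)            # ReLU hidden layer
    return h
```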
multi-GPU
- multi-stream training --Sheng Su
- write a technical report
- kaldi-nnet3 --Xuewei
- 7*2048 8k 1400h tdnn training Xent done
- nnet3 mpe code is under investigation
- http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=472
- Analysed the MPE divergence problem when the context is 10. The reason may be over-fitting; a larger dataset may weaken the over-fitting.
- RNN AM training on big dataset --mengyuan
- fix decode bug
- nnet3 LSTM & BLSTM training on the sinovoice 120h dataset, using Kaldi's default config, but the result is worse than TDNN
- Test nnet3-MPE code from Xuewei on sinovoice 120h 16k dataset. Didn't observe performance improvement. There are still some bugs in the code.
- Run nnet3-ctc training on sinovoice 120h 16k dataset. Result looks ok, but worse than normal model.
- Start nnet3-ctc training on sinovoice 4000h 8k dataset.
- train mpe --Zhiyong,Xuewei
- train nnet3 mpe using data from Jietong--Xuewei
- modify code to print stats --Xuewei
- The MPE does not work when the context is 10; needs further investigation--zhiyong
- The nnet1 MPE we tested is also based on context 5; maybe a larger context is an inherent problem--zhiyong
- modify code to reduce memory usage
- Implement dark knowledge (knowledge distillation) on nnet3 and do some experiments (a distillation-loss sketch follows below)
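A minimal sketch of the dark-knowledge (teacher-student) objective behind the item above: cross-entropy against the teacher's temperature-softened posteriors, interpolated with the usual hard-label cross-entropy. The temperature and interpolation weight are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def dark_knowledge_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Soft-target cross-entropy (teacher at temperature T) + hard-label xEnt."""
    p_teacher = softmax(teacher_logits, T)
    soft_ce = -np.sum(p_teacher * np.log(softmax(student_logits, T) + 1e-12))
    hard_ce = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    return alpha * soft_ce + (1.0 - alpha) * hard_ce
```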
multi-task
- test self-information-based neural structure learning --mengyuan
- hold
- write code done
- no significant performance improvement observed
- speech rate learning --xiangyu
- hold
- no significant performance improvement observed
- http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=483
- get results with extra input of speech rate info --Zhiyuan
30 Chinese dataset
- revise syllable text
- add some words to the lexicon, which are applied to training and graph building.
- train and decode the 30 Chinese data again
- revise the technical report
- prepare data
- kaldi recipe
- revise the code with the help of Teacher Wang
- rerun the experiment
Text Processing
Work
RNN Poem Process
- Combine additional rhymes.
- Investigate new methods.
Document Representation
- Code done; waiting for experiment results.
Seq to Seq
- Work on some tasks.
Order representation
- Code up some ideas.
Balance Representation
- Investigate some papers.
- Current solution: use knowledge or similar pairs from a large corpus.
Hold
Neural Based Document Classification
RNN Rank Task
Graph RNN
- Entity path embedded into entity.
- (hold)
RNN Word Segment
- Set bounds for word segmentation.
- (hold)
Recommendation
- Reproduce baseline.
- LDA matrix decomposition.
- LDA (Text classification & Recommendation System) --> AAAI
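A minimal sketch of the LDA decomposition mentioned above: factor a document-term count matrix into document-topic and topic-word matrices, which can then feed a classifier or a recommender. scikit-learn and the toy corpus here are assumptions; they are not necessarily the tools used in this work.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["user reads articles about speech recognition",
        "recommendation systems rank items for each user",
        "kaldi trains acoustic models for speech recognition"]

X = CountVectorizer().fit_transform(docs)            # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)                     # document-topic matrix
topic_word = lda.components_                         # topic-word matrix
print(doc_topic.shape, topic_word.shape)             # (3, 2) (2, vocab_size)
```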
RNN based QA
- Read Source Code.
- Attention-based QA (see the sketch after this list).
- Coding.
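A minimal sketch of the attention step at the heart of attention-based QA: score each context word vector against the question vector, softmax the scores, and return the weighted context summary. Shapes and names are illustrative, not the project's actual model.

```python
import numpy as np

def attend(question_vec, context_vecs):
    """Soft attention over context word vectors given a question vector.

    question_vec: (d,)   context_vecs: (n_words, d)
    Returns the attention-weighted context summary, shape (d,).
    """
    scores = context_vecs @ question_vec
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over context positions
    return weights @ context_vecs
```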
Text Group Intern Project
Buddhist Process
- (hold)
RNN Poem Process
- Done by Haichao Yu & Chaoyuan Zuo. Mentor: Tianyi Luo.
RNN Document Vector
- (hold)
Image Baseline
- Demo Release.
- Paper Report.
- Read CNN Paper.
Text Intuitive Idea
Trace Learning
- (Hold)
Match RNN
- (Hold)
financial group
model research
- RNN
- online model, updated every day
- modify cost function and learning method
- add more features
rule combination
- GA (genetic algorithm) method to optimize the model (a minimal sketch follows below)
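A minimal sketch of a GA over rule-combination weights, as one way to read the item above: truncation selection, one-point crossover, Gaussian mutation. The weight encoding, fitness function, and hyper-parameters are all assumptions.

```python
import numpy as np

def ga_optimize(fitness, n_rules, pop_size=30, generations=50, mut_std=0.1, seed=0):
    """Real-valued GA: keep the best half, one-point crossover, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, n_rules))              # each row = one weight vector
    for _ in range(generations):
        scores = np.array([fitness(w) for w in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]  # truncation selection
        mates = parents[rng.integers(len(parents), size=(pop_size, 2))]
        cut = rng.integers(1, n_rules, size=pop_size)        # one-point crossover
        pop = np.where(np.arange(n_rules) < cut[:, None], mates[:, 0], mates[:, 1])
        pop += rng.normal(scale=mut_std, size=pop.shape)     # mutation
    return max(pop, key=fitness)

# Hypothetical usage: fitness(w) = backtest return of the rules combined with weights w.
```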
basic rule
- classical tenth model
multiple-factor
- add more factors
- use sparse model
display
- bug fixed
- buy rule fixed
data
- data api
- download the futures data and factor data