ASR:2016-01-18
From cslt Wiki
Speech Processing
AM development
Environment
End-to-End
- monophone ASR --Zhiyuan
- MPE
- CTC/nnet3/Kaldi
- more reference results of Kaldi/CTC on 1400h Chinese plus 100h English
- Further test nnet3-ctc training on the 4000h 8k & 10000h 16k datasets. Tuned decoding configurations (phone-lm-weight, acwt, blank-scale), but CTC is still about 5 percent worse than standard nnet3 (see the decoding-scale sketch after this list)
- launched experiments of MPE after CTC on WSJ; code revised
- experiments of "chain" on WSJ
- nnet3-ctc
- Reproduce TZY's experiment on 1400h+100h dataset--mengyuan
- Compare performance of nnet3-xEnt and nnet3-ctc on 100h dataset--mengyuan
- Tested pre-training-based CTC training; a little better than CTC-from-scratch but still worse than xEnt--mengyuan
- Finished nnet3-ctc alignment coding implementation--zhiyong
- ctc-mpe-1: code that considers CctcTransition is done; waiting for results
- ctc-mpe-2: results using Transition instead of CctcTransition show improvement after 1 iteration
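The decoding-scale tuning mentioned above (phone-lm-weight, acwt, blank-scale) can be read from a minimal sketch. This is not the actual Kaldi nnet3-ctc decoder; it only illustrates, under assumed names, how the three scales typically enter a per-frame CTC decoding score: the acoustic weight scales the network log-posteriors, the phone-LM weight scales the phone-LM log-probability, and the blank scale rescales the blank symbol's posterior.

```python
import numpy as np

def combine_ctc_scores(log_post, phone_lm_logprob,
                       acwt=0.9, phone_lm_weight=1.0,
                       blank_scale=1.0, blank_id=0):
    """Sketch of how acwt, phone-lm-weight and blank-scale enter a CTC decoding score.

    log_post:         per-frame log-posteriors over CTC labels (blank = blank_id)
    phone_lm_logprob: per-label phone-LM log-probabilities
    """
    scaled = log_post.copy()
    scaled[blank_id] += np.log(blank_scale)           # rescale the blank posterior
    return acwt * scaled + phone_lm_weight * phone_lm_logprob

# Hypothetical usage: sweep blank_scale with acwt fixed and compare WER per setting.
```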
Adaptive learning rate method
- sequence training -Xiangyu
- write a technical report
Mic-Array
- hold
- compute EER with kaldi
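Kaldi ships its own EER tool; as a cross-check of the item above, here is a minimal, self-contained sketch of the equal error rate computed from target and nontarget score lists (the inputs and names are illustrative).

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """EER: the operating point where false-reject rate equals false-accept rate."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    order = np.argsort(scores)                 # ascending: lowest scores rejected first
    labels = labels[order]
    frr = np.cumsum(labels) / labels.sum()                    # rejected targets
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()    # accepted nontargets
    i = int(np.argmin(np.abs(frr - far)))
    return (frr[i] + far[i]) / 2.0

# e.g. compute_eer(np.array([2.0, 1.5, 0.3]), np.array([0.5, -0.2, -1.0])) -> 1/3
```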
Data selection / unsupervised learning
- hold
- acoustic-feature-based submodular data selection using the Pingan dataset --zhiyong (see the greedy-selection sketch after this list)
- write code to speed up --zhiyong
- curriculum learning --zhiyong
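A minimal sketch of the acoustic-feature-based submodular selection referenced above, assuming a facility-location objective over utterance-level feature similarities. The objective, similarity measure, and budget are assumptions, not the exact code used.

```python
import numpy as np

def greedy_facility_location(sim, budget):
    """Greedily select `budget` utterances maximizing a facility-location objective.

    sim: (n, n) nonnegative similarity matrix between utterance-level acoustic features.
    Returns the indices of the selected utterances.
    """
    n = sim.shape[0]
    selected = []
    cover = np.zeros(n)                  # best similarity to any selected utterance
    for _ in range(min(budget, n)):
        # marginal gain of each candidate = improvement in total coverage
        gains = np.maximum(sim, cover[None, :]).sum(axis=1) - cover.sum()
        gains[selected] = -np.inf        # never re-pick an utterance
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, sim[j])
    return selected
```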
RNN-DAE (RNN-based Deep Auto-Encoder)
- hold
- RNN-DAE has worse performance than DNN-DAE because the training dataset is small
- extract real room impulse responses to generate WSJ reverberation data, then train the RNN-DAE
Speaker recognition
- DNN-ivector framework
- SUSR
- AutoEncoder + metric learning
- binary ivector
- Deep speaker embedding tasks
- For the max-margin metric learning task, make some additional experiments (a loss sketch follows this list)
- there was something wrong with the NIST-SRE05 labels (wav.scp, spk2utt, utt2spk), and SRE05 has been re-labelled
- Switchboard-1-LDC2001S13 and Switchboard-Cell-P2-LDC2004S07 have been used as the new dev set.
- Two tricks: i-vector shuffle and speaker selection
- Pair-wise speaker vector training --> Data preparation
- Deep speaker embedding --> Data preparation
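For the max-margin metric learning item above, a minimal sketch of a cosine-similarity hinge (triplet) loss over i-vectors. The margin value and the triplet form are assumptions; the "i-vector shuffle" and "speaker selection" tricks are not reproduced here.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def max_margin_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss: a same-speaker pair should score higher than a different-speaker
    pair by at least `margin` (cosine similarity on i-vectors)."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))
```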
language vector
- write a paper--zhiyuan
- hold
- language vector is added to multiple hidden layers--zhiyuan (see the augmentation sketch after this list)
- write code done
- check code
- http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=480
- RNN language vector
- hold
- language vector into multiple layers --Zhiyuan
- a Chinese paper
- speech rate into multiple layers --Zhiyuan
- verify the code for extra input(s) into DNN
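A minimal sketch of what "language vector added to multiple hidden layers" (and, analogously, speech rate as an extra input) could look like in a plain feed-forward network: the fixed vector is concatenated to the input of selected layers. The layer sizes, the set of augmented layers, and the ReLU nonlinearity are assumptions.

```python
import numpy as np

def forward_with_extra_vector(x, extra_vec, weights, biases, aug_layers=(0, 1)):
    """Feed-forward pass where `extra_vec` (e.g. a language vector) is appended
    to the input of the layers listed in `aug_layers`.

    weights[i] must be shaped for the concatenated input at augmented layers.
    """
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        if i in aug_layers:
            h = np.concatenate([h, extra_vec])    # inject the extra vector
        h = np.maximum(0.0, W @ h + b)            # ReLU hidden layer
    return h
```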
multi-GPU
- multi-stream training --Sheng Su
- write a technical report
- kaldi-nnet3 --Xuewei
- 7*2048 8k 1400h tdnn training Xent done
- nnet3 mpe code is under investigation
- http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=472
- Analysed the MPE divergence problem when the context is 10. The reason may be over-fitting; a larger dataset may weaken the over-fitting.
- RNN AM training on big dataset --mengyuan
- fix decode bug
- nnet3 LSTM & BLSTM training on the sinovoice 120h dataset, using Kaldi's default config, but the result is worse than TDNN
- Test nnet3-MPE code from Xuewei on sinovoice 120h 16k dataset. Didn't observe performance improvement. There are still some bugs in the code.
- Run nnet3-ctc training on sinovoice 120h 16k dataset. Result looks ok, but worse than normal model.
- Start nnet3-ctc training on sinovoice 4000h 8k dataset.
- train mpe --Zhiyong,Xuewei
- train nnet3 mpe using data from Jietong--Xuewei
- modify code to print stats --Xuewei
- The MPE does not work when the context is 10; needs further investigation--zhiyong
- The nnet1 MPE we tested is also based on context 5; maybe a larger context is an inherent problem--zhiyong
- modify code to reduce memory usage
- Implement dark knowledge (knowledge distillation) on nnet3 and do some experiments (a distillation-loss sketch follows below)
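A minimal sketch of the dark-knowledge (teacher-student) objective behind the item above: cross-entropy against the teacher's temperature-softened posteriors, interpolated with the usual hard-label cross-entropy. The temperature and interpolation weight are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def dark_knowledge_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Soft-target cross-entropy (teacher at temperature T) + hard-label xEnt."""
    p_teacher = softmax(teacher_logits, T)
    soft_ce = -np.sum(p_teacher * np.log(softmax(student_logits, T) + 1e-12))
    hard_ce = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    return alpha * soft_ce + (1.0 - alpha) * hard_ce
```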
multi-task
- test self-information-based neural structure learning --mengyuan
- hold
- write code done
- no significant performance improvement observed
- speech rate learning --xiangyu
- hold
- no significant performance improvement observed
- http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=483
- get results with extra input of speech rate info --Zhiyuan
30 Chinese dataset
- revise syllable text
- add some words to the lexicon, which are applied to training and graph building.
- train and decode the 30 Chinese data again
- revise the technical report
- prepare data
- kaldi recipe
- revise the code with the help of Teacher Wang
- rerun the experiment
Text Processing
Work
RNN Poem Process
- Combine additional rhymes.
- Investigate new methods.
Document Representation
- Code done; waiting for experiment results.
Seq to Seq
- Work on some tasks.
Order representation
- Code up some ideas.
Balance Representation
- Investigate some papers.
- Current solution: use knowledge or similar pairs from a large corpus.
Hold
Neural Based Document Classification
RNN Rank Task
Graph RNN
- Entity path embedded into entity.
- (hold)
RNN Word Segment
- Set bounds for word segmentation.
- (hold)
Recommendation
- Reproduce baseline.
- LDA matrix decomposition.
- LDA (Text classification & Recommendation System) --> AAAI
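A minimal sketch of the LDA decomposition mentioned above: factor a document-term count matrix into document-topic and topic-word matrices, which can then feed a classifier or a recommender. scikit-learn and the toy corpus here are assumptions; they are not necessarily the tools used in this work.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["user reads articles about speech recognition",
        "recommendation systems rank items for each user",
        "kaldi trains acoustic models for speech recognition"]

X = CountVectorizer().fit_transform(docs)            # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)                     # document-topic matrix
topic_word = lda.components_                         # topic-word matrix
print(doc_topic.shape, topic_word.shape)             # (3, 2) (2, vocab_size)
```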
RNN based QA
- Read Source Code.
- Attention-based QA (see the sketch after this list).
- Coding.
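A minimal sketch of the attention step at the heart of attention-based QA: score each context word vector against the question vector, softmax the scores, and return the weighted context summary. Shapes and names are illustrative, not the project's actual model.

```python
import numpy as np

def attend(question_vec, context_vecs):
    """Soft attention over context word vectors given a question vector.

    question_vec: (d,)   context_vecs: (n_words, d)
    Returns the attention-weighted context summary, shape (d,).
    """
    scores = context_vecs @ question_vec
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over context positions
    return weights @ context_vecs
```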
Text Group Intern Project
Buddhist Process
- (hold)
RNN Poem Process
- Done by Haichao Yu & Chaoyuan Zuo. Mentor: Tianyi Luo.
RNN Document Vector
- (hold)
Image Baseline
- Demo Release.
- Paper Report.
- Read CNN Paper.
Text Intuitive Idea
Trace Learning
- (Hold)
Match RNN
- (Hold)
financial group
model research
- RNN
- online model, updated every day
- modify cost function and learning method
- add more features
rule combination
- GA (genetic algorithm) method to optimize the model (a minimal sketch follows below)
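A minimal sketch of a GA over rule-combination weights, as one way to read the item above: truncation selection, one-point crossover, Gaussian mutation. The weight encoding, fitness function, and hyper-parameters are all assumptions.

```python
import numpy as np

def ga_optimize(fitness, n_rules, pop_size=30, generations=50, mut_std=0.1, seed=0):
    """Real-valued GA: keep the best half, one-point crossover, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, n_rules))              # each row = one weight vector
    for _ in range(generations):
        scores = np.array([fitness(w) for w in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]  # truncation selection
        mates = parents[rng.integers(len(parents), size=(pop_size, 2))]
        cut = rng.integers(1, n_rules, size=pop_size)        # one-point crossover
        pop = np.where(np.arange(n_rules) < cut[:, None], mates[:, 0], mates[:, 1])
        pop += rng.normal(scale=mut_std, size=pop.shape)     # mutation
    return max(pop, key=fitness)

# Hypothetical usage: fitness(w) = backtest return of the rules combined with weights w.
```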
basic rule
- classical tenth model
multiple-factor
- add more factors
- use sparse model
display
- bug fixed
- buy rule fixed
data
- data api
- download the futures data and factor data