Latest revision as of 07:21, 7 December 2015 (Mon)

Speech Processing

AM development

Environment

End-to-End

monophone ASR --Zhiyuan

MPE
CTC/nnet3/Kaldi

conditioning learning

language vector into multiple layers --Zhiyuan

a Chinese paper

speech rate into multiple layers --Zhiyuan

verify the code for extra input(s) into DNN

Adaptive learning rate method

sequence training --Xiangyu

write a technical report

Mic-Array

hold
compute EER with Kaldi
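
For reference, the equal error rate (EER) is the operating point where the false-accept rate equals the false-reject rate. Kaldi ships a `compute-eer` tool for this, and that is presumably what the item above refers to. The sketch below only illustrates the metric itself; the function name `compute_eer` and the score arrays are hypothetical and not taken from the project's code.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) for two 1-D collections of trial scores.

    A trial is accepted when its score is >= threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is a candidate threshold.
    thresholds = np.unique(np.concatenate([target, nontarget]))
    best = None
    for t in thresholds:
        frr = np.mean(target < t)       # target trials rejected at t
        far = np.mean(nontarget >= t)   # non-target trials accepted at t
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]

if __name__ == "__main__":
    eer, thr = compute_eer([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 1.5, 2.5])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With finite score lists the two error rates rarely cross exactly, so the sketch picks the threshold with the smallest gap between them.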

Data selection unsupervised learning

hold
acoustic-feature-based submodular data selection using the Pingan dataset --Zhiyong
write code to speed up --Zhiyong
curriculum learning --Zhiyong

RNN-DAE (RNN-based deep auto-encoder)

hold
RNN-DAE performs worse than DNN-DAE because the training dataset is small
extract real room impulse responses to generate reverberant WSJ data, then train RNN-DAE
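
The reverberant-data step above can be sketched as convolving each clean utterance with a measured room impulse response (RIR). This is a minimal illustration, assuming both signals are already loaded as 1-D arrays at the same sampling rate; the function name `add_reverb` and the peak-matching normalisation are assumptions, not the group's actual pipeline.

```python
import numpy as np

def add_reverb(clean, rir):
    """Convolve a clean waveform with a room impulse response.

    The output is truncated to the length of the clean signal and
    rescaled so that its peak amplitude matches the clean peak.
    """
    clean = np.asarray(clean, dtype=float)
    rir = np.asarray(rir, dtype=float)
    # Full linear convolution, then drop the reverberation tail.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    peak = np.max(np.abs(reverberant))
    clean_peak = np.max(np.abs(clean))
    if peak > 0:
        reverberant = reverberant * (clean_peak / peak)
    return reverberant

if __name__ == "__main__":
    clean = np.array([1.0, 0.0, 0.0, 0.0])
    rir = np.array([1.0, 0.5])  # toy impulse response: direct path plus one echo
    print(add_reverb(clean, rir))
```

Pairs of (reverberant, clean) utterances produced this way would serve as the input and target of the auto-encoder.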

Speaker recognition

DNN-ivector framework
SUSR
AutoEncoder + metric learning
binary ivector

language vector

write a paper --Zhiyuan

hold

language vector is added to multiple hidden layers --Zhiyuan

code written
check code
http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=480
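
One way to read "language vector added to multiple hidden layers" is that every hidden layer receives the same language vector as an extra input alongside the previous layer's activations. The sketch below shows that idea in plain NumPy. The function name `forward`, the ReLU nonlinearity and the per-layer projection matrices are all assumptions for illustration; the actual Kaldi implementation under review may differ.

```python
import numpy as np

def forward(x, lang_vec, weights, lang_weights, biases):
    """Feed-forward pass in which each layer also sees the language vector.

    weights[i]      : (in_dim_i, out_dim_i) matrix for the previous activations
    lang_weights[i] : (lang_dim, out_dim_i) matrix for the language vector
    biases[i]       : (out_dim_i,) bias
    Equivalent to concatenating lang_vec to the input of every layer.
    """
    h = np.asarray(x, dtype=float)
    lang_vec = np.asarray(lang_vec, dtype=float)
    for W, V, b in zip(weights, lang_weights, biases):
        h = np.maximum(0.0, h @ W + lang_vec @ V + b)  # ReLU (assumed)
    return h

if __name__ == "__main__":
    weights = [np.full((3, 4), 0.1), np.full((4, 2), 0.5)]
    lang_weights = [np.array([[1.0] * 4, [-1.0] * 4]),
                    np.array([[0.0, 0.0], [1.0, 1.0]])]
    biases = [np.zeros(4), np.zeros(2)]
    x = np.ones(3)
    print(forward(x, [1.0, 0.0], weights, lang_weights, biases))
    print(forward(x, [0.0, 1.0], weights, lang_weights, biases))
```

The same acoustic input yields different outputs for different language vectors, which is the conditioning effect the experiment is after.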

RNN language vector

hold

multi-GPU

multi-stream training --Sheng Su

write a technical report

kaldi-nnet3 --Xuewei

7*2048 8k 1400h TDNN Xent training done
nnet3 MPE code is under investigation
http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=472

train 7*2048 TDNN using 4000h data --Mengyuan
1700h+776h 16k nnet3 6*2000 training done, outperforms the 6776H_mpe model --Mengyuan
wrote nnet3 biglm-decoder for Sinovoice
train MPE using WSJ and Aurora4 --Zhiyong, Xuewei
train nnet3 MPE using data from Jietong --Xuewei

multi-task

test according to self-information neural structure learning --Mengyuan

hold
code written
no significant performance improvement observed

speech rate learning --Xiangyu

hold
no significant performance improvement observed
http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=483

get results with extra input of speech rate info --Zhiyuan

Text Processing

Work

RNN Poem Process

Combine additional rhyme.
Investigate new method.

Document Representation

Code done. Waiting for experiment results.

Seq to Seq

Work on some tasks.

Order representation

Code up some ideas.

Balance Representation

Investigate some papers.
Current solution: use knowledge or similar pairs from a large corpus.

Hold

Neural Based Document Classification

RNN Rank Task

Graph RNN

Entity path embedded into entity.

(hold)

RNN Word Segment

Set bounds for word segmentation.

(hold)

Recommendation

Reproduce baseline.

LDA matrix dissolve.
LDA (Text classification & Recommendation System) --> AAAI

RNN based QA

Read Source Code.
Attention based QA.
Coding.

Text Group Intern Project

Buddhist Process

(hold)

RNN Poem Process

Done by Haichao Yu & Chaoyuan Zuo. Mentor: Tianyi Luo.

RNN Document Vector

(hold)

Image Baseline

Demo Release.
Paper Report.

Read CNN Paper.

Text Intuitive Idea

Trace Learning

(Hold)

Match RNN

(Hold)

financial group

model research

RNN

online model, updated every day
modify cost function and learning method
add more features

rule combination

GA method to optimize the model

basic rule

classical tenth model

multiple-factor

add more factors
use sparse model

display

bug fixed

buy rule fixed

data

data api

download the future data and factor data

@@ Line 1 / Line 1 @@
 ==Speech Processing ==
 === AM development ===
 ==== Environment ====
-==== RNN AM====
+==== End-to-End ====
-*train monophone RNN --zhiyuan
+*monophone ASR --Zhiyuan
-:* end to end MPE
+:* MPE
-:* end to end using nnet3
+:* CTC/nnet3/Kaldi
-:* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=446
+==== conditioning learning ====
+* language vector into multiple layers --Zhiyuan
+:* a Chinese paper
+* speech rate into multiple layers --Zhiyuan
+:*verify the code for extra input(s) into DNN
 ====Adapative learning rate method====
 * sequence training -Xiangyu
 :* write a technique report
-:*http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=458
 ==== Mic-Array ====
@@ Line 46 / Line 50 @@
 * RNN language vector
 :*hold
-* train with extra input of speech rate info
 ===multi-GPU===
@@ Line 56 / Line 60 @@
 :*http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=472
 * train 7*2048 tdnn using 4000h data --Mengyuan
+* 1700h+776h 16k nnet3 6*2000 training done, outperform 6776H_mpe model--Mengyuan
+* wrote nnet3 biglm-decoder for sinovoice.
 * train mpe using wsj and aurara4 --Zhiyong,Xuewei
+* train nnet3 mpe using data from Jietong--Xuewei
 ===multi-task===
@@ Line 67 / Line 74 @@
 :* no significant performance improvement observed
 :*http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=483
-: test using extreme data
+: get results with extra input of speech rate info --Zhiyuan
 ==Text Processing==

Difference between revisions of "ASR:2015-12-1"
