ASR:2016-01-18

Speech Processing

AM development

Environment

End-to-End

  • monophone ASR --Zhiyuan
  • MPE
  • CTC/nnet3/Kaldi
  • more reference results of Kaldi/CTC on 1400h Chinese plus 100h English
  • Further test nnet3-ctc training on the 4000h 8k & 10000h 16k datasets. Tuned decoding configurations (phone-lm-weight, acwt, blank-scale), but CTC is still about 5 percent worse than standard nnet3 (a tuning sketch follows this list)
  • launched experiments of MPE after CTC on WSJ; code revised
  • experiments of "chain" on WSJ
  • nnet3-ctc
  • Reproduce TZY's experiment on 1400h+100h dataset--mengyuan
  • Compare performance of nnet3-xEnt and nnet3-ctc on 100h dataset--mengyuan
  • test pre-training-based CTC training: a little better than CTC-from-scratch but still worse than xEnt--mengyuan
  • Finished nnet3-ctc alignment code implementation--zhiyong
  • ctc-mpe-1: code considering CctcTransition done; waiting for results
  • ctc-mpe-2: results using Transition instead of CctcTransition show improvement after 1 iteration
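
As a rough illustration of the decoding-weight tuning mentioned above, the sketch below grid-searches the three named parameters (phone-lm-weight, acwt, blank-scale). The run_decode.sh wrapper and the value grids are hypothetical assumptions; only the parameter names come from the report.

  # hypothetical sweep over nnet3-ctc decoding weights
  import itertools
  import subprocess

  def decode_and_score(phone_lm_weight, acwt, blank_scale):
      """Run one decode and return WER (%); assumes a local run_decode.sh
      that accepts these options and prints the WER on stdout."""
      out = subprocess.run(
          ["./run_decode.sh",
           f"--phone-lm-weight={phone_lm_weight}",
           f"--acwt={acwt}",
           f"--blank-scale={blank_scale}"],
          capture_output=True, text=True, check=True)
      return float(out.stdout.strip())

  grid = itertools.product([0.5, 1.0, 1.5],     # phone-lm-weight
                           [0.08, 0.10, 0.12],  # acwt
                           [0.5, 1.0, 2.0])     # blank-scale
  best = min(grid, key=lambda cfg: decode_and_score(*cfg))
  print("best (phone-lm-weight, acwt, blank-scale):", best)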

Adaptive learning rate method

  • sequence training -Xiangyu
  • write a technical report

Mic-Array

  • hold
  • compute EER with Kaldi (a standalone computation sketch follows this list)
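
For the EER item above, Kaldi's own scoring tools can be used; as a self-contained cross-check, a minimal pure-Python EER over (score, target/non-target) trial pairs might look like the sketch below. The toy scores and labels are illustrative assumptions.

  # pure-Python EER cross-check; trial scores below are toy examples
  def compute_eer(scores, labels):
      """scores: list of floats; labels: 1 for target trials, 0 for non-target."""
      n_tar = sum(labels)
      n_non = len(labels) - n_tar
      assert n_tar > 0 and n_non > 0, "need both target and non-target trials"
      pairs = sorted(zip(scores, labels))          # ascending by score
      fa, fr = n_non, 0                            # threshold below all scores
      eer, min_gap = 1.0, float("inf")
      for _, lab in pairs:
          # raising the threshold past this trial rejects it
          if lab:
              fr += 1
          else:
              fa -= 1
          far, frr = fa / n_non, fr / n_tar
          if abs(far - frr) < min_gap:
              min_gap, eer = abs(far - frr), (far + frr) / 2
      return eer

  scores = [2.1, 1.7, 0.3, -0.5, -1.2]
  labels = [1, 1, 0, 1, 0]
  print(f"EER = {compute_eer(scores, labels):.2%}")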

Data selection unsupervised learning

  • hold
  • acoustic-feature-based submodular data selection using the Pingan dataset --zhiyong (a greedy selection sketch follows this list)
  • write code to speed up --zhiyong
  • curriculum learning --zhiyong
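
The submodular selection item above does not specify the objective; the sketch below shows one common choice, greedy facility-location selection over per-utterance acoustic features with a cosine-similarity kernel. The feature representation and kernel are assumptions, not the group's actual code.

  # greedy facility-location data selection over acoustic features
  import numpy as np

  def greedy_facility_location(features, budget):
      """features: (n_utts, dim) array of per-utterance features (e.g. mean MFCCs).
      Returns indices of `budget` utterances that greedily maximize coverage."""
      normed = features / np.linalg.norm(features, axis=1, keepdims=True)
      sim = normed @ normed.T                      # cosine similarity kernel
      n = sim.shape[0]
      selected, best_cover = [], np.zeros(n)
      for _ in range(budget):
          # gain of adding candidate j = total improvement in best coverage
          gains = np.maximum(sim, best_cover[None, :]).sum(axis=1) - best_cover.sum()
          gains[selected] = -np.inf                # never re-pick selected items
          j = int(np.argmax(gains))
          selected.append(j)
          best_cover = np.maximum(best_cover, sim[j])
      return selected

  feats = np.random.randn(100, 13)                 # toy per-utterance features
  print(greedy_facility_location(feats, budget=10))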

RNN-DAE (RNN-based Deep Auto-Encoder)

  • hold
  • RNN-DAE performs worse than DNN-DAE because the training dataset is small
  • extract real room impulse responses to generate reverberated WSJ data, then train the RNN-DAE (a data-generation sketch follows this list)
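
A minimal sketch of the reverberation-data generation step above: convolve clean (mono) WSJ audio with a measured room impulse response. The soundfile/scipy tooling, file paths, and level normalization are illustrative assumptions.

  # generate reverberated training data from clean speech + measured RIRs
  import numpy as np
  import soundfile as sf
  from scipy.signal import fftconvolve

  def reverberate(clean_wav, rir_wav, out_wav):
      speech, sr = sf.read(clean_wav)
      rir, sr_rir = sf.read(rir_wav)
      assert sr == sr_rir, "resample the RIR to the speech sample rate first"
      wet = fftconvolve(speech, rir)[: len(speech)]                   # keep original length
      wet *= np.max(np.abs(speech)) / (np.max(np.abs(wet)) + 1e-8)    # roughly match level
      sf.write(out_wav, wet, sr)

  # hypothetical paths
  reverberate("wsj_clean/011c0201.wav", "rirs/room1.wav", "wsj_reverb/011c0201.wav")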

Speaker recognition

  • DNN-ivector framework
  • SUSR
  • AutoEncoder + metric learning
  • binary ivector
  • Deep speaker embedding tasks
  • For the max-margin metric learning task, run some additional experiments (a loss sketch follows this list)
  • there was something wrong with the NIST-SRE05 labels (wav.scp, spk2utt, utt2spk), so SRE05 has been re-labelled
  • Switchboard-1-LDC2001S13 and Switchboard-Cell-P2-LDC2004S07 have been used as the new dev set.
  • Two tricks: i-vector shuffle and speaker selection
  • Pair-wise speaker vector training --> Data preparation
  • Deep speaker embedding --> Data preparation
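
The report does not give the exact max-margin objective; the sketch below shows one common formulation, a margin-based triplet loss over i-vectors, together with a speaker-balanced triplet sampler as one possible reading of the "i-vector shuffle and speaker selection" tricks. Dimensions and data are toy assumptions.

  # max-margin (triplet) metric learning sketch over i-vectors
  import numpy as np

  def triplet_margin_loss(anchor, positive, negative, margin=0.3):
      """All inputs are 1-D i-vectors; squared-Euclidean margin loss."""
      d_pos = np.sum((anchor - positive) ** 2)
      d_neg = np.sum((anchor - negative) ** 2)
      return max(0.0, d_pos - d_neg + margin)

  def sample_triplet(ivectors_by_spk, rng):
      """ivectors_by_spk: dict speaker_id -> (n_i, dim) array; speaker-balanced sampling."""
      spk_a, spk_n = rng.choice(list(ivectors_by_spk), size=2, replace=False)
      a_set = ivectors_by_spk[spk_a]
      anchor, positive = a_set[rng.choice(len(a_set), 2, replace=False)]
      negative = ivectors_by_spk[spk_n][rng.integers(len(ivectors_by_spk[spk_n]))]
      return anchor, positive, negative

  rng = np.random.default_rng(0)
  data = {f"spk{i}": rng.standard_normal((5, 400)) for i in range(10)}  # toy i-vectors
  a, p, n = sample_triplet(data, rng)
  print(triplet_margin_loss(a, p, n))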

language vector

  • write a paper--zhiyuan
  • hold
  • language vector is added to multiple hidden layers--zhiyuan (see the sketch after this list)
  • RNN language vector
  • hold
  • language vector into multiple layers --Zhiyuan
  • a Chinese paper
  • speech rate into multiple layers --Zhiyuan
  • verify the code for extra input(s) into DNN
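
A minimal numpy sketch of the "extra input into DNN" idea above: the language vector is concatenated to the input of every hidden layer. The layer sizes and the plain-numpy forward pass are illustrative assumptions, not the group's nnet code.

  # DNN forward pass with a language vector fed into every hidden layer
  import numpy as np

  def forward(x, lang_vec, weights, biases):
      """x: (dim,) acoustic features; lang_vec: (L,) language vector."""
      h = x
      for W, b in zip(weights[:-1], biases[:-1]):
          h = np.concatenate([h, lang_vec])        # extra input at each hidden layer
          h = np.maximum(0.0, W @ h + b)           # ReLU hidden layer
      return weights[-1] @ h + biases[-1]          # output layer (no language vector)

  feat_dim, lang_dim, hid, n_out = 40, 10, 64, 100
  dims_in = [feat_dim + lang_dim, hid + lang_dim, hid + lang_dim]
  weights = [np.random.randn(hid, i) * 0.01 for i in dims_in]
  biases = [np.zeros(hid) for _ in dims_in]
  weights.append(np.random.randn(n_out, hid) * 0.01)
  biases.append(np.zeros(n_out))

  print(forward(np.random.randn(feat_dim), np.random.randn(lang_dim),
                weights, biases).shape)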

multi-GPU

  • multi-stream training --Sheng Su
  • write a technical report
  • kaldi-nnet3 --Xuewei
  • RNN AM training on big dataset --mengyuan
  • fix decode bug
  • nnet3 LSTM & BLSTM training on the Sinovoice 120h dataset using Kaldi's default config, but results are worse than TDNN
  • Test nnet3-MPE code from Xuewei on the Sinovoice 120h 16k dataset. No performance improvement observed; there are still some bugs in the code.
  • Run nnet3-ctc training on the Sinovoice 120h 16k dataset. Results look OK, but worse than the normal model.
  • Start nnet3-ctc training on the Sinovoice 4000h 8k dataset.
  • train MPE --Zhiyong, Xuewei
  • train nnet3 MPE using data from Jietong--Xuewei
  • modify code to print stats --Xuewei
  • MPE does not work when the context is 10; this needs further investigation--zhiyong
  • The nnet1 MPE we tested is also based on context 5; a larger context may be an inherent problem--zhiyong
  • modify code to reduce memory
  • Implement dark knowledge (knowledge distillation) on nnet3 and run some experiments (a loss sketch follows this list)
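
A rough sketch of the dark-knowledge objective mentioned above: cross-entropy against temperature-softened teacher posteriors mixed with the usual hard-label cross-entropy. The temperature and mixing weight are assumptions; the nnet3 integration itself is not shown.

  # dark-knowledge (teacher-student) frame-level loss sketch
  import numpy as np

  def softmax(z):
      z = z - z.max(axis=-1, keepdims=True)
      e = np.exp(z)
      return e / e.sum(axis=-1, keepdims=True)

  def dark_knowledge_loss(student_logits, teacher_logits, hard_label,
                          temperature=2.0, alpha=0.5):
      """alpha * soft CE against the teacher + (1 - alpha) * hard-label CE."""
      soft_teacher = softmax(teacher_logits / temperature)
      log_soft_student = np.log(softmax(student_logits / temperature) + 1e-12)
      soft_ce = -np.sum(soft_teacher * log_soft_student)
      hard_ce = -np.log(softmax(student_logits)[hard_label] + 1e-12)
      return alpha * soft_ce + (1.0 - alpha) * hard_ce

  s = np.random.randn(3000)     # student senone logits for one frame (toy)
  t = np.random.randn(3000)     # teacher senone logits for the same frame
  print(dark_knowledge_loss(s, t, hard_label=42))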

multi-task

  • test self-information-based neural structure learning --mengyuan
  • hold
  • code writing done
  • no significant performance improvement observed
  • speech rate learning --xiangyu
  • get results with extra input of speech rate info --Zhiyuan

30 Chinese dataset

  • revise syllable text
  • add some words to the lexicon, which are applied to training and graph building (a lexicon-merge sketch follows this list)
  • train and decode on the 30 Chinese dataset again
  • revise the technical report
  • prepare data
  • kaldi recipe
  • revise the code with the help of Teacher Wang
  • rerun the experiment
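
A minimal sketch of the lexicon-extension step above, assuming a Kaldi-style lexicon.txt with one "word phone1 phone2 ..." entry per line. The paths and example pronunciations are hypothetical.

  # append new words to a Kaldi-style lexicon.txt, skipping duplicates
  new_entries = {
      "微信": "w ei1 x in4",      # hypothetical pronunciations
      "高铁": "g ao1 t ie3",
  }

  with open("data/local/dict/lexicon.txt", encoding="utf-8") as f:
      existing = {line.split()[0] for line in f if line.strip()}

  with open("data/local/dict/lexicon.txt", "a", encoding="utf-8") as f:
      for word, prons in new_entries.items():
          if word not in existing:           # avoid duplicate entries
              f.write(f"{word} {prons}\n")

  # the dict and graph (L.fst / HCLG) then need to be rebuilt before
  # training and decoding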

Text Processing

Work

RNN Poem Process

  • Combine additional rhymes.
  • Investigate new methods.

Document Representation

  • Code done; waiting for experimental results.

Seq to Seq

  • Work on some tasks.

Order representation

  • Code up some ideas.

Balance Representation

  • Investigate some papers.
  • Current solution: use external knowledge or similar pairs from a large corpus.

Hold

Neural Based Document Classification

RNN Rank Task

Graph RNN

  • Entity path embedded into the entity representation.
  • (hold)

RNN Word Segment

  • Set boundaries for word segmentation.
  • (hold)

Recommendation

  • Reproduce baseline.
  • LDA matrix decomposition.
  • LDA (Text classification & Recommendation System) --> AAAI

RNN based QA

  • Read Source Code.
  • Attention based QA.
  • Coding.

Text Group Intern Project

Buddhist Process

  • (hold)

RNN Poem Process

  • Done by Haichao Yu & Chaoyuan Zuo. Mentor: Tianyi Luo.

RNN Document Vector

  • (hold)

Image Baseline

  • Demo Release.
  • Paper Report.
  • Read CNN Paper.

Text Intuitive Idea

Trace Learning

  • (Hold)

Match RNN

  • (Hold)

financial group

model research

  • RNN
  • online model, updated every day
  • modify the cost function and learning method
  • add more features

rule combination

  • GA (genetic algorithm) method to optimize the model (a toy sketch follows)
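
A toy sketch of a GA loop for the item above: rule weights are evolved by selection, one-point crossover, and Gaussian mutation against a simple fitness proxy. The fitness function, rule signals, and hyperparameters are all illustrative assumptions.

  # toy genetic algorithm for weighting trading rules
  import numpy as np

  rng = np.random.default_rng(0)
  N_RULES, POP, GENS = 10, 40, 50

  def fitness(weights, rule_signals, returns):
      """Backtest proxy: correlation of the weighted rule signal with returns."""
      signal = rule_signals @ weights
      return float(np.corrcoef(signal, returns)[0, 1])

  def evolve(rule_signals, returns):
      pop = rng.random((POP, N_RULES))
      for _ in range(GENS):
          scores = np.array([fitness(w, rule_signals, returns) for w in pop])
          parents = pop[np.argsort(scores)[-POP // 2:]]           # selection
          children = []
          while len(children) < POP - len(parents):
              a, b = parents[rng.integers(len(parents), size=2)]
              cut = rng.integers(1, N_RULES)
              child = np.concatenate([a[:cut], b[cut:]])          # one-point crossover
              child += rng.normal(0, 0.05, N_RULES)               # mutation
              children.append(np.clip(child, 0, 1))
          pop = np.vstack([parents, np.array(children)])
      return pop[np.argmax([fitness(w, rule_signals, returns) for w in pop])]

  sig = rng.standard_normal((250, N_RULES))          # toy rule signals, 250 days
  ret = sig @ rng.random(N_RULES) + rng.normal(0, 0.5, 250)
  print(evolve(sig, ret))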

basic rule

  • classical tenth model

multiple-factor

  • add more factors
  • use a sparse model (see the sketch after this list)
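
One way to read the "sparse model" item above is an L1-regularized (Lasso) factor regression that keeps only a few factors. The data, factor count, and regularization strength below are synthetic assumptions.

  # sparse multi-factor model via Lasso (L1) regression
  import numpy as np
  from sklearn.linear_model import Lasso

  rng = np.random.default_rng(0)
  n_days, n_factors = 500, 30
  X = rng.standard_normal((n_days, n_factors))       # daily factor exposures (toy)
  true_w = np.zeros(n_factors)
  true_w[[2, 7, 11]] = [0.8, -0.5, 0.3]
  y = X @ true_w + rng.normal(0, 0.1, n_days)         # next-day returns (toy)

  model = Lasso(alpha=0.01).fit(X, y)
  selected = np.flatnonzero(model.coef_)              # factors the L1 penalty kept
  print("selected factor indices:", selected)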

display

  • bug fixed
  • buy rule fixed

data

  • data API
  • download the futures data and factor data