ASR:2016-01-18

From cslt Wiki
Revision as of 02:17, 19 January 2016 (Tue) by Zxw (talk | contribs)

Speech Processing

AM development

Environment

End-to-End

  • monophone ASR --Zhiyuan
  • MPE
  • CTC/nnet3/Kaldi
  • more reference results for Kaldi/CTC on 1400h Chinese plus 100h English
  • Further tested nnet3-ctc training on the 4000h 8k & 10000h 16k datasets. Tuned the decoding configs (phone-lm-weight, acwt, blank-scale), but CTC is still about 5 percent worse than standard nnet3
  • launched experiments of MPE after CTC on WSJ; code revised
  • experiments of "chain" on WSJ
  • nnet3-ctc
  • Reproduce TZY's experiment on 1400h+100h dataset--mengyuan
  • Compare performance of nnet3-xEnt and nnet3-ctc on 100h dataset--mengyuan
  • tested pre-training-based CTC training: a little better than CTC-from-scratch, but still worse than xEnt--mengyuan
  • Finished coding the nnet3-ctc alignment implementation--zhiyong
  • ctc-mpe-1: code that considers CctcTransition is done; waiting for the results
  • ctc-mpe-2: using Transition instead of CctcTransition, results after 1 iteration show improvement
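The blank-scale tuning mentioned in the list above can be illustrated with a minimal sketch. It assumes a plain best-path (greedy) CTC decoder in which the blank posterior is multiplied by a scale before the per-frame argmax. This is only one plausible reading of such an option and is not taken from the Kaldi nnet3-ctc code; the function name `ctc_greedy_decode` and its parameters are hypothetical.

```python
def ctc_greedy_decode(posteriors, blank=0, blank_scale=1.0):
    """Best-path CTC decoding with a (hypothetical) scale on the blank posterior.

    posteriors: list of frames, each a list of per-label probabilities.
    """
    best_path = []
    for frame in posteriors:
        scaled = list(frame)
        scaled[blank] *= blank_scale  # assumption: scale applied to blank only
        best_path.append(max(range(len(scaled)), key=scaled.__getitem__))

    # Standard CTC collapse: merge repeated labels, then drop blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


frames = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.4, 0.6]]
print(ctc_greedy_decode(frames, blank_scale=1.0))
print(ctc_greedy_decode(frames, blank_scale=0.5))
```

With a scale below 1 the blank wins fewer frames, so repeated labels that were separated by a blank can merge into one; this is why such a parameter changes the insertion/deletion balance of the decoder.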

Adaptive learning rate method

  • sequence training -Xiangyu
  • write a technical report

Mic-Array

  • hold
  • compute EER with kaldi
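As a reference for the EER item above, here is a minimal pure-Python sketch of how an equal error rate can be computed from trial scores. It is not Kaldi's implementation; the function name `compute_eer` and the convention "accept when score > threshold" are assumptions made for illustration.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) where false-accept and false-reject rates are closest.

    A trial is accepted when its score is strictly greater than the threshold.
    """
    trials = sorted([(s, True) for s in target_scores] +
                    [(s, False) for s in nontarget_scores])
    n_tar, n_non = len(target_scores), len(nontarget_scores)

    misses = 0        # target trials rejected so far
    rejected_non = 0  # non-target trials rejected so far
    # Start with a threshold below every score: everything is accepted.
    best_gap, eer, threshold = 1.0, 0.5, float("-inf")

    for idx, (score, is_target) in enumerate(trials):
        if is_target:
            misses += 1
        else:
            rejected_non += 1
        # Only evaluate once all trials sharing this score are processed.
        if idx + 1 < len(trials) and trials[idx + 1][0] == score:
            continue
        frr = misses / n_tar
        far = (n_non - rejected_non) / n_non
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, threshold = gap, (far + frr) / 2, score
    return eer, threshold


eer, thr = compute_eer([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(eer, thr)
```

The sweep visits each distinct score once as a candidate threshold, which is sufficient because the error rates only change at observed scores.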

Data selection unsupervised learning

  • hold
  • acoustic-feature-based submodular selection using the Pingan dataset --zhiyong
  • write code to speed up --zhiyong
  • curriculum learning --zhiyong
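The submodular data selection item above can be sketched as greedy maximization of a facility-location objective. This is a generic illustration under stated assumptions, not the group's code: the objective choice, the function name `greedy_select`, and the precomputed non-negative similarity matrix (for example, derived from acoustic features) are all hypothetical.

```python
def greedy_select(similarity, budget):
    """Greedily pick `budget` items maximizing a facility-location objective.

    similarity[i][j] >= 0 is how well item j represents item i.
    The objective is sum_i max_{j in selected} similarity[i][j].
    """
    n = len(similarity)
    cover = [0.0] * n  # best similarity of each item to the selected set
    selected = []
    for _ in range(min(budget, n)):
        best_gain, best_item = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j to the current selection.
            gain = sum(max(similarity[i][j] - cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_item = gain, j
        selected.append(best_item)
        cover = [max(cover[i], similarity[i][best_item]) for i in range(n)]
    return selected


sim = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
]
print(greedy_select(sim, 2))
```

Each round recomputes every marginal gain, which costs O(n^2) per pick; this is the part a speed-up (for instance lazy evaluation of gains) would target.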

RNN-DAE (RNN-based deep auto-encoder)

  • hold
  • RNN-DAE performs worse than DNN-DAE because the training dataset is small
  • extract real room impulse responses to generate reverberant WSJ data, then train RNN-DAE
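Generating reverberant data from a room impulse response, as planned in the last item above, amounts to convolving the clean waveform with the impulse response. The sketch below shows this with plain Python lists; the function name `reverberate` and the choice to truncate the output to the clean signal's length are assumptions for illustration, and a real pipeline would use an FFT-based convolution on actual audio samples.

```python
def reverberate(clean, rir):
    """Convolve a clean signal with a room impulse response (RIR).

    Returns a signal of the same length as `clean` (the reverberant
    tail beyond the original length is dropped).
    """
    full = [0.0] * (len(clean) + len(rir) - 1)
    for i, sample in enumerate(clean):
        for j, tap in enumerate(rir):
            full[i + j] += sample * tap  # direct O(N*M) convolution
    return full[:len(clean)]


# A unit impulse through the RIR reproduces the RIR itself.
print(reverberate([1.0, 0.0, 0.0, 0.0], [1.0, 0.5, 0.25]))
```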

Speaker recognition

  • DNN-ivector framework
  • SUSR
  • AutoEncoder + metric learning
  • binary ivector
  • Deep speaker embedding tasks
  • For the max-margin metric learning task, make some additional experiments
  • the NIST-SRE05 labels (wav.scp, spk2utt, utt2spk) were found to be wrong, and SRE05 has been re-labelled
  • Switchboard-1-LDC2001S13 and Switchboard-Cell-P2-LDC2004S07 have been used as the new dev set.
  • Two tricks: i-vector shuffle and speaker selection
  • Pairwise speaker vector training --> Data preparation
  • Deep speaker embedding --> Data preparation

language vector

  • write a paper--zhiyuan
  • hold
  • language vector is added to multiple hidden layers--zhiyuan
  • RNN language vector
  • hold
  • language vector into multiple layers --Zhiyuan
  • a Chinese paper
  • speech rate into multiple layers --Zhiyuan
  • verify the code for extra input(s) into DNN

multi-GPU

  • multi-stream training --Sheng Su
  • write a technical report
  • kaldi-nnet3 --Xuewei
  • RNN AM training on big dataset --mengyuan
  • fix decode bug
  • nnet3 LSTM & BLSTM training on the sinovoice 120h dataset using Kaldi's default config, but the result is worse than TDNN
  • Test nnet3-MPE code from Xuewei on sinovoice 120h 16k dataset. Didn't observe performance improvement. There are still some bugs in the code.
  • Run nnet3-ctc training on sinovoice 120h 16k dataset. Result looks ok, but worse than normal model.
  • Start nnet3-ctc training on sinovoice 4000h 8k dataset.
  • train mpe --Zhiyong,Xuewei
  • train nnet3 mpe using data from Jietong--Xuewei
  • modify code to print stats --Xuewei
  • MPE does not work when the context is 10; this needs further investigation--zhiyong
  • The nnet1 MPE we tested is also based on context 5; a larger context may be an inherent problem--zhiyong
  • modify code to reduce memory
  • Implement dark-knowledge on nnet3 and do some experiments
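For the dark-knowledge item above, the core idea is to train a student network against a teacher's temperature-softened output distribution. The following is a minimal sketch of that loss for a single frame, independent of nnet3; the function names and the default temperature of 2.0 are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = flatter)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and softened student outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


teacher = [2.0, 1.0, 0.0]
print(distillation_loss([2.0, 1.0, 0.0], teacher))  # student matches teacher
print(distillation_loss([0.0, 1.0, 2.0], teacher))  # student disagrees
```

The loss is smallest when the student's softened distribution equals the teacher's, so the second call prints a larger value than the first. In practice this term is often interpolated with the usual hard-label cross-entropy.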

multi-task

  • test neural structure learning based on self-information --mengyuan
  • hold
  • coding done
  • no significant performance improvement observed
  • speech rate learning --xiangyu
  • get results with extra input of speech rate info --Zhiyuan

30 Chinese dataset

  • revise syllable text
  • add some words to the lexicon, which is used for training and for building the graph.
  • train and decode 30 chinese data again
  • revise the technical report
  • prepare data
  • kaldi recipe
  • revise the code with the help of Teacher Wang
  • rerun the experiment

Text Processing

Work

RNN Poem Process

  • Combine additional rhymes.
  • Investigate new method.

Document Representation

  • Code done. Waiting for some experiment results.

Seq to Seq

  • Work on some tasks.

Order representation

  • Code up some ideas.

Balance Representation

  • Investigate some papers.
  • Current solution: use knowledge or similar pairs from a large corpus.

Hold

Neural Based Document Classification

RNN Rank Task

Graph RNN

  • Entity path embedded into entity.
  • (hold)

RNN Word Segmentation

  • Set bounds for word segmentation.
  • (hold)

Recommendation

  • Reproduce baseline.
  • LDA matrix decomposition.
  • LDA (Text classification & Recommendation System) --> AAAI

RNN based QA

  • Read Source Code.
  • Attention based QA.
  • Coding.

Text Group Intern Project

Buddhist Process

  • (hold)

RNN Poem Process

  • Done by Haichao Yu & Chaoyuan Zuo. Mentor: Tianyi Luo.

RNN Document Vector

  • (hold)

Image Baseline

  • Demo Release.
  • Paper Report.
  • Read CNN Paper.

Text Intuitive Idea

Trace Learning

  • (Hold)

Match RNN

  • (Hold)

financial group

model research

  • RNN
  • online model, updated every day
  • modify cost function and learning method
  • add more features

rule combination

  • GA method to optimize the model

basic rule

  • classical tenth model

multiple-factor

  • add more factors
  • use sparse model

display

  • bug fixed
  • buy rule fixed

data

  • data api
  • download the future data and factor data