ASR:2015-10-19: Difference between revisions

From cslt Wiki

(Created page with "==Speech Processing == === AM development === ==== Environment ==== * repair laptop ==== RNN AM==== *train monophone RNN --zhiyuan :* decode using 5-gram :* the t...")

(6 intermediate revisions by the same user not shown)
Line 3: Line 3:

==== Environment ====
- * repair laptop
+ * in disaster

==== RNN AM ====
* train monophone RNN --zhiyuan
- :* decode using 5-gram
+ :* end-to-end MPE
- :* the batch training method
+ :* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=446
- :* test using another test set

* train RNN MPE using large dataset --mengyuan
- :* divergence problem
+ :* hold
- :* try adaptation method
+ :* better mpe result observed; unknown errors in the previous lstm mpe compilation of kaldi
+ :* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=403

- ==== Learning rate tuning ====
+ ==== Adaptive learning rate method ====

* sequence training --Xiangyu
+ :* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=458

==== Mic-Array ====

Line 30: Line 29:

==== RNN-DAE (RNN-based Deep Auto-Encoder) ====
+ * hold
* RNN-DAE has worse performance than DNN-DAE because the training dataset is small
* extract real room impulses to generate WSJ reverberation data, and then train RNN-DAE

=== Ivector&Dvector based ASR ===
- * dark knowledge
+ * learning from ivector --Lantian
- :* has much worse performance than baseline (EER: base 29%, dark knowledge 48%)
+ :* CNN ivector learning
+ :* DNN ivector learning
* binary ivector
* metric learning
+ * LDA-vector Transfer Learning
+ * write a technique report

=== language vector ===
- * hold
* write a paper --zhiyuan
+ :* hold
* language vector is added to multiple hidden layers --zhiyuan
+ :* write code done
+ :* check code
+ :* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=480
* RNN language vector
:* hold

Line 48: Line 54:

=== multi-GPU ===
* multi-stream training --Sheng Su
- :* two GPUs work well, but four GPUs diverge
+ :* write a technique report
- * solve the buffer problem --Sheng Su

* kaldi-nnet3 --Xuewei
+ :* 7*2048 8k 1400h tdnn training Xent done
+ :* nnet3 mpe code is under investigation
+ :* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=472
+ * train 7*2048 tdnn using 4000h data --Mengyuan

=== multi-task ===
- * write code according to self-information neural structure learning --mengyuan
+ * test according to self-information neural structure learning --mengyuan
+ :* hold
+ :* write code done
+ :* no significant performance improvement observed

* speech rate learning --xiangyu
+ :* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=483
+ :* test using extreme data

- === Neural picture style transfer ===
- * hold
- * reproduced the result of the paper "A Neural Algorithm of Artistic Style" --Zhiyuan, Xuewei
- * limited by GPU memory, restricted to the Inception net with the SGD optimizer (the VGG network with the default L-BFGS optimizer gives better results but consumes much more memory)

- === Multi-task learning ===
- * train model using speech rate --xiangyu
- * speech recognition plus speaker recognition --xiangyu, lantian, zhiyuan

== Text Processing ==

Latest revision as of 07:27, 16 November 2015

Speech Processing

AM development

Environment

  • in disaster

RNN AM

  • train monophone RNN --zhiyuan
  • train RNN MPE using large dataset--mengyuan

Adaptive learning rate method

  • sequence training --Xiangyu

Mic-Array

  • hold
  • compute EER with kaldi (see the sketch below)
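
For reference, a minimal numpy sketch of what the equal error rate computation does, assuming target and nontarget trial scores have already been extracted from the scoring output; the score arrays below are synthetic, not real trial scores.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """EER: the operating point where false-alarm rate equals miss rate."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    order = np.argsort(scores)              # sweep the threshold upward
    labels = labels[order]
    # miss rate: fraction of targets scored at or below the threshold
    fnr = np.cumsum(labels) / max(labels.sum(), 1)
    # false-alarm rate: fraction of nontargets scored above the threshold
    fpr = 1.0 - np.cumsum(1 - labels) / max((1 - labels).sum(), 1)
    idx = np.nanargmin(np.abs(fnr - fpr))   # where the two curves cross
    return (fnr[idx] + fpr[idx]) / 2.0

# toy usage with synthetic well-separated scores
rng = np.random.RandomState(0)
eer = compute_eer(rng.normal(1.0, 1.0, 1000), rng.normal(-1.0, 1.0, 1000))
print("EER: %.2f%%" % (100 * eer))
```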

Data selection unsupervised learning

  • hold
  • acoustic feature based submodular selection using the Pingan dataset --zhiyong (see the sketch below)
  • write code to speed up --zhiyong
  • curriculum learning --zhiyong
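
The submodular item refers to picking a subset of utterances whose acoustic features best cover the whole set. A minimal sketch of greedy facility-location selection, with random feature vectors and a made-up budget standing in for the Pingan data:

```python
import numpy as np

def facility_location_select(features, budget):
    """Greedily pick `budget` rows maximizing sum_i max_{j in S} sim(i, j)."""
    # cosine similarity between utterance-level feature vectors
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    selected, coverage = [], np.zeros(len(features))
    for _ in range(budget):
        # marginal coverage gain of adding each candidate utterance
        gains = np.maximum(sim, coverage).sum(axis=1) - coverage.sum()
        gains[selected] = -np.inf            # never re-pick an utterance
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, sim[best])
    return selected

# toy usage: 200 utterances with 40-dim mean-pooled features
rng = np.random.RandomState(0)
subset = facility_location_select(rng.randn(200, 40), budget=20)
print(subset[:5])
```

The greedy loop is the standard approximation for monotone submodular objectives; swapping in different similarity kernels changes what "coverage" means.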

RNN-DAE (RNN-based Deep Auto-Encoder)

  • hold
  • RNN-DAE has worse performance than DNN-DAE because the training dataset is small
  • extract real room impulses to generate WSJ reverberation data, and then train RNN-DAE (see the sketch below)
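
The last item proposes convolving clean WSJ audio with measured room impulse responses. A minimal sketch of that generation step, assuming mono wav files, soundfile/scipy availability, and placeholder file names:

```python
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

def reverberate(clean_wav, rir_wav, out_wav):
    clean, sr = sf.read(clean_wav)
    rir, sr_rir = sf.read(rir_wav)
    assert sr == sr_rir, "resample the RIR to the speech sample rate first"
    # convolve with the real room impulse, keep the original length
    wet = fftconvolve(clean, rir)[: len(clean)]
    # renormalize so the reverberant copy has the clean signal's energy
    wet *= np.sqrt((clean ** 2).sum() / ((wet ** 2).sum() + 1e-10))
    sf.write(out_wav, wet, sr)

# placeholder file names, for illustration only
reverberate("wsj_clean.wav", "room_impulse.wav", "wsj_reverb.wav")
```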

Ivector&Dvector based ASR

  • learning from ivector --Lantian
  • CNN ivector learning
  • DNN ivector learning
  • binary ivector
  • metric learning
  • LDA-vector Transfer Learning
  • write a technique report

language vector

  • write a paper --zhiyuan
      • hold
  • language vector is added to multiple hidden layers --zhiyuan (see the sketch below)
  • RNN language vector
      • hold
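
One plausible reading of "language vector is added to multiple hidden layers" is concatenating a per-language vector onto the input of every hidden layer of the acoustic model. A minimal PyTorch-style sketch under that assumption; the class name, sizes, and framework are illustrative, not the group's actual model:

```python
import torch
import torch.nn as nn

class LangVectorDNN(nn.Module):
    def __init__(self, feat_dim=40, lang_dim=8, hidden=512, layers=4, pdfs=3000):
        super().__init__()
        dims = [feat_dim] + [hidden] * layers
        # each layer sees its normal input plus the language vector
        self.layers = nn.ModuleList(
            nn.Linear(d + lang_dim, hidden) for d in dims[:-1])
        self.out = nn.Linear(hidden, pdfs)

    def forward(self, feats, lang_vec):
        h = feats
        for layer in self.layers:
            h = torch.relu(layer(torch.cat([h, lang_vec], dim=-1)))
        return self.out(h)

model = LangVectorDNN()
feats = torch.randn(16, 40)                          # a batch of frames
lang_vec = torch.zeros(16, 8); lang_vec[:, 0] = 1.0  # one-hot language id
print(model(feats, lang_vec).shape)                  # torch.Size([16, 3000])
```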

multi-GPU

  • multi-stream training --Sheng Su (see the sketch below)
      • write a technique report
  • kaldi-nnet3 --Xuewei
  • train 7*2048 tdnn using 4000h data --Mengyuan
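
For reference, synchronous multi-stream training has each of N streams (one per GPU) compute a gradient on its own minibatch, then averages them before every update. A pure-numpy toy stand-in, not the group's Kaldi code, with a linear-regression gradient as the per-stream work:

```python
import numpy as np

def sgd_multi_stream(w, data_streams, grad_fn, lr=0.1, steps=100):
    for _ in range(steps):
        # one minibatch gradient per stream (per GPU)
        grads = [grad_fn(w, next(stream)) for stream in data_streams]
        w -= lr * np.mean(grads, axis=0)   # synchronous average
    return w

# toy usage: 4 streams of random least-squares batches
rng = np.random.RandomState(0)
def make_stream():
    while True:
        x = rng.randn(32, 5)
        yield x, x @ np.arange(5.0) + 0.1 * rng.randn(32)
def grad(w, batch):
    x, y = batch
    return 2 * x.T @ (x @ w - y) / len(y)

w = sgd_multi_stream(np.zeros(5), [make_stream() for _ in range(4)], grad)
print(np.round(w, 2))   # approaches [0, 1, 2, 3, 4]
```

One common culprit when adding more streams causes divergence (as the diff notes for four GPUs) is that the effective batch size grows while the learning rate stays fixed.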

multi-task

  • test according to self-information neural structure learning --mengyuan
      • hold
      • write code done
      • no significant performance improvement observed
  • speech rate learning --xiangyu
      • test using extreme data

Text Processing

RNN LM

  • character-lm rnn (hold)
  • lstm+rnn
  • check the lstm-rnnlm code for how to initialize and update the learning rate (hold; see the sketch below)
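
The learning-rate logic in Mikolov-style rnnlm tools, which the item above is auditing, keeps the rate fixed while validation entropy improves enough, then halves it each epoch and stops at the second stall. A minimal sketch of that schedule; the 1.003 threshold matches the rnnlm toolkit's conventional default, but treat the specifics as assumptions:

```python
def rnnlm_lr_schedule(valid_entropy_per_epoch, lr0=0.1, min_improvement=1.003):
    lr, halving, prev = lr0, False, float("inf")
    schedule = []
    for ent in valid_entropy_per_epoch:
        schedule.append(lr)
        if prev / ent < min_improvement:   # too little progress this epoch
            if halving:
                break                      # second stall: stop training
            halving = True
        if halving:
            lr /= 2.0                      # halve every epoch once stalled
        prev = ent
    return schedule

# toy usage with a slowly flattening validation-entropy curve
print(rnnlm_lr_schedule([5.2, 5.0, 4.9, 4.87, 4.86, 4.859]))
```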

Neural Based Document Classification

  • (hold)

RNN Rank Task

  • Test.
  • Paper: RNN Rank Net.
  • (hold)
  • Output rank information.

Graph RNN

  • Entity path embedded to entity.
  • (hold)

RNN Word Segment

  • Set boundaries for word segmentation.
  • (hold)

Seq to Seq(09-15)

  • Review papers.
  • Reproduce baseline. (08-03 <--> 08-17)

Order representation

  • Nested Dropout (see the sketch below)
  • semi-linear --> neural based auto-encoder.
  • modify the objective function (hold)
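
Nested Dropout (Rippel et al., 2014) drops a random suffix of the code units, so earlier units learn coarser information and the representation becomes ordered. A minimal numpy sketch of the mask, with a geometric prior and illustrative sizes:

```python
import numpy as np

def nested_dropout_mask(batch, code_dim, p=0.1, rng=np.random):
    """Keep the first b units and drop the rest, with b ~ Geometric(p)."""
    b = rng.geometric(p, size=(batch, 1)).clip(max=code_dim)  # b in 1..code_dim
    idx = np.arange(code_dim)[None, :]
    return (idx < b).astype(np.float32)

# toy usage: mask a batch of auto-encoder codes
rng = np.random.RandomState(0)
codes = rng.randn(4, 16)
masked = codes * nested_dropout_mask(4, 16, rng=rng)
print((masked != 0).sum(axis=1))   # length of the prefix kept per example
```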

Balance Representation

  • Find error signal

Recommendation

  • Reproduce baseline.
  • LDA matrix dissolve.
  • LDA (Text classification & Recommendation System) --> AAAI

RNN based QA

  • Read Source Code.
  • Attention based QA.
  • Coding.

RNN Poem Process

  • Seq based BP.
  • (hold)

Text Group Intern Project

Buddhist Process

  • (hold)

RNN Poem Process

  • Done by Haichao Yu & Chaoyuan Zuo. Mentor: Tianyi Luo.

RNN Document Vector

  • (hold)

Image Baseline

  • Demo Release.
  • Paper Report.
  • Read CNN Paper.

Text Intuitive Idea

Trace Learning

  • (Hold)

Match RNN

  • (Hold)

financial group

model research

  • RNN
  • online model, updated every day
  • modify cost function and learning method
  • add more feature

rule combination

  • GA method to optimize the model (see the sketch below)
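
A minimal sketch of how a GA could search rule-combination weights: keep the best half, uniform crossover, Gaussian mutation. The fitness function here is a placeholder for an actual backtest score:

```python
import numpy as np

def ga_optimize(fitness, dim, pop_size=40, generations=60,
                mutation=0.1, rng=np.random):
    pop = rng.randn(pop_size, dim)
    for _ in range(generations):
        scores = np.array([fitness(w) for w in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]        # keep the best half
        # uniform crossover: mix random parent pairs gene by gene
        pa = parents[rng.randint(len(parents), size=pop_size)]
        pb = parents[rng.randint(len(parents), size=pop_size)]
        mask = rng.rand(pop_size, dim) < 0.5
        pop = np.where(mask, pa, pb)
        pop += mutation * rng.randn(pop_size, dim)   # mutation noise
    return pop[np.argmax([fitness(w) for w in pop])]

# toy usage: recover a known optimum as a stand-in for a backtest score
target = np.arange(5.0)
best = ga_optimize(lambda w: -np.sum((w - target) ** 2), dim=5,
                   rng=np.random.RandomState(0))
print(np.round(best, 1))   # close to [0, 1, 2, 3, 4]
```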

basic rule

  • classical tenth model

multiple-factor

  • add more factor
  • use sparse model

display

  • bug fixed
  • buy rule fixed

data

  • data api
  • download futures data and factor data