2014-08-01

Resource Building

Leftover questions

Investigating LOUDS FST.
CLG embedded decoder plus online compiler.
DNN-GMM co-training
NN LM

AM development

Sparse DNN

On WSJ, the sparse DNN performs slightly better than the non-sparse DNN when the network is large.
Pre-training does help DNN training (for 4, 5, and 6 layers).

Noise training

Journal paper writing is ongoing.

Multilingual ASR

Combining native English speakers and Chinglish speakers obtained better performance.

Drop out & convolutional network

After changing the learning rate to 0.001, the training process can be started.
Frame accuracy goes to: (with/without drop probability normalization)
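
The with/without comparison above can be illustrated with a minimal sketch. It assumes that "drop probability normalization" means rescaling the surviving activations by 1/(1 - p), often called inverted dropout. The function name `dropout` and its parameters are illustrative only and are not taken from the group's training code.

```python
import random


def dropout(activations, drop_prob, normalize=True, rng=None):
    """Zero each activation with probability drop_prob.

    Assumption: "normalization" rescales the kept activations by
    1 / (1 - drop_prob), so the expected value of each unit is unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    scale = 1.0 / (1.0 - drop_prob) if normalize else 1.0
    # Keep a unit when the random draw is at or above the drop probability.
    return [a * scale if rng.random() >= drop_prob else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [1.0] * 10
    print("normalized:  ", dropout(hidden, 0.5, normalize=True, rng=rng))
    print("unnormalized:", dropout(hidden, 0.5, normalize=False, rng=rng))
```

With normalization the layer's expected output matches the no-dropout case, so no rescaling of the weights is needed at test time. Without it, the expected output shrinks by a factor of (1 - p) during training.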

Denoising & Farfield ASR

Tuning the late-response lag and response time parameters yielded a performance improvement with Lasso.
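
As a rough illustration of the idea, the sketch below predicts the late response of one feature track from delayed past frames using L1-regularized (Lasso) weights, then subtracts the prediction. It is a sketch under assumptions, not the group's implementation: `delta` is read as the lag in frames before the late response starts, `n_taps` as the response length (N), and `lam` as the Lasso penalty. All function names are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = len(y), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = [float(v) for v in y]  # residual y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            step = new_w - w[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                w[j] = new_w
    return w


def delayed_design(x, delta, n_taps):
    """Rows hold x[t-delta], ..., x[t-delta-n_taps+1]; targets hold x[t]."""
    start = delta + n_taps - 1
    rows = [[x[t - delta - k] for k in range(n_taps)] for t in range(start, len(x))]
    return rows, list(x[start:])


def suppress_late_response(x, lam=0.05, delta=5, n_taps=10):
    """Subtract the Lasso-predicted late response, flooring the result at zero."""
    rows, targets = delayed_design(x, delta, n_taps)
    if not rows:  # track too short to form any delayed frame
        return list(x), [0.0] * n_taps
    w = lasso_cd(rows, targets, lam)
    out = list(x[: delta + n_taps - 1])  # leading frames are left untouched
    for row, target in zip(rows, targets):
        predicted = sum(wk * rk for wk, rk in zip(w, row))
        out.append(max(target - predicted, 0.0))
    return out, w
```

The L1 penalty drives most tap weights to exactly zero, so a larger `lam` gives a sparser and more conservative estimate of the late response.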

Simulation results:

Baseline:

    model/test      |  far_evl92  |  near_evl92
    ------------------------------------------------
    clean_ce        |    59.38    |    19.25
    mpe_clean_ce    |    40.46    |    12.94

Lasso with optimal parameters (lambda=0.05, delta=5, N=10):

    model/test      |  far_evl92  |  near_evl92
    ------------------------------------------------
    clean_ce        |    54.63    |    15.75
    mpe_clean_ce    |    36.58    |    11.64
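
To put the simulation numbers in perspective, the short script below computes the relative reduction from the baseline to the Lasso figures. It assumes the table entries are error rates in percent (lower is better), which the report does not state explicitly. The helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline, improved):
    """Relative reduction in percent, going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# (baseline, Lasso) pairs copied from the simulation tables above.
results = {
    ("clean_ce", "far_evl92"): (59.38, 54.63),
    ("clean_ce", "near_evl92"): (19.25, 15.75),
    ("mpe_clean_ce", "far_evl92"): (40.46, 36.58),
    ("mpe_clean_ce", "near_evl92"): (12.94, 11.64),
}

if __name__ == "__main__":
    for (model, test), (base, lasso) in results.items():
        gain = relative_reduction(base, lasso)
        print(f"{model:>13} {test:>11}: {gain:.1f}% relative reduction")
```

Under that assumption, the relative reductions range from about 8% (clean_ce, far_evl92) to about 18% (clean_ce, near_evl92).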

Real data results:

    model/test      |  far_evl92  |  near_evl92
    ------------------------------------------------
    clean_ce        |    94.86    |    63.48
    mpe_clean_ce    |    92.29    |    58.37

Dereverberated recordings:

    model/test      |  far_evl92  |  near_evl92
    ------------------------------------------------
    clean_ce        |    94.91    |    61.03
    mpe_clean_ce    |    91.28    |    54.16

Adaptation experiments are running.

VAD

Waiting for testing results

Scoring

Refined the acoustic model with the AMIDA database. The problem was solved by using both WSJ and AMIDA data.

Confidence

Getting familiar with Kaldi.
Need to extract lattice and DNN features.

Embedded decoder

Chatting LM released (80k)
Training two smaller networks (500x4+600, 400x4+500): ongoing
Need to upload the new client code to git (+)
Build a new graph with the MPE3 AM and the chatting LM.

LM development

Domain specific LM

Domain specific LM construction

TAG LM

TAG obtained better performance

Chatting LM

First version released (80k lexicon)
Preparing the second release (120k lexicon)
Test on Xiaotang long

Word2Vector

W2V based doc classification

Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.

Speaker ID

Full-data SRE trial is entering the final stage.
Results will be ready soon.

Translation

Collecting more data (Xinhua parallel text, Bible, named entities) for the second version.
Checking possible parameters to control the phrase-pair lexicon.
