2014-05-23
Resource Building
- Release management has been started
- Blaster 0.1 & vivian 0.0 system release
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% → 24%), but the gain disappears on the test set. Overfitting?
- Multi-GPU training: error encountered
- Multilingual training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
AM development
Sparse DNN
- GA-based block sparsity (++++)
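As a toy illustration of the idea (not the actual experiment): a genetic algorithm can evolve binary block masks over a trained weight matrix, scoring each mask by the weight energy it retains under a block-count budget. The matrix, block size, fitness function, and GA settings below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy "trained" weight matrix, partitioned into a 4x4 grid of 8x8 blocks
W = rng.standard_normal((32, 32))
B = 8
nb = 32 // B

def block_energy(mask):
    """Sum of squared weights kept by a binary block mask."""
    full = np.kron(mask, np.ones((B, B)))   # expand block mask to full size
    return np.sum((W * full) ** 2)

def fitness(mask, keep=8):
    # penalize masks that keep more than `keep` blocks (sparsity budget)
    return block_energy(mask) - 1e3 * max(0, mask.sum() - keep)

pop = [rng.integers(0, 2, (nb, nb)) for _ in range(30)]
for gen in range(60):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                       # elitist selection
    children = []
    for _ in range(20):
        a, b = rng.choice(10, 2, replace=False)
        cross = rng.integers(0, 2, (nb, nb))           # uniform crossover
        child = np.where(cross, parents[a], parents[b])
        flip = rng.random((nb, nb)) < 0.05             # bit-flip mutation
        child = np.abs(child - flip.astype(int))
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
```

The penalty term makes any over-budget mask lose to any reasonable in-budget one, so the GA concentrates the budget on the highest-energy blocks.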
Noise training
- All experiments completed. Combining experiments.
GFbank
- WSJ clean condition done. Obtained the same performance as the time-domain implementation
- Should experiment with the Tencent training set.
Multilingual ASR
- Multilingual LM decoding
- Fixing the non-tag bug
English model
RESULTS (state-gauss = 10000 100000):

1. Shujutang 100h chi-eng 16k:
   LM/AM |  xEnt | mpe_1 | mpe_2 | mpe_3 | mpe_4
   ------|-------|-------|-------|-------|------
   wsj   | 23.86 | 20.95 | 20.90 | 20.84 | 20.81

2. Shujutang 100h chi-eng 8k:
   LM/AM |  xEnt | mpe_1 | mpe_2 | mpe_3 | mpe_4
   ------|-------|-------|-------|-------|------
   wsj   | 26.27 | 23.63 | 23.14 | 22.93 | 23.00

3. voxforge pure eng 16k:
   LM/AM |  xEnt | mpe_1 | mpe_2 | mpe_3 | mpe_4
   ------|-------|-------|-------|-------|------
   wsj   | 21.38 | 24.89 | 24.50 | 23.31 | 23.13

4. fisher pure eng 8k (not finished yet):
   LM/AM |  xEnt | mpe_1 | mpe_2 | mpe_3 | mpe_4
   ------|-------|-------|-------|-------|------
   wsj   | 40.65 |   -   |   -   |   -   |   -
- Need to experiment with the Gigabytes LM
- Need to check the AM settings and LM used in the Kaldi egs/fisher
Denoising & Farfield ASR
- Baseline: close-talk model decoding far-field speech: 92.65
- Will investigate DAE model
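A minimal sketch of the DAE idea, assuming a single tanh hidden layer trained with full-batch gradient descent to map noise-corrupted feature vectors back to clean ones; the data, dimensions, and learning rate are toy values, not the actual far-field setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: "clean" 40-dim feature frames and additive-noise corrupted versions
clean = rng.standard_normal((512, 40))
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

d, h = 40, 128
W1 = 0.1 * rng.standard_normal((d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal((h, d)); b2 = np.zeros(d)
lr = 0.01

def forward(x):
    z = np.tanh(x @ W1 + b1)   # encoder
    return z, z @ W2 + b2      # linear decoder

_, out0 = forward(noisy)
mse_init = np.mean((out0 - clean) ** 2)

for epoch in range(200):
    z, out = forward(noisy)
    err = out - clean                          # gradient of 0.5 * MSE
    gW2 = z.T @ err / len(noisy); gb2 = err.mean(0)
    dz = (err @ W2.T) * (1.0 - z ** 2)         # backprop through tanh
    gW1 = noisy.T @ dz / len(noisy); gb1 = dz.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse_final = np.mean((forward(noisy)[1] - clean) ** 2)
```

The training target is the clean frame while the input is its corrupted version, which is what distinguishes a DAE from a plain autoencoder.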
Kaiser Window
- Test on different numbers of Fbanks: no significant difference between 23/30/40 (both 8k/16k) [#223]
- Test on Kaiser & Povey windows: no significant difference for either 8k or 16k [#224, #225]
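For reference, the two window shapes can be generated and compared directly. This sketch uses NumPy's np.kaiser and reconstructs the Povey window from its usual definition in Kaldi (a Hann window raised to the power 0.85); the beta value and frame length here are assumptions.

```python
import numpy as np

def povey_window(n, p=0.85):
    """Povey window as used in Kaldi: a Hann window raised to the power p."""
    t = np.arange(n)
    return (0.5 - 0.5 * np.cos(2 * np.pi * t / (n - 1))) ** p

frame_len = 400                          # 25 ms frame at 16 kHz
kaiser = np.kaiser(frame_len, beta=8.0)  # beta is an assumed shape parameter
povey = povey_window(frame_len)

# windowed spectra of one toy frame
frame = np.random.randn(frame_len)
spec_kaiser = np.abs(np.fft.rfft(frame * kaiser))
spec_povey = np.abs(np.fft.rfft(frame * povey))
```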
VAD
- DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
- Need to test small scale network
- 600-800 network
- 100 X 4 + 2
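A minimal sketch of the energy-based baseline being compared against, assuming a per-frame log-energy threshold relative to the loudest frame; the frame size, hop, and threshold values are illustrative.

```python
import numpy as np

def energy_vad(samples, frame_len=400, hop=160, threshold_db=-30.0):
    """Frame-level energy VAD: mark a frame as speech when its log energy
    is within threshold_db of the loudest frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, hop)]
    energy = np.array([np.sum(f ** 2) + 1e-12 for f in frames])
    log_e = 10.0 * np.log10(energy)
    return log_e > (log_e.max() + threshold_db)

# toy signal: silence, then a loud burst, then silence (16 kHz samples)
sig = np.concatenate([0.001 * np.random.randn(8000),
                      np.random.randn(8000),
                      0.001 * np.random.randn(8000)])
decisions = energy_vad(sig)
```

A relative threshold like this fails at low SNR, which is consistent with the large gap reported above against the DNN-based detector.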
Scoring
- Fixing bug for the stream mode
LM development
Domain specific LM
- English lexicon done; building HCLG
- Re-building the LM with the new lexicon
- Tested on Dianxin dev set
NN LM
- Character-based NNLM (6700 chars, 7-gram); training on 500M data done.
- Inconsistent WER patterns were found on the Tencent test sets
- Probably need another test set for investigation
- Investigate MS RNN LM training
QA
FST-based matching
- Word-based FST matching takes 1-2 seconds with 1600 patterns; Huilan's implementation takes <1 second.
- THRAX toolkit for grammar to FST
- Investigate determinization of G embedding
- Refer to Kaldi new code
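An illustrative sketch of what a word-based pattern matcher does, using a plain trie as an unweighted stand-in for the FST; this is not Huilan's implementation or THRAX output, and the patterns are made up.

```python
def build_trie(patterns):
    """Compile word-sequence patterns into a trie (a deterministic
    acceptor, i.e. an unweighted word-based FST)."""
    trie = {}
    for pid, pat in enumerate(patterns):
        node = trie
        for word in pat.split():
            node = node.setdefault(word, {})
        node["<final>"] = pid   # final state carries the pattern id
    return trie

def match(trie, sentence):
    """Return (start_index, pattern_id) for every pattern occurrence."""
    words = sentence.split()
    hits = []
    for start in range(len(words)):
        node = trie
        for word in words[start:]:
            if word not in node:
                break
            node = node[word]
            if "<final>" in node:
                hits.append((start, node["<final>"]))
    return hits

patterns = ["turn on the light", "turn off", "what time is it"]
trie = build_trie(patterns)
hits = match(trie, "please turn on the light now")  # → [(1, 0)]
```

Shared prefixes ("turn on …" / "turn off") collapse into one branch, which is the same sharing a determinized pattern FST provides.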