2014-05-23

Resource Building

  • Release management has been started
  • Blaster 0.1 & vivian 0.0 system release

Leftover questions

  • Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting? (a sketch of an asymmetric window follows this list)
  • Multi-GPU training: error encountered
  • Multilingual training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
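
A minimal sketch of one way to build such an asymmetric analysis window (a long, slowly rising half joined to a short, quickly falling half), in Python/NumPy. The Hann shape and the 300/100 split are illustrative assumptions, not the configuration used in the experiment above.

  import numpy as np

  def asymmetric_window(n_left, n_right):
      """Asymmetric analysis window: a slowly rising left part joined to a
      quickly falling right part; n_left + n_right is the frame length."""
      # Left part: rising half of a Hann window of length 2*n_left
      left = 0.5 - 0.5 * np.cos(np.pi * np.arange(n_left) / n_left)
      # Right part: falling half of a Hann window of length 2*n_right
      right = 0.5 + 0.5 * np.cos(np.pi * np.arange(1, n_right + 1) / n_right)
      return np.concatenate([left, right])

  # Example: 25 ms frame at 16 kHz (400 samples), split 300/100
  win = asymmetric_window(300, 100)
  print(win.shape, round(win.max(), 3))   # (400,) with the peak near the junction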

AM development

Sparse DNN

  • GA-based block sparsity (++++)
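
For illustration, a minimal Python/NumPy sketch of block sparsity applied to a DNN weight matrix: a binary per-block mask (the kind of structure a GA chromosome could encode) zeros out whole sub-blocks of the weights. The 512x512 size, 16x16 blocks and random mask are placeholders, not the actual GA setup.

  import numpy as np

  def apply_block_mask(W, mask, block):
      """Zero out whole (block x block) sub-blocks of the weight matrix W
      according to a binary mask with one entry per block."""
      rows, cols = W.shape
      assert rows % block == 0 and cols % block == 0
      W = W.copy()
      for i in range(rows // block):
          for j in range(cols // block):
              if mask[i, j] == 0:
                  W[i*block:(i+1)*block, j*block:(j+1)*block] = 0.0
      return W

  rng = np.random.default_rng(0)
  W = rng.standard_normal((512, 512))
  # A GA individual could encode this mask as a bit string; here it is random.
  mask = (rng.random((512 // 16, 512 // 16)) > 0.5).astype(int)
  W_sparse = apply_block_mask(W, mask, block=16)
  print("fraction of zeroed weights:", np.mean(W_sparse == 0.0))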

Noise training

  • All experiments completed. Combining experiments.

GFbank

  • WSJ clean-condition experiments done; obtained the same performance as the time-domain implementation
  • Should experiment with the Tencent training set.
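
For context, a minimal Python/NumPy sketch of a frequency-domain gammatone filterbank (GFbank) matrix that can be applied to per-frame FFT power spectra, analogous to a Mel filterbank. The ERB spacing and the 4th-order magnitude approximation are standard formulas; the filter count, FFT size and lower cutoff below are illustrative, not the settings of the WSJ experiment.

  import numpy as np

  def erb(f):
      """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
      return 24.7 * (4.37 * f / 1000.0 + 1.0)

  def gammatone_fbank_matrix(n_filters, n_fft, sr, fmin=100.0):
      """Frequency-domain gammatone filterbank: each row is the magnitude
      response of a 4th-order gammatone filter sampled on the FFT bins."""
      erb_rate = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
      inv_erb_rate = lambda e: (10 ** (e / 21.4) - 1.0) / 4.37e-3
      # Centre frequencies equally spaced on the ERB-rate scale
      cfs = inv_erb_rate(np.linspace(erb_rate(fmin), erb_rate(sr / 2.0), n_filters))
      freqs = np.linspace(0, sr / 2.0, n_fft // 2 + 1)
      fb = np.zeros((n_filters, len(freqs)))
      for i, fc in enumerate(cfs):
          b = 1.019 * erb(fc)                                  # filter bandwidth
          fb[i] = (1.0 + ((freqs - fc) / b) ** 2) ** (-2.0)    # |H(f)| for order 4
      return fb

  # Usage: multiply each frame's FFT power spectrum by fb.T, then take the log
  fb = gammatone_fbank_matrix(n_filters=40, n_fft=512, sr=16000)
  print(fb.shape)   # (40, 257)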

Multilingual ASR

  • Multilingual LM decoding
  • Fixing the non-tag bug

English model

RESULTS:
(GMM size: 10000 states, 100000 Gaussians)

1. Shujutang 100h chi-eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  23.86  |  20.95  |  20.90  |  20.84  |  20.81  |


2. Shujutang 100h chi-eng 8k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  26.27  |  23.63  |  23.14  |  22.93  |  23.00  |


3. VoxForge pure eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  21.38  |  24.89  |  24.50  |  23.31  |  23.13  |


4. Fisher pure eng 8k:
Not finished yet.
  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  40.65  |    -    |    -    |    -    |    -    |

  • Need to experiment with the Gigabytes LM
  • Need to check the AM settings and the LM used in the Kaldi egs/fisher recipe

Denoising & Far-field ASR

  • Baseline: close-talk model decoding far-field speech: 92.65
  • Will investigate a DAE (denoising autoencoder) model, as sketched below
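
A minimal sketch of the DAE idea: a network trained to map far-field (noisy) feature frames to the corresponding close-talk (clean) frames with an MSE loss. Everything below (the random toy data, layer sizes and learning rate) is a placeholder; it illustrates the training objective only, not the actual model.

  import numpy as np

  rng = np.random.default_rng(0)

  # Toy stand-ins for parallel (close-talk, far-field) 40-dim fbank frames;
  # in the real setup these come from parallel recordings.
  clean = rng.standard_normal((1000, 40))
  noisy = clean + 0.5 * rng.standard_normal((1000, 40))

  # One-hidden-layer DAE: far-field frame in, close-talk frame out, MSE loss.
  W1 = 0.1 * rng.standard_normal((40, 256)); b1 = np.zeros(256)
  W2 = 0.1 * rng.standard_normal((256, 40)); b2 = np.zeros(40)
  lr = 0.1

  for epoch in range(10):
      h = np.maximum(noisy @ W1 + b1, 0.0)        # ReLU hidden layer
      out = h @ W2 + b2                           # linear clean-feature estimate
      err = out - clean
      loss = np.mean(err ** 2)
      g_out = 2.0 * err / err.size                # gradient of the MSE loss
      g_W2 = h.T @ g_out; g_b2 = g_out.sum(0)
      g_h = (g_out @ W2.T) * (h > 0)
      g_W1 = noisy.T @ g_h; g_b1 = g_h.sum(0)
      W1 -= lr * g_W1; b1 -= lr * g_b1
      W2 -= lr * g_W2; b2 -= lr * g_b2
      print(epoch, round(loss, 4))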

Kaiser Window

  • Tested different numbers of Fbank channels: no significant difference between 23/30/40 (for both 8k and 16k) [#223]
  • Tested the Kaiser window against the Povey window: no significant difference for either 8k or 16k [#224, #225] (window shapes sketched below)
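
For reference, the two window shapes being compared, in Python/NumPy. The Povey window formula below is the one used in Kaldi feature extraction (a Hann window raised to the power 0.85); the Kaiser beta value is only an illustrative choice, not the setting used in [#224, #225].

  import numpy as np

  N = 400                        # 25 ms frame at 16 kHz
  n = np.arange(N)

  # Povey window (as used in Kaldi): Hann window raised to the power 0.85
  povey = (0.5 - 0.5 * np.cos(2 * np.pi * n / (N - 1))) ** 0.85

  # Kaiser window; beta trades main-lobe width against side-lobe level
  # (beta = 8 is an illustrative value only)
  kaiser = np.kaiser(N, 8.0)

  print(povey[:3], kaiser[:3])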

VAD

  • DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74); a toy energy-VAD sketch follows this list
  • Need to test small-scale networks:
  • a 600-800 unit network
  • a 100 x 4 + 2 topology (four 100-unit hidden layers plus a 2-unit output)
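
A toy Python/NumPy sketch of the energy-based VAD baseline being compared against; the frame sizes and the fixed threshold are illustrative. A DNN-based VAD replaces the threshold with a per-frame speech/non-speech classifier (e.g. the 100 x 4 + 2 topology above) fed with acoustic features.

  import numpy as np

  def energy_vad(samples, sr=8000, frame_ms=25, shift_ms=10, threshold_db=-40.0):
      """Toy energy-based VAD: a frame is speech if its log energy is within
      threshold_db of the loudest frame in the utterance."""
      flen = int(sr * frame_ms / 1000)
      fshift = int(sr * shift_ms / 1000)
      n_frames = 1 + max(0, (len(samples) - flen) // fshift)
      energy_db = np.empty(n_frames)
      for t in range(n_frames):
          frame = samples[t * fshift : t * fshift + flen]
          energy_db[t] = 10.0 * np.log10(np.sum(frame ** 2) + 1e-10)
      return energy_db > energy_db.max() + threshold_db   # boolean speech mask

  sig = np.random.randn(16000)                  # 2 s of toy 8 kHz "audio"
  print(energy_vad(sig).mean())                 # fraction of frames marked speech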

Scoring

  • Fixing a bug in the stream mode

LM development

Domain specific LM

  • English lexicon done; building HCLG
  • Re-building the LM with the new lexicon
  • Tested on the Dianxin dev set


NN LM

  • Character-based NNLM (6,700 characters, 7-gram): training on 500M data done (see the sketch after this list)
  • Inconsistent WER patterns were found on the Tencent test sets
  • Probably need another test set for further investigation
  • Investigate MS RNN LM training
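
A minimal sketch of the forward pass of a feed-forward 7-gram character NNLM over a 6,700-character vocabulary, matching the configuration above. The embedding and hidden-layer sizes are assumptions, and the weights are random placeholders; only the structure (6 history characters in, a distribution over 6,700 characters out) reflects the setup.

  import numpy as np

  rng = np.random.default_rng(0)

  V, ORDER, EMB, HID = 6700, 7, 100, 500    # EMB / HID sizes are assumptions
  CONTEXT = ORDER - 1                       # a 7-gram conditions on 6 characters

  E  = 0.01 * rng.standard_normal((V, EMB))             # character embeddings
  W1 = 0.01 * rng.standard_normal((CONTEXT * EMB, HID))
  W2 = 0.01 * rng.standard_normal((HID, V))

  def nnlm_logprobs(history_ids):
      """log P(next char | previous 6 chars) for a feed-forward 7-gram NNLM."""
      x = E[history_ids].reshape(-1)        # concatenate the 6 history embeddings
      h = np.tanh(x @ W1)
      logits = h @ W2
      logits -= logits.max()                # numerical stability
      return logits - np.log(np.exp(logits).sum())   # log-softmax over V chars

  history = rng.integers(0, V, size=CONTEXT)
  logp = nnlm_logprobs(history)
  print(logp.shape, float(np.exp(logp).sum()))   # (6700,) and ~1.0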

QA

FST-based matching

  • Word-based FST matching takes 1-2 seconds with 1,600 patterns; Huilan's implementation takes <1 second (a matching sketch follows this list)
  • Thrax toolkit for compiling grammars to FSTs
  • Investigate determinization of G embedding
  • Refer to Kaldi new code
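
A minimal Python sketch of the word-based pattern-matching idea: the patterns are compiled into a shared-prefix trie (the same structure a determinized pattern FST gives), and an utterance is matched by walking the trie from every start position. This only illustrates the idea; it is neither the Huilan implementation nor a Thrax-compiled FST, and the example patterns are made up.

  def compile_patterns(patterns):
      """Compile word-sequence patterns into a shared-prefix trie
      (a deterministic acceptor over words)."""
      root = {}
      for pid, words in enumerate(patterns):
          node = root
          for w in words:
              node = node.setdefault(w, {})
          node["<final>"] = pid             # mark an accepting state
      return root

  def match(trie, utterance):
      """Return (start, end, pattern_id) for every pattern span in utterance."""
      hits = []
      for start in range(len(utterance)):
          node = trie
          for end in range(start, len(utterance)):
              node = node.get(utterance[end])
              if node is None:
                  break
              if "<final>" in node:
                  hits.append((start, end + 1, node["<final>"]))
      return hits

  patterns = [["turn", "on", "the", "light"], ["what", "time", "is", "it"]]
  trie = compile_patterns(patterns)
  print(match(trie, "please turn on the light now".split()))   # [(1, 5, 0)]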