2014-04-04
Resource Building
- The current text resources have been re-arranged and listed
Leftover questions
- Asymmetric window: large improvement on the training set (WER 34% to 24%), but the gain is lost on the test set. Overfitting?
- Multi-GPU training: error encountered
- Multilingual training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
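The asymmetric-window question above concerns splicing more past than future frames into the DNN input. A minimal sketch of such splicing follows; the context sizes (2 left, 1 right) are illustrative, not the values used in the experiment.

```python
# Sketch of asymmetric context-window splicing for DNN input features.
# Edge frames are padded by repeating the boundary frame.
def splice(frames, left=2, right=1):
    """Concatenate each frame with `left` past and `right` future frames."""
    n = len(frames)
    out = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            window.extend(frames[min(max(k, 0), n - 1)])  # clamp at edges
        out.append(window)
    return out

feats = [[float(t)] for t in range(4)]   # 4 frames, 1-dim each
spliced = splice(feats)
print(len(spliced), len(spliced[0]))     # 4 frames, now 4-dim each
```

An asymmetric window reduces look-ahead latency, which may explain why it behaves differently on training and test data.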
AM development
Sparse DNN
- GA-based block sparsity
- Found a paper from 2000 with similar ideas
- Trying to get a student working on high-performance computing to do the optimization
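The GA-based block-sparsity idea can be sketched as evolving a binary keep/prune mask over weight blocks. The fitness function, block norms, and sparsity target below are illustrative stand-ins, not the lab's actual objective.

```python
import random

# Toy GA for block sparsity: evolve a binary mask over weight blocks,
# favouring masks that keep high-magnitude blocks near a sparsity target.
random.seed(0)

block_norms = [5.0, 0.1, 4.0, 0.2, 3.0, 0.1]   # L2 norms of 6 weight blocks
TARGET_KEPT = 3                                 # keep half the blocks

def fitness(mask):
    kept = sum(n for n, m in zip(block_norms, mask) if m)
    penalty = abs(sum(mask) - TARGET_KEPT)      # deviation from target
    return kept - 2.0 * penalty

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.1):
    return [1 - m if random.random() < rate else m for m in mask]

pop = [[random.randint(0, 1) for _ in block_norms] for _ in range(20)]
for gen in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                          # elitist selection
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(10)]

best = max(pop, key=fitness)
print(best)   # best mask found by the GA
```

Elitism keeps the best masks across generations, so fitness never degrades; the GA only needs to explore mask combinations, not gradients.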
Noise training
- More experiments with clean (no-noise) data
- More experiments with additional noise types
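Noise training of the kind above typically mixes a noise signal into clean speech at a chosen SNR. A minimal sketch, with pure-Python lists standing in for waveforms and illustrative signals:

```python
import math, random

# Mix a noise signal into clean speech at a target SNR (in dB).
def mix_at_snr(speech, noise, snr_db):
    ps = sum(x * x for x in speech) / len(speech)       # speech power
    pn = sum(x * x for x in noise) / len(noise)         # noise power
    scale = math.sqrt(ps / (pn * 10 ** (snr_db / 10)))  # scale noise to SNR
    return [s + scale * n for s, n in zip(speech, noise)]

random.seed(1)
speech = [math.sin(0.1 * t) for t in range(1000)]       # toy clean signal
noise = [random.gauss(0, 0.3) for _ in range(1000)]     # toy noise
noisy = mix_at_snr(speech, noise, snr_db=10)
```

Adding several noise types at several SNRs to the training set is the usual way to make the acoustic model robust to unseen conditions.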
AMR compression re-training
- 1700h MPE adaptation (47753-word test set):
- iter1: amr %WER 13.40 [6398/47753, 252 ins, 829 del, 5317 sub]; wav %WER 11.19 [5343/47753, 178 ins, 710 del, 4455 sub]
- iter2: amr %WER 13.31 [6358/47753, 255 ins, 798 del, 5305 sub]; wav %WER 11.33 [5409/47753, 180 ins, 732 del, 4497 sub]
- iter3: amr %WER 13.25 [6326/47753, 230 ins, 823 del, 5273 sub]; wav %WER 11.43 [5460/47753, 199 ins, 709 del, 4552 sub]
- iter4: amr %WER 13.17 [6289/47753, 225 ins, 833 del, 5231 sub]; wav %WER 11.44 [5461/47753, 200 ins, 693 del, 4568 sub]
- iter5: amr %WER 13.17 [6291/47753, 254 ins, 769 del, 5268 sub]; wav %WER 11.46 [5471/47753, 200 ins, 696 del, 4575 sub]
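The %WER figures above come from an edit-distance alignment between reference and hypothesis word sequences, with errors split into insertions, deletions, and substitutions. A minimal sketch of that computation:

```python
# Compute WER error counts (ins/del/sub) by dynamic programming.
def wer_counts(ref, hyp):
    n, m = len(ref), len(hyp)
    # cost[i][j] = (errors, ins, del, sub) aligning ref[:i] with hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        e, a, d, s = cost[i - 1][0]
        cost[i][0] = (e + 1, a, d + 1, s)              # all deletions
    for j in range(1, m + 1):
        e, a, d, s = cost[0][j - 1]
        cost[0][j] = (e + 1, a + 1, d, s)              # all insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]        # match, no cost
            else:
                e, a, d, s = cost[i - 1][j - 1]
                sub = (e + 1, a, d, s + 1)             # substitution
                e, a, d, s = cost[i][j - 1]
                ins = (e + 1, a + 1, d, s)             # insertion
                e, a, d, s = cost[i - 1][j]
                dele = (e + 1, a, d + 1, s)            # deletion
                cost[i][j] = min(sub, ins, dele)
    return cost[n][m]

errs, ins, dels, subs = wer_counts("a b c d".split(), "a x c".split())
print(errs, ins, dels, subs)   # 2 errors: 1 substitution, 1 deletion
```

%WER is then `100 * errors / reference_length`, matching the bracketed counts in the table above.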
GFbank
- Found errors attributable to the speaker2utterance mapping
Denoising & Farfield ASR
- First round of recording failed
- Will record far-field waveforms next week
VAD
- Source code prepared
- Prepare DNN pipeline
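Before the DNN pipeline, a simple energy-based VAD is the usual baseline. A minimal sketch; the frame length and threshold are illustrative:

```python
import math

# Energy-based VAD: one speech/non-speech decision per fixed-length frame.
def vad(samples, frame_len=160, threshold_db=-30.0):
    decisions = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        db = 10 * math.log10(energy + 1e-12)   # floor avoids log(0)
        decisions.append(db > threshold_db)
    return decisions

silence = [0.0] * 320
speech = [math.sin(0.3 * t) for t in range(320)]
print(vad(silence + speech))   # [False, False, True, True]
```

A DNN-based VAD replaces the energy threshold with a learned frame classifier, but the framing and decision smoothing stay the same.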
Word to Vector
- LDA baseline (sogou 1700*9 training set)
- Training done
- Training classifier
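Assuming "LDA" above means Latent Dirichlet Allocation over the sogou text, the baseline extracts per-document topic proportions as classifier features. A toy collapsed-Gibbs sketch with an illustrative corpus and hyperparameters:

```python
import random

# Toy collapsed-Gibbs LDA: infer per-document topic proportions (theta),
# usable as features for a downstream classifier.
random.seed(0)

docs = [["stock", "market", "price"], ["game", "team", "score"],
        ["stock", "price", "trade"], ["team", "game", "win"]]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}
K, V = 2, len(vocab)
alpha, beta = 0.5, 0.1

# z[d][n]: topic of word n in doc d; count tables for the sampler
z = [[random.randrange(K) for _ in d] for d in docs]
ndk = [[0] * K for _ in docs]          # doc-topic counts
nkw = [[0] * V for _ in range(K)]      # topic-word counts
nk = [0] * K                           # topic totals
for d, doc in enumerate(docs):
    for n, w in enumerate(doc):
        k = z[d][n]
        ndk[d][k] += 1; nkw[k][w2i[w]] += 1; nk[k] += 1

for it in range(200):                  # Gibbs sweeps
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d][k] -= 1; nkw[k][w2i[w]] -= 1; nk[k] -= 1
            weights = [(ndk[d][j] + alpha) * (nkw[j][w2i[w]] + beta) /
                       (nk[j] + V * beta) for j in range(K)]
            r = random.random() * sum(weights)
            for j in range(K):
                r -= weights[j]
                if r <= 0:
                    break
            z[d][n] = j
            ndk[d][j] += 1; nkw[j][w2i[w]] += 1; nk[j] += 1

theta = [[(ndk[d][j] + alpha) / (len(docs[d]) + K * alpha) for j in range(K)]
         for d in range(len(docs))]    # per-doc topic features
print(theta)
```

Each row of `theta` is a low-dimensional topic representation of a document; the "training classifier" step then fits a standard classifier on these features.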
LM development
NN LM
- Character-based NNLM (6700 chars, 7-gram): training on 500M of data done
- Boundary-involved char NNLM training done
- Word boundaries appear less important than character history
- Investigating MS RNN LM training
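The "boundary-involved" idea above can be illustrated with a count-based character n-gram model where a '#' token marks word boundaries in the character stream (the actual system is neural; counts are just for illustration):

```python
from collections import defaultdict

# Count char n-grams over a stream with explicit word-boundary tokens.
def char_ngrams(words, order=3, boundary="#"):
    chars = [boundary]
    for w in words:
        chars.extend(list(w) + [boundary])     # '#' between every word
    counts, ctx_counts = defaultdict(int), defaultdict(int)
    for i in range(order - 1, len(chars)):
        ctx = tuple(chars[i - order + 1:i])
        counts[ctx + (chars[i],)] += 1
        ctx_counts[ctx] += 1
    return counts, ctx_counts

counts, ctx = char_ngrams(["the", "cat", "the", "dog"], order=3)
# ML estimate of P(next char | previous 2 chars)
p = counts[("t", "h", "e")] / ctx[("t", "h")]
print(p)
```

Dropping the boundary token gives the plain char model; comparing perplexity of the two variants is one way to quantify how much word boundaries matter relative to character history.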
Pronunciation scoring
- 8k model delivered
- MLP-based scoring completed
QA
FST-based matching
- Char-based FST under investigation
- FST-based QA patent done
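Char-level FST matching for QA can be sketched as a trie acceptor over question patterns with outputs on final states. Patterns, outputs, and the input text below are illustrative:

```python
# Trie acceptor over char-level QA patterns; '$' marks a final state
# carrying the pattern's output label.
def build_trie(patterns):
    root = {}
    for pat, answer in patterns:
        node = root
        for ch in pat:
            node = node.setdefault(ch, {})
        node["$"] = answer
    return root

def match(trie, text):
    """Return (output, matched_length) for the longest pattern prefix."""
    node, best = trie, (None, 0)
    for i, ch in enumerate(text):
        if ch not in node:
            break
        node = node[ch]
        if "$" in node:
            best = (node["$"], i + 1)
    return best

trie = build_trie([("who is", "PERSON_QA"), ("where is", "PLACE_QA")])
print(match(trie, "who is the president"))   # ('PERSON_QA', 6)
```

A real FST version adds weights and epsilon transitions so partially matching patterns can still score, which a plain trie cannot do.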
Speech QA
- Class LM QA
- Done; works well
- Investigated various stepping-in weights; a negative weight (-1) is effective for encouraging entity recognition
- Investigated a performance reduction caused by a preference for short words; introduced a factor on L.fst to discourage them
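The stepping-in weight idea can be sketched as follows: entity words are scored through a class token in the top-level LM, and entering the entity sub-model adds a (possibly negative) cost. All probabilities and the -1 weight below are illustrative of the mechanism, not the system's actual values:

```python
import math

# Class-LM scoring with a "stepping-in" weight on entity-class entry.
ENTITIES = {"beijing", "shanghai"}
CLASS_TOKEN = "<CITY>"

lm = {("go", "to"): 0.2, ("to", CLASS_TOKEN): 0.1}   # toy bigram probs
entity_lm = {"beijing": 0.6, "shanghai": 0.4}        # in-class distribution

def score(words, step_in_weight=-1.0):
    """Negative log-prob (cost) of a word sequence under the class LM."""
    cost = 0.0
    prev = words[0]
    for w in words[1:]:
        if w in ENTITIES:
            cost += -math.log(lm[(prev, CLASS_TOKEN)])   # class transition
            cost += step_in_weight                       # stepping-in weight
            cost += -math.log(entity_lm[w])              # in-class prob
            prev = CLASS_TOKEN
        else:
            cost += -math.log(lm[(prev, w)])
            prev = w
    return cost

print(score(["go", "to", "beijing"]))
```

A negative stepping-in weight lowers the cost of paths through the entity class, which is why it encourages entity recognition in decoding.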