2014-03-14


Resource Building

  • Current text resources have been re-arranged and listed

AM development

Sparse DNN

  • Optimal Brain Damage (OBD)
  1. GA-based block sparsity
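  • A minimal numpy sketch of the OBD criterion, assuming we have a layer's weight matrix and a diagonal Hessian estimate; the function name and pruning fraction are illustrative, not our training code:

    import numpy as np

    def obd_prune(W, H_diag, prune_frac=0.5):
        """Optimal Brain Damage: zero the weights with the smallest saliency.

        W      -- weight matrix of one DNN layer
        H_diag -- diagonal Hessian estimate, same shape as W
        OBD saliency: s_ij = 0.5 * H_ij * W_ij**2
        """
        saliency = 0.5 * H_diag * W ** 2
        threshold = np.quantile(saliency, prune_frac)  # cut the lowest fraction
        mask = saliency > threshold
        return W * mask, mask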

Efficient DNN training

  1. Asymmetric window: great improvement on the training set (WER 34% -> 24%), but the improvement is lost on the test set. Overfitting?
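  • For reference, a numpy sketch of asymmetric frame splicing (more left context than right); the 10/5 context sizes are illustrative:

    import numpy as np

    def splice(feats, left=10, right=5):
        """Splice each frame with an asymmetric context window.

        feats -- (num_frames, feat_dim) array; edges are padded by repeating
        the border frames. Using more past than future frames keeps the
        look-ahead short.
        """
        padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
        return np.hstack([padded[i:i + len(feats)]
                          for i in range(left + right + 1)])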

Multi GPU training

  • Error encountered

GMM - DNN co-training

  • Initial DNN test done
  • tri4b -> DNN (org)
  • DNN alignment -> tri4b
  • tri4b alignment -> DNN (re-train); see the loop sketch after the table
  model/testcase              |  test_dev93(cv)       |     test_eval92
    --------------------------------------------------------------
    8400-80000(org)           |    7.41               |      4.13
    --------------------------------------------------------------
    re-train (Keep state #)   |    7.20               |      4.24
    --------------------------------------------------------------
    re-train (Free state #)   |    7.29               |      4.31
    --------------------------------------------------------------
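  • The loop behind the table above, as a control-flow sketch; the callables stand in for the actual recipes (alignment, DNN training, GMM re-estimation) and are assumptions, not real script names:

    def co_train(data, gmm, align, train_dnn, retrain_gmm, rounds=1):
        """GMM-DNN co-training: alternate alignment and model updates."""
        ali = align(gmm, data)              # tri4b alignment
        for _ in range(rounds):
            dnn = train_dnn(data, ali)      # tri4b alignment -> DNN (org)
            ali = align(dnn, data)          # DNN alignment
            gmm = retrain_gmm(gmm, ali)     # DNN alignment -> tri4b
            ali = align(gmm, data)          # new tri4b alignment -> DNN (re-train)
        return gmm, dnn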

Multilanguage training

  1. Pure Chinese training reached 4.9%
  2. Chinese + English joint training was reduced to 7.9%
  3. The English phone set should distinguish word-initial and word-final phones
  4. Should set up a multilingual network structure that shares the lower layers but separates the languages at the higher layers
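  • A numpy sketch of that structure (shared hidden layers, one senone softmax per language); layer sizes and the ReLU choice are illustrative assumptions:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    class MultilingualDNN:
        """Lower layers shared across languages, output layers per language."""
        def __init__(self, in_dim, hid_dim, out_dims, n_shared=4, rng=np.random):
            self.shared = [rng.randn(in_dim if i == 0 else hid_dim, hid_dim) * 0.01
                           for i in range(n_shared)]
            self.heads = {lang: rng.randn(hid_dim, d) * 0.01
                          for lang, d in out_dims.items()}

        def forward(self, x, lang):
            for W in self.shared:                 # shared feature layers
                x = np.maximum(0.0, x @ W)
            return softmax(x @ self.heads[lang])  # language-specific senones

    # e.g. MultilingualDNN(440, 1024, out_dims={"zh": 8000, "en": 6000})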

Noise training

  • Train on the WSJ database by corrupting the data with various noise types (a corruption sketch follows this list)
  • Almost all training conditions are completed
  • Interesting results for multi-condition training (white + cafe) tested on park/station noise
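  • The corruption step itself, sketched with numpy; the SNR handling is the standard power-ratio rescaling and the interface is illustrative:

    import numpy as np

    def add_noise(speech, noise, snr_db):
        """Mix noise into clean speech at a target SNR (dB).

        Both are 1-D float arrays at the same sample rate; the noise is
        tiled/truncated to the speech length, then rescaled.
        """
        noise = np.resize(noise, speech.shape)
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
        return speech + scale * noise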

AMR compression re-training

  • WeChat uses AMR compression, which requires adapting our AM to AMR-coded audio
  • Test AMR & non-AMR models (a transcoding sketch follows below)
model                   wav     amr

xent baseline           4.47    -
wav_mpe baseline        4.20    36.77

amr_mpe_lr_1e-5         6.27    8.95
amr_mpe_lr_1e-4         7.58    8.68

amr_xEnt_lr_1e-5        6.89    7.99
amr_xEnt_lr_1e-4        6.61    7.28
amr_xEnt_lr_0.08        5.72    6.20


  • Prepare to do adaptation on 1700h of data
  • Prepare to run the mixed xEnt test
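  • A sketch of producing AMR-corrupted training audio by a round trip through the codec, assuming an ffmpeg build with libopencore_amrnb; the paths and helper name are illustrative, not our actual data pipeline:

    import pathlib
    import subprocess

    def amr_roundtrip(wav_in, wav_out, workdir="."):
        """Encode a wav to AMR-NB (8 kHz mono) and decode it back to wav."""
        amr = pathlib.Path(workdir) / (pathlib.Path(wav_in).stem + ".amr")
        subprocess.run(["ffmpeg", "-y", "-i", wav_in, "-ar", "8000", "-ac", "1",
                        "-c:a", "libopencore_amrnb", str(amr)], check=True)
        subprocess.run(["ffmpeg", "-y", "-i", str(amr), wav_out], check=True)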

GFbank

  • Finished the first round of gfbank training & testing
  • The same GMM model (MFCC features) was used to obtain the alignment
  • Training fbank & gfbank models based on the MFCC alignment
  • Clean training and noisy test
feature         clean   5dB     10dB    15dB    20dB    25dB
gfbank          4.22    73.03   39.20   16.41   8.36    5.60
gfbank_80       4.36    74.41   42.94   18.13   8.59    5.85
fbank_zmy       3.97    74.78   44.57   18.80   8.54    5.30
  • gfbank + fbank 80-dim training/test
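  • For reference, the gammatone impulse response behind the gfbank features, sketched with numpy; the ERB constants follow the usual Glasberg-Moore formula, and the exact filterbank parameters may differ from the ones used here:

    import numpy as np

    def erb(f):
        """Equivalent rectangular bandwidth (Hz), Glasberg & Moore."""
        return 24.7 * (4.37 * f / 1000.0 + 1.0)

    def gammatone_ir(fc, fs, duration=0.05, order=4):
        """Impulse response of an order-4 gammatone filter centred at fc (Hz)."""
        t = np.arange(0.0, duration, 1.0 / fs)
        b = 1.019 * erb(fc)                       # bandwidth parameter
        return (t ** (order - 1) * np.exp(-2 * np.pi * b * t)
                * np.cos(2 * np.pi * fc * t))

    # gfbank-style features filter the waveform with a bank of such filters at
    # ERB-spaced centre frequencies, then take per-frame log energies.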


Engine optimization

  • Investigating LOUDS FST.
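  • As a reminder of what LOUDS stores, a small sketch that builds the LOUDS bit string of a tree by breadth-first traversal; an FST would additionally keep arc labels per '1' bit, which is omitted here:

    from collections import deque

    def louds_bits(children, root=0):
        """LOUDS: visit nodes in BFS order; for each node emit one '1' per
        child followed by a '0'. A super-root '10' prefix is prepended, as in
        the usual construction. `children` maps node id -> list of child ids."""
        bits, queue = ["1", "0"], deque([root])
        while queue:
            node = queue.popleft()
            kids = children.get(node, [])
            bits.extend("1" * len(kids) + "0")
            queue.extend(kids)
        return "".join(bits)

    # louds_bits({0: [1, 2], 1: [3]}) -> "101101000"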


Word to Vector

  • Improved word vectors with multiple senses
  • Almost impossible with the current toolkit
  • Could pre-train vectors and then do clustering (see the sketch after this list)
  • Word-vector-based keyword extraction
  • Prepared 7 categories with 500+ articles in total
  • Fixed a problem in retrieving article words
  • Word-vector-based classification
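  • The "pre-train, then cluster" idea sketched with plain numpy k-means: cluster the context vectors of one word's occurrences and treat each cluster centre as a sense vector; how the context vectors are obtained is assumed to be handled elsewhere:

    import numpy as np

    def kmeans(X, k, iters=20, seed=0):
        """Plain k-means; X is (n_occurrences, dim) context vectors of one word."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(iters):
            # assign every occurrence to its nearest sense centre
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(axis=1)
            for j in range(k):
                if np.any(assign == j):
                    centers[j] = X[assign == j].mean(axis=0)
        return centers, assign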


LM development

NN LM

  • Character-based NNLM (6,700 characters, 7-gram): training on 500M data done (a scoring sketch follows this list)
  • Boundary-involved character NNLM training done
  • Test on rescoring
  • Investigate MS RNN LM training
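  • A minimal numpy sketch of the scoring step of such a character 7-gram NNLM (embed the 6 history characters, one hidden layer, softmax over ~6,700 characters); dimensions and the tanh nonlinearity are illustrative:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    class CharNgramNNLM:
        """Feed-forward n-gram NNLM: predict the next char from order-1 chars."""
        def __init__(self, vocab=6700, order=7, emb=100, hid=500, rng=np.random):
            self.C = rng.randn(vocab, emb) * 0.01            # char embeddings
            self.W1 = rng.randn((order - 1) * emb, hid) * 0.01
            self.W2 = rng.randn(hid, vocab) * 0.01

        def prob(self, history):
            """history: ids of the previous order-1 characters."""
            x = self.C[history].reshape(-1)                  # concat embeddings
            h = np.tanh(x @ self.W1)
            return softmax(h @ self.W2)                      # next-char distribution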

3T Sogou LM

  • 3T + Tencent LM combination:
  • Combine the 3T vocabulary (110k) and the Tencent 80k vocabulary
  • Re-segmentation
  • Compute PPL with the 3T and Tencent LMs
  • Compute the best mixing weights (see the EM sketch after this list)
  • The computed mixing weight looks wrong ....
  • If we mix the two with equal weights (0.5/0.5), performance is better than either individual LM
  • 3T + QA model combination
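  • The standard way to get the best mixing weight is EM over per-word probabilities on held-out text; a sketch, assuming the two probability arrays come from querying the 3T and Tencent LMs:

    import numpy as np

    def em_mix_weight(p1, p2, iters=50):
        """Estimate lambda minimising held-out PPL of lambda*p1 + (1-lambda)*p2.

        p1, p2 -- per-word probabilities of the same held-out text under the two LMs.
        """
        lam = 0.5
        for _ in range(iters):
            post = lam * p1 / (lam * p1 + (1 - lam) * p2)  # E-step: weight of LM1
            lam = post.mean()                              # M-step
        return lam

    # PPL of the mixture: np.exp(-np.mean(np.log(lam * p1 + (1 - lam) * p2)))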

QA Matching

  • FST-based matching
  • Investigating why the OpenFst union does not lead to a determinizable graph
  • Test the pattern label
  • TF/IDF weighting
  • The code is done; TF/IDF weights can be used right now (sketch below)
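  • A small sketch of the TF/IDF weighting, in plain Python; tokenisation and the document collection are assumed to be prepared elsewhere:

    import math
    from collections import Counter

    def tfidf(doc_tokens, all_docs):
        """TF/IDF weights of one tokenised document against a collection.

        doc_tokens -- list of tokens to weight
        all_docs   -- list of token lists (the whole collection, for IDF)
        """
        tf = Counter(doc_tokens)
        df = Counter(tok for doc in all_docs for tok in set(doc))
        n_docs = len(all_docs)
        return {tok: (cnt / len(doc_tokens)) * math.log(n_docs / (1 + df[tok]))
                for tok, cnt in tf.items()}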

Embedded development

  • The CLG embedded decoder is almost done. The online compiler is in progress.
  • English scoring is underway


Speech QA

  • N-best with the entity LM was analyzed
  • The WER vs. QA accuracy analysis is done
  • The figure shows that WER and QA accuracy are positively related
  • Adding song names and singer names improves performance in most cases
  • There are indeed some exceptions in the figure: (a) higher WER does not necessarily reduce QA accuracy; (b) adding entity names does not always improve QA
  • The results are in 媒体文件:Music_QA_wer.pdf


  • Class LM QA
  • Use the QA LM as the baseline
  • Tag singer names and song names (a tagging sketch is given at the end of this section)
  • Build the tag LM
  • Use graph integration to resolve the tags
  • Adjusting the in-tag weight
  • Smaller weights produce more entity recognitions
  • Check whether the recognized songs/singers are correct or wrong
1. non-merge
    Baseline:
               qa-singer-song
    songs      41
    singers    23

2. HCLG-merge
    Weight means the multiplier of the sub-graph entry.
  (1) LM: 1e-5
    weight     1e-8   1e-4   1e-3   0.01   1    10
    songs       20     20     21     19    9     4
    singers     13     13     13     13    2     2
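  • The tagging step mentioned above, sketched in Python: entity names in the LM training text are replaced by class tags which the sub-graphs later expand; the tag names and entity lists are illustrative:

    def tag_text(tokens, singers, songs):
        """Replace singer/song tokens by class tags for tag-LM training.

        tokens  -- a segmented sentence (list of words)
        singers -- set of singer names; songs -- set of song names
        """
        out = []
        for tok in tokens:
            if tok in singers:
                out.append("<singer>")
            elif tok in songs:
                out.append("<song>")
            else:
                out.append(tok)
        return out

    # tag_text(["我", "想", "听", "周杰伦", "的", "青花瓷"], {"周杰伦"}, {"青花瓷"})
    #   -> ["我", "想", "听", "<singer>", "的", "<song>"]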