2014-02-21
Resource Building
- Current text resources have been re-arranged and listed
AM development
Sparse DNN
- Optimal Brain Damage (OBD); see the sketch after this list
- GA-based block sparsity
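A minimal sketch of the OBD pruning criterion, assuming a diagonal Hessian approximation: the saliency of weight w_k is 0.5 * h_kk * w_k^2, and the least salient weights are zeroed. All names and sizes are illustrative, not the actual training code:

```python
import numpy as np

def obd_prune(weights, hessian_diag, prune_fraction=0.5):
    """Optimal Brain Damage: rank weights by saliency s_k = 0.5 * h_kk * w_k^2
    and zero out the least salient fraction."""
    saliency = 0.5 * hessian_diag * weights ** 2
    threshold = np.quantile(saliency, prune_fraction)
    mask = saliency > threshold           # keep only weights above the cutoff
    return weights * mask, mask

# Toy usage: random weights with a stand-in diagonal Hessian.
w = np.random.randn(1000)
h = np.abs(np.random.randn(1000))         # placeholder for second derivatives
w_sparse, mask = obd_prune(w, h, prune_fraction=0.8)
print("nonzero weights:", mask.sum())
```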
Efficient DNN training
- Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set. Overfitting?
Multilingual training
- Pure Chinese training reached 4.9% error
- Mixed Chinese + English training degraded this to 7.9%
- The English phone set should distinguish word-initial and word-final phones
- Should set up a multilingual network structure that shares the low layers but separates languages at the high layers (sketch below)
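A minimal sketch of that structure: shared hidden layers plus one softmax head per language. Layer sizes, input dimension, and output dimensions are illustrative assumptions, and training is omitted:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SharedBottomDNN:
    """Multilingual DNN: shared hidden layers, one output head per language."""
    def __init__(self, in_dim, hid_dim, out_dims, n_shared=4, rng=np.random):
        dims = [in_dim] + [hid_dim] * n_shared
        self.shared = [(rng.randn(a, b) * 0.01, np.zeros(b))
                       for a, b in zip(dims[:-1], dims[1:])]
        # one output layer (head) per language, e.g. {"zh": 3000, "en": 2500}
        self.heads = {lang: (rng.randn(hid_dim, n_out) * 0.01, np.zeros(n_out))
                      for lang, n_out in out_dims.items()}

    def forward(self, x, lang):
        h = x
        for W, b in self.shared:          # language-independent low layers
            h = relu(h @ W + b)
        W, b = self.heads[lang]           # language-specific high layer
        return softmax(h @ W + b)

net = SharedBottomDNN(in_dim=440, hid_dim=1024,
                      out_dims={"zh": 3000, "en": 2500})
post = net.forward(np.random.randn(8, 440), lang="zh")
print(post.shape)                         # (8, 3000)
```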
Noise training
- Train on the WSJ database by corrupting the data with various noise types (see the mixing sketch after this list)
- Baseline system ready
- Noise data ready; selected 5 noise types that occur in real environments
- Liuchao's noise-adding toolkit ready
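A minimal sketch of noise corruption at a target SNR, the standard mixing recipe rather than Liuchao's actual toolkit; the signals below are random stand-ins for real audio:

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix noise into speech at a target SNR in dB."""
    if len(noise) < len(speech):          # loop the noise if it is too short
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # scale the noise so that 10*log10(p_speech / p_noise_scaled) == snr_db
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

clean = np.random.randn(16000)            # stand-in for one second of WSJ audio
babble = np.random.randn(48000)           # stand-in for a recorded noise type
noisy = add_noise(clean, babble, snr_db=10)
```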
Engine optimization
- Investigating LOUDS-based FST representation (sketch below).
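For reference, LOUDS (Level-Order Unary Degree Sequence) is a succinct tree encoding: visiting nodes in level order, each node emits one '1' bit per child followed by a '0'; navigation then uses rank/select over the bit vector (omitted here). A toy encoder sketch:

```python
from collections import deque

def louds_encode(children):
    """LOUDS encoding of a tree. `children` maps a node id to its ordered
    child list; node 0 is the root."""
    bits = ["10"]                         # conventional super-root
    queue = deque([0])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        bits.append("1" * len(kids) + "0")
        queue.extend(kids)
    return "".join(bits)

# Toy tree: 0 -> (1, 2), 1 -> (3,)
print(louds_encode({0: [1, 2], 1: [3]}))  # "101101000"
```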
Adaptation
- Tested adaptation performance with the number of adaptation utterances varied from 10 to 40.
Word to Vector
- Testing a training toolkit from Stanford University that can incorporate global information into word-vector training
- C++ implementation of the data pre-processing (instead of Python); problems encountered
- Basic word vectors plus global sense
- Training on 100M of data (with global sense) overflowed memory
- Split the data into smaller pieces
- Improved word vectors with multiple senses
- Preparing scripts
- Keyword extraction based on word vectors (see the clustering sketch after this list)
- Using Google word vectors
- Using k-means clustering
- Investigating the Senna toolkit from NEC; intending to implement POS tagging based on word vectors.
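A minimal sketch of the clustering step, assuming pre-trained Google word vectors are already loaded; it uses scikit-learn's KMeans, and the words, vectors, and cluster count are illustrative stand-ins:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical inputs: words from a document and their word2vec vectors.
words = ["apple", "banana", "piano", "violin", "pear", "guitar"]
vectors = np.random.randn(len(words), 300)   # stand-in for real 300-d vectors

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

# Per cluster, take the word closest to the centroid as a keyword candidate.
for c in range(km.n_clusters):
    idx = np.where(km.labels_ == c)[0]
    d = np.linalg.norm(vectors[idx] - km.cluster_centers_[c], axis=1)
    print("cluster", c, "keyword:", words[idx[np.argmin(d)]])
```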
LM development
NN LM
- Character-based NNLM (6,700 characters, 7-gram): training on 500M of data done.
- 3 hours per iteration
- For the word-based NNLM: 1 hour/iteration with a 1,024-word vocabulary, 4 hours/iteration with 10,240 words (see the sketch after this list)
- The character-based model performs worse than the word-based NNLM
- Word-vector-initialized word and character NNLM training done
- The NNLM initialized with Google word vectors is worse than a randomly initialized NNLM
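For reference, a feed-forward n-gram NNLM forward pass in the style of Bengio et al. (2003); the softmax output layer scales with vocabulary size, which is consistent with the 10,240-word model training roughly 4x slower. All sizes are illustrative, not the actual configuration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class NGramNNLM:
    """Feed-forward n-gram NNLM: embed the n-1 history words, concatenate,
    one tanh hidden layer, softmax over the vocabulary."""
    def __init__(self, vocab=10240, order=7, emb=100, hid=500, rng=np.random):
        ctx = order - 1
        self.C = rng.randn(vocab, emb) * 0.01      # embedding table
        self.W1 = rng.randn(ctx * emb, hid) * 0.01
        self.b1 = np.zeros(hid)
        self.W2 = rng.randn(hid, vocab) * 0.01     # output layer dominates the
        self.b2 = np.zeros(vocab)                  # cost for large vocabularies

    def predict(self, history):
        x = self.C[history].reshape(-1)            # concatenate n-1 embeddings
        h = np.tanh(x @ self.W1 + self.b1)
        return softmax(h @ self.W2 + self.b2)

lm = NGramNNLM()
p = lm.predict([12, 7, 99, 4, 256, 1023])          # six history words (7-gram)
print(p.shape, p.sum())                            # (10240,) ~1.0
```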
3T Sogou LM
- Naive training
- all words in the lexicon
- split into 9G text blocks
- merge the sub-models one by one (see the interpolation sketch after this list)
- cut down to a 110k lexicon
- Test on QA
- Performance degraded compared to Liurong's previous LM
- Improved training
- re-segmentation with the Tencent 110k lexicon
- re-train with 4G text blocks
- sub-model training done; ready to merge and evaluate on the Tencent online1 test set
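The one-by-one merging above is linear interpolation of sub-models. A toy sketch, representing each LM as a plain probability dict; a real merge (e.g. with SRILM's `ngram -mix-lm`) also handles back-off weights:

```python
def interpolate(lm_a, lm_b, lam=0.5):
    """Linear interpolation of two n-gram tables:
    p(w|h) = lam * p_a(w|h) + (1 - lam) * p_b(w|h).
    An LM here is just {(history, word): prob}."""
    merged = {}
    for key in set(lm_a) | set(lm_b):
        merged[key] = lam * lm_a.get(key, 0.0) + (1 - lam) * lm_b.get(key, 0.0)
    return merged

# Toy sub-models trained on two text blocks.
sub1 = {(("我",), "们"): 0.4, (("我",), "的"): 0.6}
sub2 = {(("我",), "们"): 0.2, (("我",), "是"): 0.8}
print(interpolate(sub1, sub2, lam=0.7))
```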
Embedded development
- The CLG embedded decoder is almost done; the online compiler is in progress.
- Zhiyong is working on layer-by-layer DNN training.
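A sketch of the layer-by-layer schedule only, assuming supervised growth of one hidden layer at a time; the actual backprop trainer is abstracted into `train_fn`, and all sizes are illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def train_layer_by_layer(x, y, hid=1024, n_layers=4, train_fn=None):
    """Greedy layer-wise schedule: start with one hidden layer, train the
    current stack, then insert a new randomly initialized hidden layer
    below the output layer and retrain.  `train_fn(layers_plus_out, x, y)`
    stands in for a real trainer (omitted here)."""
    layers = [np.random.randn(x.shape[1], hid) * 0.01]  # first hidden layer
    out = np.random.randn(hid, y.shape[1]) * 0.01       # output layer
    for depth in range(1, n_layers + 1):
        if train_fn is not None:
            train_fn(layers + [out], x, y)              # fine-tune current stack
        if depth < n_layers:
            layers.append(np.random.randn(hid, hid) * 0.01)  # grow one layer
    return layers, out

def forward(layers, out, x):
    h = x
    for W in layers:
        h = relu(h @ W)
    return h @ out

layers, out = train_layer_by_layer(np.random.randn(16, 440),
                                   np.random.randn(16, 3000))
print(forward(layers, out, np.random.randn(2, 440)).shape)  # (2, 3000)
```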
Speech QA
- Current N-best results
- N-best search plus pinyin correction (see the matching sketch after this list)
- 2,718 QA requests in total
- default: 1,844 QA correct
- no-entity: 1,650 QA correct
- with-entity: 1,884 QA correct
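A toy sketch of N-best search plus pinyin correction, with a tiny pinyin table standing in for a real lexicon; the idea is that a homophone misrecognition can still match the intended entity:

```python
# Toy pinyin table; a real system would use a full pronunciation lexicon.
PINYIN = {"周": "zhou", "杰": "jie", "伦": "lun", "轮": "lun", "洲": "zhou"}

def to_pinyin(text):
    return " ".join(PINYIN.get(ch, ch) for ch in text)

def correct_nbest(nbest, entities):
    """Return the first (hypothesis, entity) pair whose pinyin strings match,
    so homophone errors (e.g. 轮 vs 伦) still hit the entity."""
    entity_pinyin = {to_pinyin(e): e for e in entities}
    for hyp in nbest:                      # hypotheses in rank order
        p = to_pinyin(hyp)
        if p in entity_pinyin:
            return hyp, entity_pinyin[p]
    return nbest[0], None                  # fall back to the 1-best

nbest = ["周杰轮", "洲杰伦", "周杰伦"]
print(correct_nbest(nbest, entities=["周杰伦"]))  # matches despite 轮/伦
```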
- Analyzed error patterns for the N-best match:
- 10.8% song transcription errors
- 18.3% English recognition errors
- 38.7% missed entity (song name, singer name) recognition
- 32.3% non-entity recognition errors
- Computational complexity
- 11,000 entities have 23,000 distinct pronunciations
- Use a prefix tree (trie) to improve matching efficiency (sketch below)
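A sketch of the tree idea, assuming pronunciations are phone sequences; shared prefixes among the ~23,000 pronunciations are stored once, and lookup cost depends on pronunciation length rather than entity count. Names below are illustrative:

```python
class TrieNode:
    __slots__ = ("children", "entities")
    def __init__(self):
        self.children = {}
        self.entities = []                 # entities ending at this node

def build_trie(pron_table):
    """Index entity pronunciations in a prefix tree."""
    root = TrieNode()
    for entity, prons in pron_table.items():
        for pron in prons:
            node = root
            for phone in pron:
                node = node.children.setdefault(phone, TrieNode())
            node.entities.append(entity)
    return root

def lookup(root, pron):
    node = root
    for phone in pron:
        node = node.children.get(phone)
        if node is None:
            return []
    return node.entities

trie = build_trie({"周杰伦": [("zh", "ou", "j", "ie", "l", "un")]})
print(lookup(trie, ("zh", "ou", "j", "ie", "l", "un")))   # ['周杰伦']
```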
- Entity-class LM comparison (see the sketch after this list)
- re-segmentation & re-training
- SRILM class-based LM
- Subgraph integration from Zhiyong
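For reference, the class-based decomposition being compared is p(w|h) = p(class(w) | class history) * p(w | class(w)). A toy sketch with illustrative probabilities; the real model would be trained with SRILM as noted above:

```python
def class_lm_prob(word, history, word2class, class_ngram, emission):
    """Class-based LM: p(w|h) = p(class(w)|class(h)) * p(w|class(w)).
    `class_ngram` maps (class history, class) to a probability and
    `emission` maps (class, word) to the in-class membership probability."""
    c = word2class[word]
    ch = tuple(word2class[w] for w in history)
    return class_ngram.get((ch, c), 0.0) * emission.get((c, word), 0.0)

# Toy tables: singer/song entities collapsed into class tokens.
word2class = {"周杰伦": "SINGER", "菊花台": "SONG", "播放": "播放"}
class_ngram = {(("播放",), "SINGER"): 0.3}
emission = {("SINGER", "周杰伦"): 0.05}
print(class_lm_prob("周杰伦", ["播放"], word2class, class_ngram, emission))
```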