Sinovoice-2016-3-31

Data

16K LingYun

2000h of data ready
4300h of real-environment data still to be labeled

YueYu

Total: 250h (190h YueYu + 60h English)
CER: 75%

WeiYu

50h used for training
120h labeled and ready

Model training

Big-Model Training

16k, 10000h DNN training: 7*1024-10000

MPE training done.
Performance is about 1 point worse than 7*2048-1000.
Decoding speed is similar to the old nnet1 version.
7*2048-10000 net: weight-matrix factorization (SVD) to improve decoding speed.
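The SVD item above can be illustrated with a minimal NumPy sketch. It assumes a layer's weight matrix W (out_dim x in_dim) is replaced by two smaller matrices A (out_dim x k) and B (k x in_dim) taken from a truncated SVD. The helper names svd_factorize / num_params and the rank k = 256 are hypothetical choices for illustration, not the settings used in this experiment.

```python
from typing import Optional

import numpy as np


def svd_factorize(weight: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Approximate weight (out_dim x in_dim) by a @ b, a: out_dim x rank, b: rank x in_dim."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the top singular values into the left factor
    b = vt[:rank, :]
    return a, b


def num_params(out_dim: int, in_dim: int, rank: Optional[int] = None) -> int:
    """Weights (= multiply-adds per frame) of a full or factored layer, biases ignored."""
    if rank is None:
        return out_dim * in_dim
    return rank * (out_dim + in_dim)


# Demo on a synthetic matrix that is exactly rank 64 (a stand-in, not a trained layer).
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 512))
a, b = svd_factorize(w, rank=64)
rel_err = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
print(f"relative error: {rel_err:.2e}")
print(num_params(2048, 2048), num_params(2048, 2048, rank=256))
```

For a 2048 x 2048 hidden layer, rank 256 would cut the weights, and hence the per-frame multiply-adds, from 4,194,304 to 1,048,576. How much rank can be dropped without hurting accuracy depends on the trained weights; the factored net is typically fine-tuned afterwards.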

Embedding

10000h chain model (5*400+800): done.

Next: confirm how the decoding beam width affects performance.
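To reason about the beam test, the hypothetical Python sketch below shows cost-based beam pruning in the spirit of WFST decoders: a hypothesis survives only if its cost is within `beam` of the current best cost. The Hyp class, the example words and the costs are made up for illustration; a real decoder applies this pruning per frame over partial tokens, not over whole-sentence hypotheses.

```python
from dataclasses import dataclass


@dataclass
class Hyp:
    words: tuple[str, ...]
    cost: float  # accumulated negative log-likelihood; lower is better


def beam_prune(hyps: list[Hyp], beam: float) -> list[Hyp]:
    """Keep only hypotheses whose cost is within `beam` of the best (lowest) cost."""
    if not hyps:
        return []
    best = min(h.cost for h in hyps)
    return [h for h in hyps if h.cost <= best + beam]


hyps = [
    Hyp(("今天",), 10.0),
    Hyp(("金天",), 11.5),
    Hyp(("今",), 14.0),
    Hyp(("斤",), 25.0),
]
for beam in (1.0, 5.0, 16.0):
    kept = beam_prune(hyps, beam)
    print(beam, [h.words for h in kept])
```

A wider beam keeps more hypotheses (fewer search errors, slower decoding); a narrower one is faster but may prune the correct path. That trade-off is what the beam test measures.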

SinSong Robot

Test based on the 10000h model (7*2048, xent).

 ------------------------------------------------------
   condition | clean  | replay (0.5m) | real-env
 ------------------------------------------------------
   WER (%)   |   3    | 18 (MPE: 14)  | too bad
 ------------------------------------------------------
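The WER figures in the table, like the CER reported for YueYu, are edit-distance ratios: (substitutions + deletions + insertions) divided by the reference length. A minimal Python sketch, assuming already-tokenized reference and hypothesis sequences; the function name error_rate is hypothetical. Passing words gives WER, passing characters gives CER.

```python
def error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """(substitutions + deletions + insertions) / len(reference), via Levenshtein distance."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))  # distances for the empty reference prefix
    for i, ref_tok in enumerate(reference, start=1):
        cur = [i] + [0] * len(hypothesis)
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cur[j] = min(
                prev[j - 1] + (ref_tok != hyp_tok),  # match or substitution
                prev[j] + 1,  # deletion
                cur[j - 1] + 1,  # insertion
            )
        prev = cur
    return prev[-1] / len(reference)


print(error_rate("a b c d".split(), "a x c".split()))  # 0.5 (word level)
print(error_rate(list("今天天气"), list("今天天汽")))  # 0.25 (character level, CER)
```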

Recording

Character LM

9-gram training done on all corpora except Sogou-2T.
Worse than the word LM (9% -> 6%).
Add word-boundary tags to character-LM training.
Merge the character LM and word LM:

Union
Compose
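One way to read "add word-boundary tags" is to rewrite the word-segmented training text into character tokens with an explicit tag between words, so the n-gram model can learn where words end. The sketch below assumes space-segmented input and a hypothetical tag `<wb>`; the actual tag and preprocessing used in the experiment are not stated here.

```python
def to_char_tokens(segmented_line: str, boundary_tag: str = "<wb>") -> list[str]:
    """Turn a space-segmented sentence into character tokens, with a tag between words."""
    tokens: list[str] = []
    for k, word in enumerate(segmented_line.split()):
        if k > 0 and boundary_tag:
            tokens.append(boundary_tag)
        tokens.extend(word)  # one token per character
    return tokens


line = "今天 天气 很好"
print(" ".join(to_char_tokens(line)))                   # 今 天 <wb> 天 气 <wb> 很 好
print(" ".join(to_char_tokens(line, boundary_tag="")))  # 今 天 天 气 很 好
```

The tags lengthen each sequence, so a 9-gram over tagged text spans fewer characters than a 9-gram over plain characters.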

SID

Digit

Same Channel test EER: 100%

Speaker verification
Phone channel

Cross Channel

PLDA adaptation on mic wav data: EER reduced from 9% to 7% (20-30 speakers)
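The SID results are reported as equal error rate (EER): the operating point where the false-acceptance rate on non-target trials equals the false-rejection rate on target trials. A minimal NumPy sketch, assuming one score per trial with higher scores meaning "same speaker"; the function name is hypothetical. It sweeps the observed scores as thresholds and takes the closest FAR/FRR crossing, so it is an approximation rather than an interpolated EER.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores) -> float:
    """Approximate EER by sweeping every observed score as a decision threshold."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    # A trial is accepted when its score >= threshold.
    frr = np.array([(tgt < t).mean() for t in thresholds])   # false rejections
    far = np.array([(non >= t).mean() for t in thresholds])  # false acceptances
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2)


print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1]))  # 0.25
```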
