Sinovoice-2016-5-12

From cslt Wiki

Data

  • 16K LingYun
      • 2000h data ready
      • 4300h real-env data to label
  • YueYu
      • Total 250h (190h YueYu + 60h English)
      • Add 60h YueYu
      • CER: 75% -> 76%
  • WeiYu
      • 50h for training
      • 120h labeled ready
  • PingAn
      • 100h user data done

Model training

Deletion Error Problem

  • Add one noise phone to alleviate the silence over-training
  • Omit sil accuracy in discriminative training
  • H smoothing of XEnt and MPE
  • Add one silence arc from start-state to end-state
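
One possible reading of the "H smoothing of XEnt and MPE" item above is a linear interpolation of the frame-level cross-entropy loss with the sequence-level MPE criterion. The sketch below is only an illustration under that assumption: the function name, the weight `h`, and the sign convention (MPE accuracy is negated so that both terms are minimized) are hypothetical and are not taken from the actual training recipe.

```python
def smoothed_loss(xent_loss: float, mpe_accuracy: float, h: float) -> float:
    """Interpolate a cross-entropy loss with an MPE objective.

    Assumption: MPE is expressed as an expected accuracy to be maximized,
    so it is negated to turn it into a loss. h = 0 gives pure MPE,
    h = 1 gives pure cross-entropy.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie in [0, 1]")
    mpe_loss = -mpe_accuracy
    return (1.0 - h) * mpe_loss + h * xent_loss


# Hypothetical values, for illustration only.
print(smoothed_loss(xent_loss=2.0, mpe_accuracy=0.8, h=0.1))
```

A small `h` keeps the objective dominated by MPE while the cross-entropy term still constrains frame-level posteriors, which is the usual motivation for this kind of smoothing.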

Big-Model Training

16k

8k

Model
  • Add noise phone
  • 1300.mdl done
  • CNN + TDNN
  • 280/900 mdl
  • Xent training needs about 12 days
Project
  • PingAn
  • Add noise phone to phone-list
PingAnAll:
 ==================================================================================
 |         AM / error         | tot_err |   ins   |   del   |   sub   |   wer   |
 ----------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt 2500.mdl  |  3626   |   619   |   773   |   2234  |  16.60  |
 ----------------------------------------------------------------------------------
 | spn 7-1024 xEnt 1300.mdl   |  3746   |   702   |   763   |   2281  |  16.xx  |
 ==================================================================================
PingAnUser:
 ==================================================================================
 |         AM / error         | tot_err |   ins   |   del   |   sub   |   wer   |
 ----------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt 2500.mdl  |   549   |   158   |    75   |   316   |  35.91  |
 ----------------------------------------------------------------------------------
 | spn 7-1024 xEnt 1300.mdl   |   571   |   151   |    97   |   323   |  35.xx  |
 ==================================================================================
  • LiaoNingYiDong
 ==================================================================================
 |         AM / error         | tot_err |   ins   |   del   |   sub   |   wer   |
 ----------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt 2500.mdl  |   5873  |   879   |   1364  |   3630  |  21.72  |
 ----------------------------------------------------------------------------------
 | spn 7-1024 xEnt 300.mdl    |   6257  |   977   |   1348  |   3923  |  23.14  |
 ==================================================================================
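
In the tables above, tot_err is the sum of the ins, del and sub columns, and wer is tot_err divided by the number of reference words. The sketch below shows that bookkeeping plus a relative-change helper for comparing two models. The class and function names are hypothetical, and the reference word count is not given in these notes, so it is left as a parameter.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    ins: int   # insertions
    dele: int  # deletions ("del" is a Python keyword)
    sub: int   # substitutions

    @property
    def total(self) -> int:
        return self.ins + self.dele + self.sub


def wer_percent(counts: ErrorCounts, num_ref_words: int) -> float:
    """WER in percent; num_ref_words must be supplied by the caller."""
    if num_ref_words <= 0:
        raise ValueError("num_ref_words must be positive")
    return 100.0 * counts.total / num_ref_words


def relative_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate with respect to baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline


# PingAnUser rows from the table above.
tdnn = ErrorCounts(ins=158, dele=75, sub=316)
spn = ErrorCounts(ins=151, dele=97, sub=323)
print(tdnn.total, spn.total)
print(relative_change(tdnn.dele, spn.dele))
```

Comparing the per-type counts rather than only the total makes it easier to see whether a change such as the added noise phone moves errors between insertions and deletions.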

Embedding

  • The nnet1 AM size is 6.4M (3M after decomposition), so we need to keep the AM size within 10M.
  • 5*576-2400 TDNN model training done. AM size is about 17M
  • 5*500-2400 TDNN model in training.
  • MPE training 2/4 part
  • test8000: WER 16; test10000: WER 30

Character LM

  • 9-gram done on all data except Sogou-2T.
  • Adding word-boundary tags to Character-LM training: done
  • 9-gram
  • Except Weibo & Sogou-2T
  • 1e-7 (13M): WER 17.91, compared with WER 13.4 for 1e-7 without boundary tags (71M)
  • 1e-8 (54M): WER 17.54
  • Prepare specific domain vocabulary
  • Dianxin/Baoxian/Dianli
  • DT lm training
  • ReFr
  • Merge Character-LM & word-LM
  • Union
  • Compose, success.
  • 2-step decoding: first, character-based LM. Then, word-based LM.
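
The 2-step decoding item above could be realized in several ways. A minimal sketch, assuming the first pass with the character-based LM yields an n-best list with log scores and the second pass simply rescores it with a word-based LM, is given below. All names, the toy unigram LM and the `lm_weight` parameter are hypothetical; the real system may rescore lattices instead.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# (word sequence, first-pass log score from the character-LM decode)
Hypothesis = Tuple[Sequence[str], float]


def rescore_nbest(
    nbest: List[Hypothesis],
    word_lm_logprob: Callable[[Sequence[str]], float],
    lm_weight: float = 1.0,
) -> Hypothesis:
    """Second pass: add the weighted word-LM score and return the best hypothesis."""
    if not nbest:
        raise ValueError("nbest must not be empty")
    rescored = [
        (words, score + lm_weight * word_lm_logprob(words))
        for words, score in nbest
    ]
    return max(rescored, key=lambda h: h[1])


def make_unigram_lm(probs: Dict[str, float], floor: float = 1e-4):
    """Toy stand-in for a word LM: unigram log-probabilities with a floor."""
    def logprob(words: Sequence[str]) -> float:
        return sum(math.log(probs.get(w, floor)) for w in words)
    return logprob


# Toy example: the first pass slightly prefers ("a", "b"),
# but the word LM considers "c" far more likely than "b".
word_lm = make_unigram_lm({"a": 0.5, "b": 0.01, "c": 0.4})
nbest = [(("a", "b"), -1.0), (("a", "c"), -1.2)]
best_words, best_score = rescore_nbest(nbest, word_lm)
print(best_words)
```

Rescoring keeps the first pass cheap and lets the larger word-based LM be applied only to a short list of candidates.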

SiaSong Robot

  • Beam-forming algorithm test
  • NN-model based beam-forming

Project

  • PingAn & YueYu: too many deletion errors
  • TDNN deletion error rate > DNN deletion error rate
  • The TDNN silence scale is too sensitive across different test cases.

SID

Digit

  • Engine Package