Sinovoice-2016-3-31

Latest revision as of 06:26, 31 March 2016 (Thu)

Data

  • 16K LingYun
    • 2000h data ready
    • 4300h real-env data to label
  • YueYu (Cantonese)
    • Total 250h (190h YueYu + 60h English)
    • CER: 75%
  • WeiYu (Uyghur)
    • 50h for training
    • 120h of labeled data ready

Model training

Big-Model Training

  • 16k-10000h DNN training: 7*1024-10000
    • MPE training done
    • Performance about 1 point worse than 7*2048-10000
    • Decoding speed similar to the old nnet1 version
  • Factor the 7*2048-10000 net's weight matrices with SVD to improve decoding speed
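The SVD factoring mentioned above can be sketched as follows. This is a minimal illustration, not the actual Sinovoice recipe; the layer size (2048x2048) and rank (256) are assumed values chosen only to show the arithmetic.

```python
# Sketch: replace a dense weight matrix W with a low-rank pair A @ B
# obtained by truncated SVD, so each layer costs rank*(m+n) multiplies
# instead of m*n. Sizes and rank here are illustrative assumptions.
import numpy as np

def svd_factor(W, rank):
    """Factor W (m x n) into A (m x rank) @ B (rank x n) via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into A
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((2048, 2048))
A, B = svd_factor(W, 256)

full_ops = W.size              # multiplies per frame, dense layer
low_rank_ops = A.size + B.size # multiplies per frame, factored layer
print(full_ops, low_rank_ops)  # 4194304 1048576
```

At rank 256 the factored layer needs a quarter of the multiplies of the dense 2048x2048 layer, which is where the decoding speed-up comes from; the accuracy cost depends on how much of the spectrum the kept singular values capture.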


Embedding

  • 10000h chain model 5*400+800 done.
  • To confirm the effect of the decoding beam on performance

SinSong Robot

  • Test based on the 10000h (7*2048, xent) model
 ------------------------------------------------
   condition | clean  | replay(0.5m) | real-env
 ------------------------------------------------
   WER (%)   |   3    | 18 (MPE: 14) | too bad
 ------------------------------------------------
  • Recording

Character LM

  • Except for Sogou-2T, 9-gram training has been done.
  • Worse than the word LM (9% -> 6%)
  • Add word-boundary tags to character-LM training
  • Merge the character LM & word LM
    • Union
    • Compose

SID

Digit

  • Same-channel test EER: 100%
    • Speaker verification, phone channel
  • Cross-channel
    • Mic-wav PLDA adaptation: EER from 9% to 7% (20-30 speakers)
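EER, the metric quoted above, is the operating point where the false-accept and false-reject rates are equal. A minimal sketch of computing it from trial scores (the score lists are fabricated examples, not Sinovoice trials):

```python
# Sketch: equal error rate by sweeping thresholds over target/impostor scores.
def eer(target_scores, impostor_scores):
    """Return the rate at the threshold where FAR and FRR are closest."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = (1.0, 0.0)
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        if abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    return (best[0] + best[1]) / 2

targets = [0.9, 0.8, 0.7, 0.6, 0.3]    # genuine-speaker trial scores (made up)
impostors = [0.5, 0.4, 0.35, 0.2, 0.1]  # impostor trial scores (made up)
print(eer(targets, impostors))  # 0.2
```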