Sinovoice-2016-3-17
From cslt Wiki
Revision as of 07:32, 17 March 2016 (Thursday)
Data
- 16K LingYun
- 2000h data ready
- 4300h real-env data to label
- YueYu
- Total 250h(190h-YueYu + 60h-English)
- CER: 75%
- WeiYu
- 50h for training
- 120h labeled ready
Big-Model Training
- 16k-10000h DNN training 7*1024-10000
- MPE training to be done
- Performance is about 1 point worse than 7*2048-1000
- Decoding speed is similar to the old nnet1 version
- Factor the weight matrices of the 7*2048-1000 net with SVD to improve decoding speed
- TDNN+LSTM chain training
- With beam 13 the chain model performs better; with beam 9 it performs worse.
- 8K
- HuaWei
- 5000h 7*1024 XENT model is in training; performance is similar to 7*2048
- To train MPE based on the current XENT model
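As a rough illustration of why SVD factoring of the 7*2048-1000 net can improve decoding speed, the sketch below counts multiply-accumulate operations for one hidden-layer matrix before and after replacing W (rows x cols) with a product of two thinner matrices A (rows x rank) and B (rank x cols). The rank of 256 and all function names are assumptions made for illustration; the report does not state the rank that was used.

```python
def dense_macs(rows: int, cols: int) -> int:
    """Multiply-accumulates for one matrix-vector product with a rows x cols matrix."""
    return rows * cols


def factored_macs(rows: int, cols: int, rank: int) -> int:
    """Multiply-accumulates after factoring W into A (rows x rank) times B (rank x cols)."""
    return rank * cols + rows * rank


def speedup(rows: int, cols: int, rank: int) -> float:
    """Ratio of dense cost to factored cost for a single layer."""
    return dense_macs(rows, cols) / factored_macs(rows, cols, rank)


if __name__ == "__main__":
    hidden = 2048  # hidden-layer width taken from the 7*2048 configuration
    rank = 256     # hypothetical rank kept after SVD truncation
    print("dense MACs:   ", dense_macs(hidden, hidden))
    print("factored MACs:", factored_macs(hidden, hidden, rank))
    print("speedup:      ", speedup(hidden, hidden, rank))
```

The factored form is cheaper only when the rank is below rows * cols / (rows + cols), which is 1024 for a 2048 x 2048 layer. Any accuracy loss from truncation still has to be measured on the real model.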
Embedding
- 10000h-chain 5*400+800 DONE.
- To confirm the effect of the beam on performance
SinSong Robot
- Test based on 10000h(7*2048-xent) model
------------------------------------------------
condition | clean | replay(0.5m) | real-env
------------------------------------------------
wer       | 3     | 18(mpe-14)   | too-bad
------------------------------------------------
- Recording
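For reference, the WER figures in the table above (and the CER figure in the Data section) are normally computed as edit distance divided by reference length. The following is a minimal sketch under that assumption; it is not the scoring tool used for these tests, and the function names are hypothetical. Passing word lists gives WER, passing character lists gives CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Error rate in percent: edit distance over reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "turn on the light".split()
    hypothesis = "turn of the light".split()
    print(error_rate(reference, hypothesis))
```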
Character LM
- 9-gram has been done on all data except Sogou-2T.
- Worse than the word LM (9% -> 6%)
- Merge Character-LM & word-LM
- Union
- Compose
Path Emphasize
- Script Done.
SID
Digit
- Same Channel test EER: 100%
- Speaker verification
- phone channel
- Cross Channel
- PLDA adaptation on mic-wav data reduces EER from 9% to 7% (20-30 persons)
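The EER figures above are the operating point at which the false-accept rate equals the false-reject rate. A minimal sketch of that computation follows, assuming higher scores mean "same speaker" and sweeping the threshold over the observed scores. The function name and the toy scores are hypothetical and are not taken from these experiments.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER in percent by sweeping the threshold over observed scores."""
    if not target_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # False accept: impostor trial scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False reject: target trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return 100.0 * best_eer


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.3]    # toy same-speaker trial scores
    impostors = [0.6, 0.4, 0.2, 0.1]  # toy different-speaker trial scores
    print(equal_error_rate(targets, impostors))
```

With only 20-30 speakers the number of trials is small, so each misclassified trial moves the EER by a noticeable step.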