Sinovoice-2016-3-31
Data
- 16K LingYun
- 2000h data ready
- 4300h of real-environment data to be labeled
- YueYu (Cantonese)
- Total 250h (190h YueYu + 60h English)
- CER: 75%
- WeiYu (Uyghur)
- 50h for training
- 120h of labeled data ready
Model training
Big-Model Training
- 16k-10000h DNN training 7*1024-10000
- MPE training done
- Performance about 1 point worse than the 7*2048-10000 model
- Decoding speed is similar to the old nnet1 version
- 7*2048-10000 net weight-matrix factoring via SVD, to improve decoding speed (see the sketch below)
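The SVD step factors each large affine weight matrix into two low-rank factors, so the matrix-vector products at decode time become cheaper. A minimal numpy sketch of the idea; the 2048x2048 layer size and rank 256 are illustrative assumptions, not the actual 7*2048-10000 configuration:

    import numpy as np

    def svd_factor(W, k):
        """Factor W (m x n) into A (m x k) and B (k x n) keeping the top-k singular values."""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :k] * s[:k]          # absorb singular values into the left factor
        B = Vt[:k, :]
        return A, B

    W = np.random.randn(2048, 2048).astype(np.float32)   # one hidden-layer weight matrix
    A, B = svd_factor(W, k=256)

    # multiply-add count per frame drops from m*n to k*(m+n)
    orig_cost = W.shape[0] * W.shape[1]
    fact_cost = 256 * (W.shape[0] + W.shape[1])
    rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"cost ratio {fact_cost / orig_cost:.2f}, relative error {rel_err:.3f}")

In practice the factored layers would be retrained after the split; the error printed for a random matrix is pessimistic, since trained weight matrices are usually much closer to low rank.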
Embedding
- 10000h chain model (5*400+800) done
- To confirm the effect of the decoding beam on performance
SinSong Robot
- Test based on the 10000h (7*2048 xent) model
- Results:
    condition | clean | replay (0.5m) | real-env
    WER (%)   |   3   | 18 (MPE: 14)  | too bad
- Recording
Character LM
- Except for Sogou-2T, the 9-gram training has been done.
- Worse than the word LM (9% -> 6%)
- Add word-boundary tags to character-LM training (see the sketch after this list)
- Merge the character LM & word LM
- Union
- Compose
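A minimal sketch of what adding word-boundary tags to the character-LM training text could look like; the "<w>" tag, the file names, and the one-sentence-per-line word-segmented input format are all illustrative assumptions:

    # Convert word-segmented text into character sequences with boundary tags,
    # which the character LM is then trained on. The "<w>" tag and file names
    # are assumptions for illustration, not the actual setup.
    def to_tagged_chars(line, boundary="<w>"):
        tokens = []
        for word in line.strip().split():
            tokens.extend(word)        # one token per character
            tokens.append(boundary)    # mark the word boundary
        return " ".join(tokens)

    # e.g. "今天 天气" -> "今 天 <w> 天 气 <w>"
    with open("train.words.txt", encoding="utf-8") as fin, \
         open("train.chars.txt", "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(to_tagged_chars(line) + "\n")

Keeping the boundary tag as a regular token in the n-gram pipeline lets the character LM recover some of the word-level context it otherwise loses.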
SID
Digit
- Same-channel test EER: 100%
- Speaker verification
- Phone channel
- Cross-channel
- Mic-wav PLDA adaptation: EER reduced from 9% to 7% (20-30 speakers); see the EER sketch below
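For reference, the EER figures above (9% -> 7%) come from finding the operating point where the miss rate equals the false-alarm rate. A minimal numpy sketch with synthetic scores; real trial scores would come from the PLDA back-end:

    import numpy as np

    def compute_eer(target_scores, nontarget_scores):
        """Equal error rate: threshold where miss rate == false-alarm rate."""
        scores = np.concatenate([target_scores, nontarget_scores])
        labels = np.concatenate([np.ones_like(target_scores), np.zeros_like(nontarget_scores)])
        labels = labels[np.argsort(scores)]        # sweep the threshold over sorted scores
        n_tgt, n_non = len(target_scores), len(nontarget_scores)
        miss = np.cumsum(labels) / n_tgt           # targets rejected below the threshold
        fa = 1.0 - np.cumsum(1 - labels) / n_non   # non-targets accepted above the threshold
        idx = np.argmin(np.abs(miss - fa))
        return (miss[idx] + fa[idx]) / 2.0

    tgt = np.random.normal(2.0, 1.0, 1000)   # synthetic target-trial scores
    non = np.random.normal(0.0, 1.0, 5000)   # synthetic non-target-trial scores
    print("EER = {:.1%}".format(compute_eer(tgt, non)))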