Sinovoice-2016-5-19
Data
- 16K LingYun
- 2000h data ready
- 4300h real-env data to label
- YueYu
- Total 250h (190h YueYu + 60h English)
- Add 60h YueYu
- CER: 75% -> 76%
- WeiYu
- 50h for training
- 120h labeled ready
- PingAn
- 100h User data done
Model training
Deletion Error Problem
- Add one noise phone to alleviate the silence over-training
- Omit sil accuracy in discriminative training
- H-smoothing of XEnt and MPE (a gradient-interpolation sketch follows this list)
- Add one silence arc from start-state to end-state
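A minimal sketch of one way the H-smoothing item could be realized, assuming it means interpolating the per-frame XEnt gradient into the MPE gradient with a smoothing weight; the weight name h and the linear interpolation form are assumptions, not taken from this report:

  import numpy as np

  def h_smoothed_gradient(grad_mpe, grad_xent, h=0.1):
      # Interpolate MPE and XEnt gradients with smoothing weight h (assumed form).
      # grad_mpe, grad_xent: per-frame gradients w.r.t. the network output,
      # both of shape (num_frames, num_pdfs).  h = 0 reduces to pure MPE.
      return (1.0 - h) * grad_mpe + h * grad_xent

  # Toy usage: two random gradient matrices for 5 frames x 4 pdfs.
  rng = np.random.RandomState(0)
  g_mpe = rng.randn(5, 4)
  g_xent = rng.randn(5, 4)
  print(h_smoothed_gradient(g_mpe, g_xent, h=0.2).shape)  # (5, 4)

The XEnt term pulls the sequence-training update back toward the frame-level targets, which is one common way to damp over-deletion in pure MPE training.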
Big-Model Training
16k
8k
Model
- Add noise phone
- 1300.mdl done
- CNN + TDNN (a TDNN splicing sketch follows this list)
- 280/900 mdl
- XEnt training needs about 12 days
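For reference, a minimal numpy sketch of the TDNN part of the CNN + TDNN stack, assuming each layer splices its input at fixed time offsets before an affine transform and ReLU; the offsets and sizes below are illustrative, not the actual 7-1024/7-2048 configurations:

  import numpy as np

  def tdnn_layer(frames, weight, bias, offsets=(-2, 0, 2)):
      # One TDNN layer: splice frames at the given time offsets, then affine + ReLU.
      # frames: (T, D); weight: (len(offsets) * D, H); bias: (H,).
      # Edge frames are handled by clamping offsets to valid indices.
      T, D = frames.shape
      spliced = np.zeros((T, len(offsets) * D))
      for t in range(T):
          parts = [frames[min(max(t + o, 0), T - 1)] for o in offsets]
          spliced[t] = np.concatenate(parts)
      return np.maximum(spliced @ weight + bias, 0.0)

  # Toy usage: 10 frames of 40-dim features through one 64-unit layer.
  rng = np.random.RandomState(0)
  x = rng.randn(10, 40)
  w = rng.randn(3 * 40, 64) * 0.01
  b = np.zeros(64)
  print(tdnn_layer(x, w, b).shape)  # (10, 64)

Stacking such layers with growing offsets widens the temporal context layer by layer, which is how a 7-layer model sees long context without recurrence.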
Project
- PingAn
- Add noise phone to phone-list
===========================================================================
| AM / config                          | all   | KeHu WER || KeHu no-ins |
---------------------------------------------------------------------------
| tdnn 7-2048 xEnt                     | 16.45 | 36.49    || 25.18       |
| tdnn 7-2048 MPE                      | 15.22 | 32.77    || 23.48       |
| tdnn 7-2048 MPE adapt-PABX           | 14.67 | 31.33    || 22.76       |
---------------------------------------------------------------------------
| tdnn 7-1024 xEnt                     | 16.60 | 35.91    || 25.58       |
| tdnn 7-1024 MPE 2e-6                 | 15.67 | 32.77    || 26.09       |
| tdnn 7-1024 MPE 2e-5 1.mdl           | 15.54 | 32.77    || 26.29       |
| tdnn 7-1024 MPE 1e-5 4.mdl           | 15.76 | 33.55    || 27.20       |
| tdnn 7-1024 MPE adapt-PABX           | 14.80 | 30.48    || 22.56       |
---------------------------------------------------------------------------
| spn 7-1024 xEnt                      | 16.49 | 36.23    || 24.59       |
| spn 7-1024 xEnt xEnt-PA_user 101.mdl | 16.19 | 33.22    || 22.69       |
| spn 7-1024 xEnt xEnt-PA_user mpe     | 15.24 | 32.77    || 21.65       |
| spn 7-1024 MPE-1000H 23.mdl          | 15.29 | 33.09    || 21.65       |
| spn 7-1024 MPE adapt-PA_all 29.mdl   | 15.11 | 33.42    || 21.84       |
| spn 7-1024 MPE adapt-PA_user 2e-5    | 15.31 | 31.79    || 20.14       |
| spn 7-1024 MPE adapt-PA_user Hs 2e-5 | 15.32 | 32.24    || 20.93       |
===========================================================================
Error detail:
================================================================================
| AM / error                 | tot_err | ins | del | sub | WER   | WER-no-ins |
--------------------------------------------------------------------------------
| tdnn 7-1024 xEnt           |   549   | 158 |  75 | 316 | 35.91 |   25.58    |
| tdnn 7-1024 MPE            |   501   | 102 | 140 | 259 | 32.77 |   26.09    |
| tdnn 7-1024 MPE adapt PABX |   477   | 132 |  92 | 253 | 31.20 |   22.56    |
--------------------------------------------------------------------------------
| spn 7-1024 xEnt            |   554   | 178 |  66 | 310 | 36.23 |   24.59    |
| spn 7-1024 MPE 1000H       |   506   | 175 |  41 | 290 | 33.09 |   21.65    |
| spn 7-1024 MPE adapt User  |   486   | 178 |  43 | 265 | 31.79 |   20.14    |
================================================================================
- LiaoNingYiDong:
==========================================
| AM / config                  | LNYD  |
------------------------------------------
| tdnn 7-2048 xEnt             | 21.51 |
| tdnn 7-2048 MPE              | 20.09 |
| tdnn 7-2048 MPE adapt-LNYD   | 17.92 |
------------------------------------------
| tdnn 7-1024 xEnt             | 21.72 |
| tdnn 7-1024 MPE              | 20.99 |
| tdnn 7-1024 MPE adapt-LNYD   |       |
| cnn 7-1024 xEnt 500.mdl      | 21.02 |
------------------------------------------
| spn 7-1024 xEnt              | 21.70 |
| spn 7-1024 MPE-1000H 23.mdl  | 19.97 |
| spn 7-1024 MPE adapt-LNYD    | 18.67 |
| spn cnn 7-1024 xEnt 300.mdl  | 22.26 |
==========================================
Error detail:
=================================================================
| AM / error           | tot_err | ins | del  | sub  | WER   |
-----------------------------------------------------------------
| tdnn 7-1024 xEnt     |  5873   | 879 | 1364 | 3630 | 21.72 |
| tdnn 7-1024 MPE      |  5675   | 710 | 1839 | 3126 | 20.99 |
-----------------------------------------------------------------
| spn 7-1024 xEnt      |  5866   | 955 | 1353 | 3558 | 21.70 |
| spn 7-1024 MPE 1000H |  5398   | 992 |  937 | 3469 | 19.97 |
=================================================================
Embedding
- The nnet1 AM is 6.4M (3M after decomposition), so the AM size needs to be kept within 10M.
- A 5*500-2400 TDNN model is in training.
- no-svd model, MPE training done
- svd-100 model, MPE training 2/4 epochs finished (a decomposition sketch follows the result tables below)
LM=1e-5, beam=9, max-active=5000
=============================================================================================
| AM / testset                   | test_1000ju | test_2000ju | test_8000ju | test_10000ju |
---------------------------------------------------------------------------------------------
| nnet1 4*600+800 xEnt (6.4M)    | 25.30       | 40.48       |             |              |
| nnet1 4*600+800 mpe (6.4M)     | 20.75       | 35.33       |             |              |
---------------------------------------------------------------------------------------------
| nnet3 5*500 mpe (13M)          | 16.18       | 29.53       |             |              |
| nnet3 5*500 svd-100 mpe (9.5M) | 17.87       | 30.39       |             |              |
=============================================================================================
LM=1e-6, beam=9, max-active=5000
=============================================================================================
| AM / testset                   | test_1000ju | test_2000ju | test_8000ju | test_10000ju |
---------------------------------------------------------------------------------------------
| nnet1 4*600+800 xEnt (6.4M)    | 21.09       | 36.23       |             |              |
| nnet1 4*600+800 mpe (6.4M)     | 16.44       | 30.75       |             |              |
---------------------------------------------------------------------------------------------
| nnet3 5*500 mpe (13M)          | 13.39       | 25.90       |             |              |
| nnet3 5*500 svd-100 mpe (9.5M) | 14.49       | 26.55       |             |              |
=============================================================================================
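A minimal sketch of the kind of decomposition behind the svd-100 model, assuming it means truncating the SVD of each weight matrix to rank 100 so that an m x n matrix is replaced by two factors of sizes m x 100 and 100 x n; the layer size below is illustrative:

  import numpy as np

  def svd_compress(weight, rank=100):
      # Factor one weight matrix into two low-rank matrices via truncated SVD.
      # weight: (out_dim, in_dim); returns (A, B) with weight ~= A @ B,
      # A: (out_dim, rank), B: (rank, in_dim).
      u, s, vt = np.linalg.svd(weight, full_matrices=False)
      a = u[:, :rank] * s[:rank]   # fold singular values into the left factor
      b = vt[:rank, :]
      return a, b

  # Toy usage: a 1024x1024 layer (~1.05M parameters) becomes 1024x100 + 100x1024 (~0.2M).
  rng = np.random.RandomState(0)
  w = rng.randn(1024, 1024)
  a, b = svd_compress(w, rank=100)
  print(a.shape, b.shape, np.linalg.norm(w - a @ b) / np.linalg.norm(w))

This per-layer factorization is consistent with the 13M -> 9.5M size drop in the tables above; the usual follow-up is a few epochs of retraining (here MPE) to recover the accuracy lost to truncation.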
Character LM
- 9-gram training is done for all corpora except Sogou-2T.
- Adding word-boundary tags to character-LM training is done
- 9-gram
- Except Weibo & Sogou-2T
- 1e-7 (13M): WER 17.91, compared with WER 13.4 for the no-boundary 1e-7 model (71M)
- 1e-8 (54M): WER 17.54
- Prepare specific domain vocabulary
- Dianxin/Baoxian/Dianli (telecom / insurance / power)
- DT LM training
- ReFr
- Merge Character-LM & word-LM
- Union
- Compose: successful.
- 2-step decoding: first pass with the character-based LM, then a second pass with the word-based LM (a minimal sketch follows this list)
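A minimal sketch of the 2-step idea, assuming the first pass ranks character hypotheses with a character-level n-gram LM and the second pass segments them into words and rescores them with a word-level LM; the bigram tables, the greedy two-character segmenter, and all scores below are toy assumptions:

  def ngram_logprob(tokens, lm, order=2, floor=-10.0):
      # Score a token sequence with a plain n-gram table (no backoff).
      # lm maps (context_tuple, token) -> log prob; unseen events get the floor.
      score = 0.0
      for i, tok in enumerate(tokens):
          context = tuple(tokens[max(0, i - order + 1):i])
          score += lm.get((context, tok), floor)
      return score

  # Toy models: a character bigram LM for the first pass,
  # a word bigram LM for rescoring the segmented hypotheses.
  char_lm = {((), "我"): -0.5, (("我",), "们"): -0.3, (("们",), "走"): -0.8}
  word_lm = {((), "我们"): -0.4, (("我们",), "走"): -0.5}

  # Step 1: rank character hypotheses with the character LM and keep an n-best list.
  hyps = [["我", "们", "走"], ["我", "门", "走"]]
  nbest = sorted(hyps, key=lambda h: ngram_logprob(h, char_lm), reverse=True)[:2]

  # Step 2: segment each hypothesis into words and rescore with the word LM.
  def segment(chars):
      # Toy segmenter: greedy two-character chunks; a real system would use the lexicon.
      return ["".join(chars[i:i + 2]) for i in range(0, len(chars), 2)]

  best = max(nbest, key=lambda h: ngram_logprob(segment(h), word_lm))
  print("".join(best))  # 我们走

In practice the first pass would produce lattices rather than an n-best list, but the division of labor would be the same: the character LM keeps the first-pass graph small, while the word LM supplies the stronger constraints.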
SiaSong Robot
- Beam-forming algorithm test (a delay-and-sum sketch follows this list)
- NN-model based beam-forming
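As a conventional baseline for the beam-forming test, a minimal numpy sketch of a frequency-domain delay-and-sum beamformer; the array geometry, sample rate, steering angle, and sign convention below are illustrative assumptions, not details of the SiaSong setup:

  import numpy as np

  def delay_and_sum(signals, mic_positions, angle_deg, fs=16000, c=343.0):
      # Delay-and-sum beamforming for a far-field source and a linear array.
      # signals: (num_mics, num_samples) time-domain channels.
      # mic_positions: (num_mics,) positions along the array axis in metres.
      # angle_deg: steering direction measured from the array axis (broadside = 90).
      num_mics, num_samples = signals.shape
      freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
      delays = mic_positions * np.cos(np.deg2rad(angle_deg)) / c
      spectra = np.fft.rfft(signals, axis=1)
      # Align each channel with a phase shift, then average across microphones.
      steering = np.exp(-2j * np.pi * np.outer(delays, freqs))
      return np.fft.irfft((spectra * steering).mean(axis=0), n=num_samples)

  # Toy usage: a 4-mic linear array with 5 cm spacing, steered to broadside.
  rng = np.random.RandomState(0)
  x = rng.randn(4, 16000)
  y = delay_and_sum(x, np.arange(4) * 0.05, angle_deg=90.0)
  print(y.shape)  # (16000,)

An NN-based beamformer would typically replace these fixed steering weights with masks or filters estimated from the multi-channel input.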
Project
- PingAn & YueYu: too many deletion errors
- TDNN deletion error rate > DNN deletion error rate
- The TDNN silence scale is too sensitive across different test cases.
SID
Digit
- Engine Package