Sinovoice-2016-5-12: Difference between revisions
Latest revision as of 07:31, 12 May 2016 (Thursday)
Data
- 16K LingYun
  - 2000h data ready
  - 4300h real-env data to label
- YueYu
  - Total 250h (190h YueYu + 60h English)
  - Add 60h YueYu
  - CER: 75% -> 76%
- WeiYu
  - 50h for training
  - 120h labeled ready
- PingAn
  - 100h user data done
Model training
Deletion Error Problem
- Add one noise phone to alleviate the silence over-training
- Omit sil accuracy in discriminative training
- H smoothing of XEnt and MPE
- Add one silence arc from start-state to end-state
Big-Model Training
16k
8k
Model
- Add noise phone
  - 1300.mdl done
- CNN + TDNN
  - 280/900 mdl
  - Need about 12 days for xEnt
Project
- PingAn
  - Add noise phone to phone-list
PingAnAll:
==================================================================
| AM / error                | tot_err | ins | del | sub  | WER % |
------------------------------------------------------------------
| tdnn 7-1024 xEnt 2500.mdl | 3626    | 619 | 773 | 2234 | 16.60 |
------------------------------------------------------------------
| spn 7-1024 xEnt 1300.mdl  | 3746    | 702 | 763 | 2281 | 16.xx |
==================================================================
PingAnUser:
==================================================================
| AM / error                | tot_err | ins | del | sub  | WER % |
------------------------------------------------------------------
| tdnn 7-1024 xEnt 2500.mdl | 549     | 158 | 75  | 316  | 35.91 |
------------------------------------------------------------------
| spn 7-1024 xEnt 1300.mdl  | 571     | 151 | 97  | 323  | 35.xx |
==================================================================
- LiaoNingYiDong
===================================================================
| AM / error                | tot_err | ins | del  | sub  | WER % |
-------------------------------------------------------------------
| tdnn 7-1024 xEnt 2500.mdl | 5873    | 879 | 1364 | 3630 | 21.72 |
-------------------------------------------------------------------
| spn 7-1024 xEnt 300.mdl   | 6257    | 977 | 1348 | 3923 | 23.14 |
===================================================================
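Since the notes focus on deletion errors, it can help to look at how the error types split within each row of the PingAnAll and PingAnUser tables. The following is a minimal sketch, not part of the original notes: it checks that tot_err equals ins + del + sub and reports the deletion share. The function names `error_breakdown` and `summarize` are made up for illustration; the numbers are copied from the tables.

```python
# Rows copied from the PingAnAll and PingAnUser tables:
# (test set, model, tot_err, ins, del, sub)
ROWS = [
    ("PingAnAll", "tdnn 7-1024 xEnt 2500.mdl", 3626, 619, 773, 2234),
    ("PingAnAll", "spn 7-1024 xEnt 1300.mdl", 3746, 702, 763, 2281),
    ("PingAnUser", "tdnn 7-1024 xEnt 2500.mdl", 549, 158, 75, 316),
    ("PingAnUser", "spn 7-1024 xEnt 1300.mdl", 571, 151, 97, 323),
]


def error_breakdown(ins, dele, sub):
    """Return the total error count and the share of each error type."""
    total = ins + dele + sub
    shares = {"ins": ins / total, "del": dele / total, "sub": sub / total}
    return total, shares


def summarize(rows):
    """For each row: does tot_err match the sum, and what is the deletion share (%)?"""
    summary = []
    for test_set, model, tot_err, ins, dele, sub in rows:
        total, shares = error_breakdown(ins, dele, sub)
        summary.append((test_set, model, total == tot_err, round(100 * shares["del"], 1)))
    return summary


if __name__ == "__main__":
    for test_set, model, consistent, del_share in summarize(ROWS):
        print(f"{test_set:10s} {model:26s} sum_ok={consistent} del_share={del_share}%")
```

The WER column itself cannot be recomputed this way, because the reference word count per test set is not given in the notes.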
Embedding
- The nnet1 AM is 6.4M (3M after decomposition), so the AM size needs to stay within 10M.
- 5*576-2400 TDNN model training done. AM size is about 17M.
- 5*500-2400 TDNN model in training.
- MPE training: 2/4 parts
  - test8000: WER 16, test10000: WER 30
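The notes say the nnet1 AM shrinks from 6.4M to 3M after decomposition but do not say which method was used. Assuming it is a low-rank factorization of the weight matrices (an assumption, not something the notes state), the sketch below shows how the parameter count of one layer changes. The 576 x 2400 shape is borrowed from the "5*576-2400" model name purely for illustration, and the rank of 128 is hypothetical.

```python
def dense_params(rows, cols):
    """Parameters in a full rows x cols weight matrix (bias ignored)."""
    return rows * cols


def low_rank_params(rows, cols, rank):
    """Parameters after replacing W (rows x cols) by A (rows x rank) @ B (rank x cols)."""
    return rank * (rows + cols)


def max_rank_that_saves(rows, cols):
    """Largest rank for which the factorized form is strictly smaller than the dense one."""
    return (rows * cols - 1) // (rows + cols)


if __name__ == "__main__":
    rows, cols, rank = 576, 2400, 128  # rank is a hypothetical choice
    full = dense_params(rows, cols)
    factored = low_rank_params(rows, cols, rank)
    print(f"dense: {full}, rank-{rank}: {factored}, ratio: {factored / full:.2f}")
    print(f"factorization only helps below rank {max_rank_that_saves(rows, cols) + 1}")
```

The same arithmetic can be applied per layer to estimate whether a given rank keeps the whole model under the 10M target.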
Character LM
- 9-gram training is done for all data except Sogou-2T.
- Adding word-boundary tags to Character-LM training: done
  - 9-gram
  - Except Weibo & Sogou-2T
  - 1e-7 (13M): WER 17.91, compared with 13.4 for 1e-7 (no boundary, 71M)
  - 1e-8 (54M): WER 17.54
- Prepare specific domain vocabulary
  - Dianxin/Baoxian/Dianli
- DT LM training
  - ReFr
- Merge Character-LM & word-LM
  - Union
  - Compose: succeeded.
  - 2-step decoding: first with the character-based LM, then with the word-based LM.
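The word-boundary tagging mentioned above can be pictured as follows. This is only a sketch of one possible scheme: the notes do not specify the tag format, so the `<wb>` token and both function names are hypothetical. Segmented words are turned into a character sequence with a boundary token between words, which lets the word segmentation be recovered from the character stream.

```python
WORD_BOUNDARY = "<wb>"  # hypothetical tag; the actual tag used is not given in the notes


def to_char_sequence(words, boundary=WORD_BOUNDARY):
    """Turn a list of segmented words into characters with a boundary token between words."""
    tokens = []
    for index, word in enumerate(words):
        if index > 0:
            tokens.append(boundary)
        tokens.extend(word)  # a string extends the list one character at a time
    return tokens


def from_char_sequence(tokens, boundary=WORD_BOUNDARY):
    """Recover the word list from a boundary-tagged character sequence."""
    words, current = [], []
    for token in tokens:
        if token == boundary:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(token)
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    segmented = ["平安", "保险"]  # illustrative input only
    tagged = to_char_sequence(segmented)
    print(" ".join(tagged))
    print(from_char_sequence(tagged))
```

Text prepared this way could be fed to an n-gram toolkit as ordinary space-separated tokens; whether the boundary is a separate token or attached to a character is a design choice the notes leave open.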
SiaSong Robot
- Beam-forming algorithm test
- NN-model based beam-forming
Project
- PingAn & YueYu: too many deletion errors
- TDNN deletion error rate > DNN deletion error rate
- The TDNN silence scale is too sensitive across different test cases.
SID
Digit
- Engine Package