Difference between revisions of "Sinovoice-2014-03-11"
From cslt Wiki
Latest revision as of 06:42, 11 March 2014 (Tuesday)
Environment setting
- Raid212/Raid215/Disk212 done
Corpora
- Labeling of the PICC data (200h) is done.
- A total of 1121h (470 + 346 + 105 BJ mobile + 200 PICC) of telephone speech is now ready.
- 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data.
- LM training text needs to be prepared within 2 days.
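The hour counts above can be cross-checked with a few lines of Python. This is only a bookkeeping sketch: the labels "470h set" and "346h set" are placeholders, since the notes do not name those two sources. The 16k components add up to slightly less than the nominal 6000h figure.

```python
# Hours per source, copied from the Corpora notes.
# "470h set" and "346h set" are placeholder labels; the notes do not name them.
telephone_hours = {
    "470h set": 470,
    "346h set": 346,
    "BJ mobile": 105,
    "PICC": 200,
}
wideband_16k_hours = {
    "DataTang online": 978,
    "online mobile": 656,
    "recording": 4300,
}

telephone_total = sum(telephone_hours.values())
wideband_total = sum(wideband_16k_hours.values())

print(f"telephone speech: {telephone_total}h")
print(f"16k speech: {wideband_total}h (reported as roughly 6000h)")
```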
Acoustic modeling
Telephone model training
1000h Training
- Training recipe prepared
- Expect to finish in 7 days
PICC dedicated training
Baseline (470+300h): 45.03
+ PICC 188h incremental training (th=0.9): 41.89
+ PICC 188h incremental training (th=0.8): 41.64
+ PICC 188h labelled training: 34.78
+ PICC 188h labelled training + PICC text LM: 29.18
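To make the size of each gain easier to compare, the scores can be converted to relative reductions against the baseline. The sketch below assumes the figures are error rates in percent (lower is better), which the notes do not state explicitly; the helper name `relative_reduction` is introduced here for illustration only.

```python
def relative_reduction(baseline: float, score: float) -> float:
    """Relative reduction of `score` against `baseline`, in percent."""
    return 100.0 * (baseline - score) / baseline


# Scores copied from the PICC dedicated training results.
baseline = 45.03
systems = {
    "incremental training (th=0.9)": 41.89,
    "incremental training (th=0.8)": 41.64,
    "labelled training": 34.78,
    "labelled training + PICC text LM": 29.18,
}

reductions = {name: relative_reduction(baseline, score)
              for name, score in systems.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.2f}% relative reduction")
```

Under that assumption, the two incremental (threshold-selected) runs recover only a small part of the gain obtained from the manually labelled data.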
6000 hour 16k training
Training progress
- Ran DNN MPE to iteration 5.
- Recipe:
  - 100h MPE training
  - 1700h MPE alignment/lattice
  - 1700h MPE training
- 1 week to complete 3 MPE iterations
- MPE2 result: 1e-9: 10.67% (8.61%), 1e-10: 10.34% (8.27%)
- MPE3 result: 1e-9: 10.48% (8.43%), 1e-10: 10.12% (8.05%)
- MPE4 result: 1e-9: 10.34% (8.31%), 1e-10: 10.03% (7.97%)
- MPE5 result:
Training Analysis
- Shared-tree GMM model training completed; WER% is similar to that of the non-shared model.
- Selected 100h online data, trained two systems: (1) di-syllable system (2) jt-phone system
        di-syl    jt-ph
GMM:    -         20.86%
Xent    15.42%    14.78%
MPE1    14.46%    14.23%
MPE2    14.22%    14.09%
MPE3    14.26%    13.80%
MPE4    14.24%    13.68%
- HTK training on the same database
- HLDA: 18.22
- HLDA+MPE: 17.40
Hubei telecom
- Hubei telecom data (127h): retrieved 60k sentences with confidence threshold 0.9, amounting to 50%
             wer_14    wer_15
xEnt org:    -         29.05
MPE iter1:   29.23     29.38
MPE iter2:   29.05     29.11
MPE iter3:   29.32     29.28
MPE iter4:   29.29     29.28
- Retrieved 30k sentences with confidence threshold 0.95, amounting to 25%, plus the original 770h data
             wer_14    wer_15
xEnt org:    -         29.05
MPE iter1:   -         29.36
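The sentence retrieval described above amounts to keeping automatically decoded utterances whose confidence reaches a threshold. The following is a minimal sketch of that idea, not the actual pipeline: the `DecodedUtterance` type, the `select_by_confidence` helper, the use of `>=` for the comparison, and the toy data are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class DecodedUtterance(NamedTuple):
    """Hypothetical record for one automatically transcribed utterance."""
    utt_id: str
    hypothesis: str
    confidence: float  # utterance-level confidence in [0, 1]


def select_by_confidence(utterances: List[DecodedUtterance],
                         threshold: float) -> List[DecodedUtterance]:
    """Keep utterances whose confidence is at least `threshold`."""
    return [u for u in utterances if u.confidence >= threshold]


# Toy data, invented for this example only.
decoded = [
    DecodedUtterance("utt1", "hypothesis one", 0.97),
    DecodedUtterance("utt2", "hypothesis two", 0.91),
    DecodedUtterance("utt3", "hypothesis three", 0.85),
    DecodedUtterance("utt4", "hypothesis four", 0.62),
]

selected_090 = select_by_confidence(decoded, 0.90)
selected_095 = select_by_confidence(decoded, 0.95)

print(f"th=0.90 keeps {len(selected_090)} of {len(decoded)} utterances")
print(f"th=0.95 keeps {len(selected_095)} of {len(decoded)} utterances")
```

Raising the threshold trades data volume for transcript reliability, which matches the 50% versus 25% retention reported for thresholds 0.9 and 0.95.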
Language modeling
- Need to transfer the training text.
DNN Decoder
Online decoder
- CMN code delivered. Integration is done.
- CMN pipe code delivered. Model adaptation is ongoing.