Sinovoice-2014-12-10
DNN training
Environment setting
- Another three 3 TB disks are ready for RAID-0.
- Another GPU machine was purchased; four 3 TB disks will be added to it to build a RAID-0 array.
Corpora
- Scripts for confidence generation are ready for automatic transcription (a filtering sketch is given after this list).
- The 300-hour telephone speech data (Sinovoice recordings) are done.
- Adaptation data (900 sentences) are ready.
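As a rough illustration of how per-utterance confidence could gate automatically transcribed data, here is a minimal Python sketch; the file names, the "utt-id confidence" format, and the 0.90 threshold are assumptions for illustration, not the actual scripts.

```python
# Minimal sketch (not the actual scripts): keep auto-transcribed utterances
# whose confidence exceeds a threshold. The file format "utt-id confidence"
# and the 0.90 threshold are assumptions for illustration only.
CONF_THRESHOLD = 0.90

def load_confidences(path):
    """Return {utt_id: confidence} from a 'utt-id confidence' file."""
    conf = {}
    with open(path) as f:
        for line in f:
            utt_id, value = line.split()
            conf[utt_id] = float(value)
    return conf

def filter_transcriptions(trans_path, conf, threshold=CONF_THRESHOLD):
    """Yield 'utt-id text' lines whose utterance confidence >= threshold."""
    with open(trans_path) as f:
        for line in f:
            utt_id, _, text = line.partition(" ")
            if conf.get(utt_id, 0.0) >= threshold:
                yield f"{utt_id} {text.strip()}"

if __name__ == "__main__":
    confidences = load_confidences("confidence.txt")
    for kept in filter_transcriptions("auto_trans.txt", confidences):
        print(kept)
```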
470 hour 8k training
- 300-hour incremental training (IT) is done.
| Model | CE | MPE1 | MPE2 | MPE3 | MPE4 |
|---|---|---|---|---|---|
| 4k states | 23.27/22.85 | 21.35/18.87 | 21.18/18.76 | 21.07/18.54 | 20.93/18.32 |
| 8k states | 22.16/22.22 | 20.55/18.03 | 20.36/17.94 | 20.32/17.78 | 20.29/17.80 |
| 8k states + IT | - | 20.04/17.38 | 20.01/17.32 | 20.07/17.44 | 19.94/17.65 |
6000 hour 16k training
- Ran CE DNN training to iteration 5 (8400 states, 80000 PDFs).
- Test WER is down to 13.77% (Sinovoice's own result: 11.78%).
| Model | WER (%) | RT |
|---|---|---|
| small LM, it 4, -5/-9 | 15.80 | 1.18 |
| large LM, it 4, -5/-9 | 15.30 | 1.50 |
| large LM, it 4, -6/-9 | 15.36 | 1.30 |
| large LM, it 4, -7/-9 | 15.25 | 1.30 |
| large LM, it 5, -5/-9 | 14.17 | 1.10 |
| large LM, it 5, -5/-10 | 13.77 | 1.29 |
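The table is essentially a WER vs. real-time (RT) tradeoff over LM size, CE iteration, and beam settings. A minimal Python sketch of picking an operating point from it; the RT budget of 1.30 is an assumed constraint for illustration, not a project requirement.

```python
# Sketch: pick the decoding configuration with the lowest WER subject to an
# RT budget. The rows mirror the table above; the 1.30 budget is assumed.
RESULTS = [
    # (model / iteration / beam setting, WER %, RT factor)
    ("small LM, it 4, -5/-9", 15.80, 1.18),
    ("large LM, it 4, -5/-9", 15.30, 1.50),
    ("large LM, it 4, -6/-9", 15.36, 1.30),
    ("large LM, it 4, -7/-9", 15.25, 1.30),
    ("large LM, it 5, -5/-9", 14.17, 1.10),
    ("large LM, it 5, -5/-10", 13.77, 1.29),
]

def best_under_budget(results, rt_budget):
    """Return the (name, wer, rt) row with the lowest WER whose RT <= budget."""
    feasible = [row for row in results if row[2] <= rt_budget]
    return min(feasible, key=lambda row: row[1]) if feasible else None

if __name__ == "__main__":
    print(best_under_budget(RESULTS, rt_budget=1.30))
    # -> ('large LM, it 5, -5/-10', 13.77, 1.29)
```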
Adaptation
- Code is ready for direct adaptation, insertion adaptation, and KL-regularized adaptation (a loss sketch is given after this list).
- 50 sentences are used for adaptation and 834 sentences for testing.
- WER improves from 14.56% to 11.13%.
- Hidden-layer adaptation is better than input- and output-layer adaptation.
- Before-linear adaptation is better than after-linear adaptation.
- Results are here: http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=158
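For the KL-regularized variant, here is a minimal NumPy sketch of the commonly used objective, where the training target is interpolated between the one-hot label and the speaker-independent (SI) model's posterior so the adapted model stays close to the SI model. The interpolation weight RHO = 0.5 and the toy 3-class frame are assumptions, not the project's settings.

```python
import numpy as np

# Minimal sketch of a KL-regularized adaptation objective (not the project's
# actual code). The target is an interpolation between the one-hot label and
# the SI model's posterior; RHO = 0.5 is an assumed value for illustration.
RHO = 0.5

def kl_regularized_targets(one_hot, si_posterior, rho=RHO):
    """Interpolated target distribution: (1 - rho) * label + rho * SI posterior."""
    return (1.0 - rho) * one_hot + rho * si_posterior

def cross_entropy(targets, adapted_posterior, eps=1e-12):
    """Average cross-entropy of the adapted model's posteriors against the targets."""
    return float(-np.mean(np.sum(targets * np.log(adapted_posterior + eps), axis=1)))

if __name__ == "__main__":
    # Toy frame with 3 senones: label is class 0, SI model prefers class 1.
    one_hot = np.array([[1.0, 0.0, 0.0]])
    si_posterior = np.array([[0.2, 0.7, 0.1]])
    adapted_posterior = np.array([[0.6, 0.3, 0.1]])
    targets = kl_regularized_targets(one_hot, si_posterior)
    print(cross_entropy(targets, adapted_posterior))
```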
DNN Decoder
- Comparison between the CLG and HCLG decoders:
  - The CLG decoder uses less memory during decoding.
  - HCLG is faster and more accurate than CLG, and more amenable to beam control; details here: http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=156
- Faster decoder:
  - std::exp/std::log caused very slow computation on train203; solved by replacing them with the standard exp() and log().
  - The RT factor of the latest decoder on train203 is 0.25.
- Online decoder:
  - Chao will focus on interface changes and CMN adaptation (a running-CMN sketch is given below).
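Since an online decoder cannot wait for the whole utterance before normalizing features, a plausible form of CMN adaptation is a running cepstral mean with an exponential forgetting factor. A minimal Python sketch, assuming 13-dimensional frames and a decay of 0.995 (both illustrative values, not the actual design):

```python
import numpy as np

# Minimal sketch of online cepstral mean normalization (CMN) for a streaming
# decoder: the cepstral mean is updated frame by frame with an exponential
# forgetting factor, so no full-utterance pass is needed. The DECAY value and
# the 13-dimensional features are assumptions for illustration only.
DECAY = 0.995

class RunningCMN:
    def __init__(self, dim, decay=DECAY):
        self.mean = np.zeros(dim)
        self.decay = decay
        self.initialized = False

    def normalize(self, frame):
        """Update the running mean with `frame` and return the normalized frame."""
        frame = np.asarray(frame, dtype=float)
        if not self.initialized:
            self.mean = frame.copy()
            self.initialized = True
        else:
            self.mean = self.decay * self.mean + (1.0 - self.decay) * frame
        return frame - self.mean

if __name__ == "__main__":
    cmn = RunningCMN(dim=13)
    for frame in np.random.randn(100, 13):  # stand-in for streaming MFCC frames
        normalized = cmn.normalize(frame)
```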