Difference between revisions of "Sinovoice-2016-5-12"

From cslt Wiki
Latest revision as of 07:31, 12 May 2016 (Thursday)

Data

  • 16K LingYun
      • 2000h data ready
      • 4300h real-env data to label
  • YueYu
      • Total 250h (190h-YueYu + 60h-English)
      • Add 60h YueYu
      • CER: 75%->76%
  • WeiYu
      • 50h for training
      • 120h labeled ready
  • PingAn
      • 100h User data done

Model training

Deletion Error Problem

  • Add one noise phone to alleviate the silence over-training
  • Omit sil accuracy in discriminative training
  • H smoothing of XEnt and MPE
  • Add one silence arc from start-state to end-state
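The "H smoothing of XEnt and MPE" item can be read as interpolating the two training objectives. The following is a minimal sketch under that assumption: the linear form, the function name `h_smoothed_objective`, and the weight `h` are illustrative and are not taken from these notes.

```python
def h_smoothed_objective(mpe_value: float, xent_value: float, h: float) -> float:
    """Interpolate an MPE objective with a cross-entropy (XEnt) objective.

    Assumption: smoothing is (1 - h) * MPE + h * XEnt, with both terms
    expressed so that larger is better. h = 0 gives pure MPE and
    h = 1 gives pure XEnt.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie in [0, 1]")
    return (1.0 - h) * mpe_value + h * xent_value
```

A small `h` keeps the discriminative criterion dominant while the XEnt term acts as a regularizer; the right value would have to be tuned on a held-out test set.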

Big-Model Training

16k

8k

Model

  • Add noise phone
      • 1300.mdl done
  • CNN + TDNN
      • 280/900 mdl
      • XEnt training needs about 12 days

Project

  • PingAn
      • Add noise phone to phone-list
PingAnAll:
 ==================================================================================
 |         AM / error         | tot_err |   ins   |   del   |   sub   |   wer   |
 ----------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt 2500.mdl  |  3626   |   619   |   773   |   2234  |  16.60  |
 ----------------------------------------------------------------------------------
 | spn 7-1024 xEnt 1300.mdl   |  3746   |   702   |   763   |   2281  |  16.xx  |
 ==================================================================================
PingAnUser:
 ==================================================================================
 |         AM / error         | tot_err |   ins   |   del   |   sub   |   wer   |
 ----------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt 2500.mdl  |   549   |   158   |    75   |   316   |  35.91  |
 ----------------------------------------------------------------------------------
 | spn 7-1024 xEnt 1300.mdl   |   571   |   151   |    97   |   323   |  35.xx  |
 ==================================================================================
  • LiaoNingYiDong
 ==================================================================================
 |         AM / error         | tot_err |   ins   |   del   |   sub   |   wer   |
 ----------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt 2500.mdl  |   5873  |   879   |   1364  |   3630  |  21.72  |
 ----------------------------------------------------------------------------------
 | spn 7-1024 xEnt 300.mdl    |   6257  |   977   |   1348  |   3923  |  23.14  |
 ==================================================================================
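The columns in the tables above are related by tot_err = ins + del + sub, and WER is that total divided by the number of reference words. A minimal sketch of the arithmetic follows; the function names are illustrative, and the reference word count is a hypothetical input, since these tables do not list it.

```python
def total_errors(ins: int, dele: int, sub: int) -> int:
    """Total error count: insertions + deletions + substitutions."""
    return ins + dele + sub


def wer_percent(ins: int, dele: int, sub: int, num_ref_words: int) -> float:
    """Word error rate in percent.

    num_ref_words is the number of words in the reference transcripts;
    it is an assumed input here because the tables do not report it.
    """
    if num_ref_words <= 0:
        raise ValueError("num_ref_words must be positive")
    return 100.0 * total_errors(ins, dele, sub) / num_ref_words


# Consistency check on the "tdnn 7-1024 xEnt 2500.mdl" row of PingAnAll.
assert total_errors(619, 773, 2234) == 3626
```

The same check can be run on any other row to confirm that the reported tot_err matches its three components.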

Embedding

  • The size of nnet1 AM is 6.4M (3M after decomposition). So we need to control AM size within 10M.
  • 5*576-2400 TDNN model training done. AM size is about 17M.
  • 5*500-2400 TDNN model in training.
  • MPE training: part 2/4
      • test8000: WER 16, test10000: WER 30

Character LM

  • 9-gram training is done on all data except Sogou-2T.
  • Adding word-boundary tags to Character-LM training: done
      • 9-gram
      • Except Weibo & Sogou-2T
      • 1e-7 (13M): WER 17.91, compared with 13.4 for 1e-7 (no-boundary, 71M)
      • 1e-8 (54M): WER 17.54
  • Prepare specific domain vocabulary
      • Dianxin/Baoxian/Dianli
  • DT LM training
      • ReFr
  • Merge Character-LM & word-LM
      • Union
      • Compose: succeeded.
      • 2-step decoding: first with the character-based LM, then with the word-based LM.
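The word-boundary tagging mentioned in the Character LM list above can be illustrated by turning word-segmented text into a character sequence with an explicit boundary token. This is only a sketch: the token `<wb>`, the function name, and the assumption that the training text is whitespace-segmented are all hypothetical, as the notes do not describe the actual tag format.

```python
from typing import List


def to_char_tokens(segmented: str, boundary: str = "<wb>") -> List[str]:
    """Split a whitespace-segmented sentence into character tokens.

    A boundary token (hypothetical "<wb>") is inserted between
    consecutive words so a character-level n-gram model can still
    see where the word boundaries fall.
    """
    tokens: List[str] = []
    for i, word in enumerate(segmented.split()):
        if i > 0:
            tokens.append(boundary)
        tokens.extend(word)  # iterating a str yields its characters
    return tokens


print(" ".join(to_char_tokens("中国 电信")))
```

Each boundary token takes up one position in the n-gram context, so a tagged 9-gram covers fewer characters than an untagged one.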

SiaSong Robot

  • Beam-forming algorithm test
  • NN-model based beam-forming

Project

  • PingAn & YueYu: too many deletion errors
  • TDNN deletion error rate > DNN deletion error rate
  • The TDNN silence scale is too sensitive across different test cases.

SID

Digit

  • Engine Package