Sinovoice-2016-5-5

Latest revision as of 06:35, 5 May 2016 (Thu)

Data

  • 16K LingYun
      • 2000h data ready
      • 4300h real-env data to label
  • YueYu (Cantonese)
      • Total 250h (190h YueYu + 60h English)
      • Added 60h YueYu
      • CER: 75% -> 76%
  • WeiYu (Uyghur)
      • 50h for training
      • 120h labeled and ready

Model training

Deletion Error Problem

  • Add one noise phone to alleviate silence over-training
  • Omit silence accuracy in discriminative training
  • H-smoothing of XEnt and MPE (see the sketch after the result tables below)
  • Testdata: test_1000ju from 8000ju
  --------------------------------------------------------------------------------------
  |                   model                    | ins |  del |  sub | wer / tot-err |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix             |  24 |   56 |  408 |  8.26 / 488   |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_omitsilacc  |  32 |   48 |  409 |  8.28 / 489   |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_xent0.1     |  24 |   57 |  406 |  8.24 / 487   |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_xent0.2     |  25 |   60 |  409 |  8.36 / 494   |
  --------------------------------------------------------------------------------------
  • Testdata: test_8000ju (47753 reference words)
  --------------------------------------------------------------------------------------
  |                   model                    | ins |  del |  sub | wer / tot-err |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix             | 140 |  562 | 3686 |  9.19 / 4388  |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_xent0.1     | 146 |  510 | 3705 |  9.13 / 4361  |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_xent0.2     | 139 |  492 | 3739 |  9.15 / 4370  |
  --------------------------------------------------------------------------------------
  • Testdata: test_2000ju from 10000ju
  --------------------------------------------------------------------------------------
  |                   model                    | ins |  del |  sub | wer / tot-err |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix             |  86 |  790 | 1471 | 18.55 / 2347  |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_omitsilacc  | 256 |  473 | 1669 | 18.95 / 2398  |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_xent0.1     |  95 |  704 | 1548 | 18.55 / 2347  |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_xent0.2     | 100 |  697 | 1557 | 18.60 / 2354  |
  --------------------------------------------------------------------------------------
  • Testdata: test_10000ju (65989 reference words)
  --------------------------------------------------------------------------------------
  |                   model                    | ins |  del |  sub | wer / tot-err |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix             | 478 | 3905 | 7698 | 18.31 / 12081 |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_xent0.1     | 481 | 3741 | 7773 | 18.18 / 11995 |
  --------------------------------------------------------------------------------------
  | svd600_lr2e-5_1000H_mpe_uv-fix_xent0.2     | 502 | 3657 | 7826 | 18.16 / 11985 |
  --------------------------------------------------------------------------------------
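On my reading of the model names (the report does not spell this out), the xent0.1 / xent0.2 suffixes denote H-criterion-style smoothing, i.e. interpolating the MPE objective with a cross-entropy term:

  F_{\mathrm{smooth}}(\theta) = F_{\mathrm{MPE}}(\theta) + \alpha\,F_{\mathrm{XEnt}}(\theta), \qquad \alpha \in \{0.1,\ 0.2\}

Consistent with that reading, the smoothed rows trade deletions for substitutions relative to plain MPE (e.g. on test_10000ju at alpha = 0.1: del 3905 -> 3741, sub 7698 -> 7773).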
  • Add one silence arc from start-state to end-state
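A minimal sketch of that last item, assuming OpenFst's pywrapfst bindings (1.8-style API) and a toy two-state graph; in the real setup the arc would join the start and final states of the decoding graph, and the SIL label id is a placeholder:

 import pywrapfst as fst

 g = fst.VectorFst()                      # toy stand-in for the decoding graph
 start, end = g.add_state(), g.add_state()
 g.set_start(start)
 g.set_final(end)

 SIL = 1                                  # placeholder label id for silence
 one = fst.Weight.one(g.weight_type())
 # Extra arc from start-state to end-state through one silence label,
 # so pure-silence input need not be forced through word arcs.
 g.add_arc(start, fst.Arc(SIL, 0, one, end))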

Big-Model Training

  • 16k
   ================================================================================================
   |                       |  TDNN 7-1200    | TDNN 7-1200 enhance | TDNN 7-1200 svd600 |
   ------------------------------------------------------------------------------------------------
   | 8000ju  frame_skip=1  |                 |  0.0556 / 0.349     |  0.0559 / 0.306    |
   | 8000ju  frame_skip=2  |  0.059 / 0.243  |  0.0591 / 0.231     |  0.0589 / 0.228    |
   ------------------------------------------------------------------------------------------------
   | 10000ju frame_skip=1  |                 |  0.1241 / 0.341     |  0.1244 / 0.358    |
   | 10000ju frame_skip=2  |  0.1348 / 0.234 |  0.1315 / 0.245     |  0.1311 / 0.204    |
   ------------------------------------------------------------------------------------------------
   | English frame_skip=1  |                 |  0.3897 / 0.370     |  0.4062 / 0.353    |
   | English frame_skip=2  |  0.4296         |  0.4237 / 0.276     |  0.4306 / 0.252    |
   ================================================================================================
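frame_skip in this table appears to be frame-subsampled decoding; a minimal numpy sketch of one common variant, where the acoustic model runs on every skip-th frame and its output is reused for the skipped frames (frame_skip_posteriors and the forward callback are placeholders, not this project's code):

 import numpy as np

 def frame_skip_posteriors(feats, forward, skip=2):
     # Run the AM forward pass only on every `skip`-th frame ...
     post = forward(feats[::skip])
     # ... and copy each posterior to the frames that were skipped.
     return np.repeat(post, skip, axis=0)[:len(feats)]

With skip=2 the acoustic-model forward cost is roughly halved.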
  • 8k
PingAnAll:
 ==================================================================================
 |         AM / error         | tot_err |   ins   |   del   |   sub   |   wer   |
 ----------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt 2500.mdl  |  3626   |   619   |   773   |   2234  |  16.60  |
 ----------------------------------------------------------------------------------
 | spn 7-1024 xEnt 300.mdl    |  3746   |   702   |   763   |   2281  |  17.15  |
 ==================================================================================
PingAnUser:
 ==================================================================================
 |         AM / error         | tot_err |   ins   |   del   |   sub   |   wer   |
 ----------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt 2500.mdl  |   549   |   158   |    75   |   316   |  35.91  |
 ----------------------------------------------------------------------------------
 | spn 7-1024 xEnt 300.mdl    |   571   |   151   |    97   |   323   |  37.34  |
 ==================================================================================
LiaoNingYiDong:
 ==================================================================================
 |         AM / error         | tot_err |   ins   |   del   |   sub   |   wer   |
 ----------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt 2500.mdl  |   5873  |   879   |   1364  |   3630  |  21.72  |
 ----------------------------------------------------------------------------------
 | spn 7-1024 xEnt 300.mdl    |   6257  |   977   |   1348  |   3923  |  23.14  |
 ==================================================================================
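For reading these tables, wer is the usual word error rate over the number of reference words:

  \mathrm{WER} = \frac{\mathrm{ins} + \mathrm{del} + \mathrm{sub}}{N_{\mathrm{ref}}} \times 100\%

As a sanity check, PingAnAll's 3626 total errors at 16.60% imply roughly 21.8k reference words.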

Embedding

  • The nnet1 AM is 6.4M (3M after decomposition), so we need to keep the AM size within 10M (decomposition sketched after this list).
  • 5*576-2400 TDNN model training done; AM size is about 17M.
  • 5*500-2400 TDNN model in training.
  • Making lattices for MPE training.
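The decomposition above (and the svd600 models in the earlier tables) are presumably low-rank SVD factorizations of the affine layers; a minimal numpy sketch, with the rank value echoing the svd600 naming (svd_compress is a placeholder name):

 import numpy as np

 def svd_compress(W, rank=600):
     # Factor W (out_dim x in_dim) as A @ B, keeping the top `rank`
     # singular values; parameters drop from out*in to (out+in)*rank.
     U, s, Vt = np.linalg.svd(W, full_matrices=False)
     A = U[:, :rank] * s[:rank]       # out_dim x rank
     B = Vt[:rank, :]                 # rank x in_dim
     return A, B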

Character LM

  • Except for Sogou-2T, the 9-gram models have been trained.
  • Adding word boundary tags to Character-LM training is done
      • 9-gram
      • Except Weibo & Sogou-2T
  • Prepare domain-specific vocabularies
      • Dianxin/Baoxian/Dianli (telecom/insurance/electric power)
  • DT LM training
  • Merge Character-LM & word-LM (see the sketch after this list)
      • Union
      • Compose, success.
  • 2-step decoding: first with the character-based LM, then with the word-based LM.
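A minimal sketch of the two merge strategies, assuming OpenFst's pywrapfst bindings; the file names and the char-to-word transducer L_cw are placeholders, not artifacts from this project:

 import pywrapfst as fst

 char_g = fst.Fst.read("G_char.fst")    # character-level LM acceptor
 word_g = fst.Fst.read("G_word.fst")    # word-level LM acceptor

 # Union: the merged LM accepts paths from either model.
 merged = char_g.copy()
 merged.union(word_g)

 # Compose: map character sequences to words through a char-to-word
 # transducer, then constrain the result with the word LM.
 l_cw = fst.Fst.read("L_cw.fst")
 c2w = fst.compose(char_g.arcsort("olabel"), l_cw)
 composed = fst.compose(c2w.arcsort("olabel"), word_g)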

SiaSong Robot

  • Beam-forming algorithm test
  • NN-model based beam-forming
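The report doesn't say which algorithm was tested; as a classical reference point, a minimal delay-and-sum sketch over time-aligned channels (function and argument names are placeholders):

 import numpy as np

 def delay_and_sum(channels, delays):
     # channels: equal-length 1-D mic signals; delays: per-mic
     # steering delays in samples (non-negative integers).
     n = len(channels[0])
     out = np.zeros(n)
     for sig, d in zip(channels, delays):
         out[d:] += sig[:n - d]       # shift each channel, then sum
     return out / len(channels)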

Project

  • PingAn & YueYu: too many deletion errors
  • TDNN deletion error rate > DNN deletion error rate
  • The TDNN silence scale is too sensitive across different test cases (sketched below).
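For context, I take "silence scale" to be the usual knob that rescales silence scores before decoding; a toy numpy sketch of applying such a scale to per-frame posteriors (scale_silence, sil_ids, and the default value are placeholders):

 import numpy as np

 def scale_silence(post, sil_ids, scale=0.8):
     # Rescale the silence columns of a frames x states posterior
     # matrix, then renormalize each frame to sum to one.
     post = post.copy()
     post[:, sil_ids] *= scale
     return post / post.sum(axis=1, keepdims=True)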

SID

Digit

  • Engine Package