Difference between revisions of "Sinovoice-2016-5-26"

From cslt Wiki
Error detail:
 =======================================================================================
 |        AM / error          | tot_err |  ins  |  del  |  sub  |   wer   | wer-no-ins |
 ---------------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt           |   549   |  158  |   75  |  316  |  35.91  |   25.58    |
 | tdnn 7-1024 MPE            |   501   |  102  |  140  |  259  |  32.77  |   26.09    |
 | tdnn 7-1024 MPE adapt PABX |   477   |  132  |   92  |  253  |  31.20  |   22.56    |
 ---------------------------------------------------------------------------------------
 | spn 7-1024 xEnt            |   554   |  178  |   66  |  310  |  36.23  |   24.59    |
 | spn 7-1024 MPE 1000H       |   506   |  175  |   41  |  290  |  33.09  |   21.65    |
 | spn 7-1024 MPE adapt User  |   486   |  178  |   43  |  265  |  31.79  |   20.14    |
 =======================================================================================
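The columns of the error-detail table are tied together by simple arithmetic: tot_err is the sum of insertions, deletions and substitutions, and wer-no-ins leaves insertions out of the numerator. A minimal Python sketch of that relationship follows. It assumes both rates are percentages of the reference word count; the count of 1529 is not stated on this page and is only inferred from the reported tot_err and wer values, and the function name is hypothetical.

```python
def error_rates(ins, dele, sub, ref_words):
    """Return (tot_err, wer, wer_no_ins); both rates are percentages.

    `dele` is the deletion count (`del` is a reserved word in Python).
    """
    tot_err = ins + dele + sub
    wer = 100.0 * tot_err / ref_words
    # Same numerator without insertions, to isolate deletion/substitution errors.
    wer_no_ins = 100.0 * (dele + sub) / ref_words
    return tot_err, wer, wer_no_ins


# Assumption: reference word count inferred from the table, not given in the source.
REF_WORDS = 1529

# Counts from the "tdnn 7-1024 xEnt" row.
tot, wer, wer_no_ins = error_rates(ins=158, dele=75, sub=316, ref_words=REF_WORDS)
print(tot, round(wer, 2), round(wer_no_ins, 2))
```

Comparing wer with wer-no-ins per row shows how much of each model's error comes from insertions, which helps when reading the deletion-error discussion further down.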
 
  
 

Revision as of 03:40, 26 May 2016 (Thursday)

Data

  • 16K LingYun
  • 2000h data ready
  • 4300h real-env data to label
  • YueYu
  • Total 250h(190h-YueYu + 60h-English)
  • Add 60h YueYu
  • CER: 75%->76%
  • WeiYu
  • 8k more data
  • 50h for training
  • 120h labeled ready
  • PingAn
  • 100h User data done

Model training

Deletion Error Problem

  • Add one noise phone to alleviate the silence over-training
  • Omit sil accuracy in discriminative training
  • H smoothing of XEnt and MPE
  • Add one silence arc from start-state to end-state
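One of the items above, H smoothing of XEnt and MPE, can be read as interpolating the sequence-level objective with the frame-level cross-entropy so that discriminative training does not drift too far from the XEnt model. The sketch below is only an illustration under that reading: the function name and the weight `h = 0.1` are hypothetical, and both terms are assumed to be expressed as losses to minimise.

```python
def h_smoothed_loss(mpe_loss, xent_loss, h=0.1):
    """Interpolate a sequence-level loss with a frame-level cross-entropy loss.

    h = 0 gives pure MPE, h = 1 gives pure cross-entropy.
    The default weight is a placeholder, not a value from the report.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie in [0, 1]")
    return (1.0 - h) * mpe_loss + h * xent_loss


# Toy numbers only, to show the effect of the weight.
for h in (0.0, 0.1, 0.5, 1.0):
    print(h, h_smoothed_loss(mpe_loss=2.0, xent_loss=4.0, h=h))
```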

Big-Model Training

16k

8k

Model
  • Add noise phone
  • 1300.mdl done
  • CNN + TDNN
  • 280/900 mdl
  • XEnt training needs about 12 days
Project
  • PingAn
  • Add noise phone to phone-list
 =========================================================================================
 |     AM / config                     |      all      |    KeHu wer   ||  KeHu no-ins  |
 -----------------------------------------------------------------------------------------
 | tdnn 7-2048 xEnt                    |     16.45     |     36.49     ||     25.18     |
 | tdnn 7-2048 MPE                     |     15.22     |     32.77     ||     23.48     |
 | tdnn 7-2048 MPE adapt-PABX          |     14.67     |     31.33     ||     22.76     |
 -----------------------------------------------------------------------------------------
 | tdnn 7-1024 xEnt                    |     16.60     |     35.91     ||     25.58     |
 | tdnn 7-1024 MPE 2e-6                |     15.67     |     32.77     ||     26.09     |
 | tdnn 7-1024 MPE 2e-5 1.mdl          |     15.54     |     32.77     ||     26.29     |
 | tdnn 7-1024 MPE 1e-5 4.mdl          |     15.76     |     33.55     ||     27.20     |
 | tdnn 7-1024 MPE adapt-PABX          |     14.80     |     30.48     ||     22.56     |
 -----------------------------------------------------------------------------------------
 | spn 7-1024 xEnt                     |     16.49     |     36.23     ||     24.59     |
 | spn 7-1024 xEnt xEnt-PA_user 101.mdl|     16.19     |     33.22     ||     22.69     |
 | spn 7-1024 xEnt xEnt-PA_user mpe    |     15.24     |     32.77     ||     21.65     |
 | spn 7-1024 MPE-1000H 23.mdl         |     15.29     |     33.09     ||     21.65     |
 | spn 7-1024 MPE adapt-PA_all 29.mdl  |     15.11     |     33.42     ||     21.84     |
 | spn 7-1024 MPE adapt-PA_user 2e-5   |     15.31     |     31.79     ||     20.14     |
 | spn 7-1024 MPE adapt-PA_user Hs 2e-5|     15.32     |     32.24     ||     20.93     |
 =========================================================================================


  • LiaoNingYiDong:
 =================================================================
 |     AM / config                     |     LNYD      |
 -----------------------------------------------------------------
 | tdnn 7-2048 xEnt                    |     21.51     |
 | tdnn 7-2048 MPE                     |     20.09     |
 | tdnn 7-2048 MPE adapt-LNYD          |     17.92     |
 -----------------------------------------------------------------
 | tdnn 7-1024 xEnt                    |     21.72     |
 | tdnn 7-1024 MPE                     |     20.99     |
 -----------------------------------------------------------------
 | cnn 7-1024 xEnt 600.mdl             |     21.03     |
 | cnn 7-1024 MPE 12.mdl               |     19.80     |
 | cnn 7-1024 MPE adapt-LNYD 41.mdl    |     17.96     |
 -----------------------------------------------------------------
 | spn 7-1024 xEnt                     |     21.70     |
 | spn 7-1024 MPE-1000H 23.mdl         |     19.97     |
 | spn 7-1024 MPE adapt-LNYD           |     18.67     |
 =================================================================

Embedding

  • The nnet1 AM is 6.4M (3M after decomposition), so the AM size needs to stay within 10M.
  • 5*500-2400 TDNN model in training.
  • no-svd model, MPE training done
  • svd-100 model, MPE training 2/4 epochs finished.
LM=1e-5, beam=9, max-active=5000
 =============================================================================================================
 |         AM / testset              |  test_1000ju  |  test_2000ju  |  test_8000ju  |  test_10000ju  |
 -------------------------------------------------------------------------------------------------------------
 | nnet1 4*600+800 xEnt (6.4M)       |     25.30     |     40.48     |               |                |
 | nnet1 4*600+800 mpe  (6.4M)       |     20.75     |     35.33     |               |                |
 -------------------------------------------------------------------------------------------------------------
 | nnet3 5*500 mpe (13M)             |     16.18     |     29.53     |               |                |
 | nnet3 5*500 svd-100 mpe (9.5M)    |     17.87     |     30.39     |               |                |
 =============================================================================================================
LM=1e-6, beam=9, max-active=5000
 =============================================================================================================
 |         AM / testset              |  test_1000ju  |  test_2000ju  |  test_8000ju  |  test_10000ju  |
 -------------------------------------------------------------------------------------------------------------
 | nnet1 4*600+800 xEnt (6.4M)       |     21.09     |     36.23     |               |                |
 | nnet1 4*600+800 mpe  (6.4M)       |     16.44     |     30.75     |               |                |
 -------------------------------------------------------------------------------------------------------------
 | nnet3 5*500 mpe (13M)             |     13.39     |     25.90     |               |                |
 | nnet3 5*500 svd-100 mpe (9.5M)    |     14.49     |     26.55     |               |                |
 =============================================================================================================
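The size reduction from the SVD step comes from replacing one weight matrix with two thin factors. The following Python sketch shows only the parameter-count arithmetic. It assumes "svd-100" means a rank-100 factorisation, and the 500 x 500 layer shape is a hypothetical example rather than the actual layer size of the models in the tables.

```python
def dense_params(rows, cols):
    """Number of weights in a full rows x cols matrix."""
    return rows * cols


def low_rank_params(rows, cols, rank):
    """Number of weights after factorising W (rows x cols) into
    A (rows x rank) times B (rank x cols)."""
    return rank * (rows + cols)


# Hypothetical layer shape; rank 100 is assumed from the "svd-100" label.
rows, cols, rank = 500, 500, 100
full = dense_params(rows, cols)
reduced = low_rank_params(rows, cols, rank)
print(full, reduced, reduced / full)
```

The factorisation only saves parameters when rank < rows * cols / (rows + cols), so layers that are already narrow gain little from it.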

Character LM

  • 9-gram done for all corpora except Sogou-2T.
  • Adding word-boundary tags to Character-LM training: done
  • 9-gram
  • Except Weibo & Sogou-2T
  • 1e-7 (13M): WER 17.91, compared with 13.4 for 1e-7 without boundary tags (71M)
  • 1e-8 (54M): WER 17.54
  • Prepare specific domain vocabulary
  • Dianxin/Baoxian/Dianli
  • DT lm training
  • ReFr
  • Merge Character-LM & word-LM
  • Union
  • Compose: succeeded.
  • 2-step decoding: first with the character-based LM, then with the word-based LM.
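The word-boundary tagging mentioned above can be illustrated by how the training text for a character LM might be prepared: word-segmented text is split into characters, with a marker token after each word. This is a sketch under assumptions; the tag `<wb>`, the function name, and the placement of the tag after each word are hypothetical, since the page does not describe the actual tagging scheme.

```python
def to_char_tokens(segmented_line, boundary="<wb>"):
    """Turn a whitespace-segmented line into character tokens,
    appending a boundary token after every word."""
    tokens = []
    for word in segmented_line.split():
        tokens.extend(word)      # one token per character
        tokens.append(boundary)  # marks the end of the word
    return tokens


# Example with an already segmented sentence (three words).
print(" ".join(to_char_tokens("今天 天气 很好")))
```

The resulting token stream can be fed to any n-gram toolkit in place of the word-level text; the boundary token lets the character LM retain some word-level information.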

SiaSong Robot

  • Beam-forming algorithm test
  • NN-model based beam-forming

Project

  • PingAn & YueYu: too many deletion errors
  • TDNN deletion error rate > DNN deletion error rate
  • TDNN silence scale is too sensitive across different test cases.

SID

Digit

  • Engine Package