Difference between revisions of "Sinovoice-2016-4-21"

Latest revision as of 08:40, 21 April 2016 (Thu)

Data

16K LingYun

2000h data ready
4300h real-env data to label

YueYu

Total 250h(190h-YueYu + 60h-English)
Add 60h YueYu
CER: 75%->76%

WeiYu

50h for training
120h labeled ready

Model training

Deletion Error Problem

Add one noise phone to alleviate the silence over-training
Omit sil accuracy in discriminative training

Testdata: test_1000ju from 8000ju

  ---------------------------------------------------
                 model   | ins  |  del  | sub | wer  
  ---------------------------------------------------
         baseMPE 3.mdl   |  25  |  68   | 468 | 9.50
  ---------------------------------------------------
   MPE omit_sil_acc 3.mdl|  26  |  72   | 453 | 9.33
  ---------------------------------------------------

Testdata: test_2000ju from 10000ju

  ----------------------------------------------------
                 model   | ins  |  del  | sub  | wer  
  ----------------------------------------------------
         baseMPE 3.mdl   |  96  |  768  | 1590 | 19.39
  ----------------------------------------------------
  MPE omit_sil_acc 3.mdl |  165 |  627  | 1685 | 19.58
  ----------------------------------------------------

H smoothing of XEnt and MPE
Add one silence arc from start-state to end-state
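To make the trade-off in the two tables above easier to read, the error counts can be broken down by type. The following is a minimal sketch; the function and dictionary names are illustrative and not part of any toolkit, and only the counts are taken from the tables.

```python
def error_breakdown(ins, dele, sub):
    """Return total errors and the share of each error type."""
    total = ins + dele + sub
    return {
        "total": total,
        "ins": ins / total,
        "del": dele / total,
        "sub": sub / total,
    }

# (ins, del, sub) counts copied from the two tables above.
results = {
    "test_1000ju": {"baseMPE": (25, 68, 468), "omit_sil_acc": (26, 72, 453)},
    "test_2000ju": {"baseMPE": (96, 768, 1590), "omit_sil_acc": (165, 627, 1685)},
}

for test_set, models in results.items():
    for model, counts in models.items():
        b = error_breakdown(*counts)
        print(f"{test_set} {model}: total={b['total']} "
              f"ins={b['ins']:.1%} del={b['del']:.1%} sub={b['sub']:.1%}")
```

On test_2000ju, omitting silence accuracy lowers deletions from 768 to 627, but insertions and substitutions rise, so the total error count goes from 2454 to 2477, which matches the small WER increase from 19.39 to 19.58.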

Big-Model Training

7*2048-10000h net weight-matrix factoring, to improve the decoding speed --SVD

SVD looks OK, but fine-tuning still didn't work.

 Base WER:
 relu_2000_mpe_1000H: 17.72
 relu_1200_mpe_1000H: 18.60

 |layer / nodes retained|  200  |  400  |  600  |  800  | 1000  | 1200  | 1400  | 1600  |
 |       hidden 2       |       |       | 22.53 | 20.30 | 19.01 |       |       |       |
 |       hidden 7       |       | 18.92 | 18.30 | 17.92 |       |       |       |       |
 |        final         |       |       | 18.32 | 18.00 | 17.83 |       |       |       |

7*1024 cross-entropy total training, then MPE: 0.2 improvement
7*1024 SVD factoring, to speed up decoding
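The weight-matrix factoring discussed above can be sketched with a truncated SVD: a layer matrix W is replaced by two thinner matrices A and B with W ≈ A·B, keeping only the k largest singular values. This is a minimal NumPy sketch under stated assumptions, not the recipe actually used: the helper names are hypothetical, the 64x64 random matrix is a small stand-in for a real layer, and the parameter-count comparison assumes a square 2048x2048 hidden layer.

```python
import numpy as np

def svd_factor(weight, k):
    """Factor weight (out_dim x in_dim) into a (out_dim x k) and b (k x in_dim)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :k] * s[:k]   # fold the k largest singular values into the left factor
    b = vt[:k, :]
    return a, b

def param_count(out_dim, in_dim, k=None):
    """Parameters of a dense layer, or of its rank-k factored form."""
    if k is None:
        return out_dim * in_dim
    return k * (out_dim + in_dim)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))  # stand-in for a trained weight matrix
a, b = svd_factor(w, 16)
rel_err = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
print(f"rank-16 relative reconstruction error: {rel_err:.3f}")

# Relative size of the factored layer for some of the ranks in the table above
# (assumes a square 2048x2048 layer).
for k in (400, 600, 800, 1000):
    ratio = param_count(2048, 2048, k) / param_count(2048, 2048)
    print(f"k={k}: {ratio:.2%} of the original parameters")
```

For a square layer the saving only appears when k is below half the layer width, which is one reason the smaller ranks are of interest despite their higher WER. A random matrix compresses far worse than a trained one, so the printed reconstruction error says nothing about the real models.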

8k

Embedding

10000h-chain 5*400+800 DONE.

The beam affects the performance of the chain model significantly; this needs more investigation.

5*576-2400 TDNN model

SinSong Robot

Test based on 10000h(7*2048-xent) model

 ------------------------------------------------
   condition | clean  | replay(0.5m) | real-env
 ------------------------------------------------
     wer     |   3    |  18(mpe-14)  | too-bad
 ------------------------------------------------

Plan to record in restaurant on April 10.

Character LM

Except Sogou-2T, 9-gram has been done.
Worse than word-lm(9%->6%)
Add word boundary tag to Character-LM training
Merge Character-LM & word-LM

Union
Compose, success.

2-step decoding: first, character-based LM. Then, word-based LM.
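One way to prepare character-LM training text with word boundary tags, as mentioned above, is to start from word-segmented text and insert a tag between words while splitting each word into characters. The sketch below is only an illustration: the `<wb>` tag name and the function name are assumptions, and the notes do not say which tagging scheme was actually used.

```python
def words_to_chars(line, boundary="<wb>"):
    """Turn a whitespace-segmented line into characters separated by a boundary tag.

    The tag name is hypothetical; any token that cannot occur as a character works.
    """
    tokens = []
    for i, word in enumerate(line.split()):
        if i > 0:
            tokens.append(boundary)  # mark the boundary between adjacent words
        tokens.extend(word)          # one token per character
    return " ".join(tokens)

print(words_to_chars("今天 天气 好"))
```

The resulting lines can be fed to an n-gram toolkit in place of the plain character sequence, so the character LM sees where words begin and end.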

Project

Pingan & Yueyu: too many deletion errors

TDNN deletion error rate > DNN deletion error rate
TDNN Silence scale is too sensitive for different test cases.

SID

Digit

Same Channel test EER: 100%

Speaker confirm
phone channel

Cross Channel

Mic-wav PLDA adaptation EER from 9% to 7% (20-30 persons)
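For reference, the EER figures above are the operating point where the false-accept rate and the false-reject rate are equal. A minimal sketch of that computation follows; the scores are invented for illustration and are not from these experiments, and the function name is hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # A trial is accepted when its score is >= the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Illustrative scores only: same-speaker and different-speaker trials.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.5, 0.3, 0.2]
print(f"EER = {equal_error_rate(targets, nontargets):.0%}")
```

With only 20-30 speakers, as in the cross-channel test, each trial shifts the estimate noticeably, so small EER differences should be read with caution.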

@@ Line 16 / Line 16 @@
 ==Model training==
 ==Deletion Error Problem==
+* Add one noise phone to alleviate the silence over-training
 * Omit sil accuracy in discriminative training
+:* Testdata: test_1000ju from 8000ju
+   ---------------------------------------------------
+                  model   | ins  |  del  | sub | wer
+   ---------------------------------------------------
+          baseMPE 3.mdl   |  25  |  68   | 468 | 9.50
+   ---------------------------------------------------
+    MPE omit_sil_acc 3.mdl|  26  |  72   | 453 | 9.33
+   ---------------------------------------------------
-Testdata: test_1000ju from 8000ju
+:* Testdata: test_2000ju from 10000ju
----------------------------------------------------
+   ----------------------------------------------------
-              model   | ins  |  del  | sub | wer
+                  model   | ins  |  del  | sub  | wer
----------------------------------------------------
+   ----------------------------------------------------
-      baseMPE 3.mdl   |  25  |  68   | 468 | 9.50
+          baseMPE 3.mdl   |  96  |  768  | 1590 | 19.39
----------------------------------------------------
+   ----------------------------------------------------
-MPE omit_sil_acc 3.mdl|  26  |  72   | 453 | 9.33
+   MPE omit_sil_acc 3.mdl |  165 |  627  | 1685 | 19.58
----------------------------------------------------
+   ----------------------------------------------------
-Testdata: test_2000ju from 10000ju
-----------------------------------------------------
-               model   | ins  |  del  | sub  | wer
-----------------------------------------------------
-       baseMPE 3.mdl   |  96  |  768  | 1590 | 19.39
-----------------------------------------------------
-MPE omit_sil_acc 3.mdl |  165 |  627  | 1685 | 19.58
-----------------------------------------------------
 * H smoothing of XEnt and MPE
 * Add one silence arc from start-state to end-state
 ===Big-Model Training===
 * 7*2048-10000h net weight-matrix factoring, to improve the decoding speed --SVD
@@ Line 82 / Line 83 @@
 :* TDNN deletion error rate > DNN deletion error rate
 :* TDNN Silence scale is too sensitive for different test cases.
 ==SID==
 ===Digit===