Difference between revisions of "2013-04-26"

From cslt Wiki
 
(11 intermediate revisions by the same user not shown)
Line 6: Line 6:
 
===400 hour DNN training===
 
===400 hour DNN training===
 
{| class="wikitable"
 
{| class="wikitable"
!Test Set!! Tencent Baseline!! bMMI!! fMMI !! BN(with fMMI) !! Hybrid
+
!Test Set!! Tencent Baseline!! bMMI!! fMMI !! BN(with fMMI) !! Hybrid (DNN/HMM)
 
|-
 
|-
 
|1900||8.4  || 7.65 || 7.35||6.57 || 7.27
 
|1900||8.4  || 7.65 || 7.35||6.57 || 7.27
Line 26: Line 26:
 
|}
 
|}
  
*Tencent baseline is with 700h online data+ 700h 863 data, HLDA+MPE, 88k lexicon
+
*Note
*Our results are with 400 hour AM, 88k LM. ML+bMMI.
+
:Tencent baseline is with 700h online data+ 700h 863 data, HLDA+MPE, 88k lexicon
*The CSLT structure: 300*[1200*1200*1200*40*1200]*4850.  
+
:Our results are with 400 hour AM, 88k LM. ML+bMMI.
*The CSLT feature: MFCC+delta MFCC
+
:The CSLT structure: 300*[1200*1200*1200*40*1200]*4850.  
* compare with the traditional structure 300*[1200*1200*1200*1200*1200]*4850.
+
:The CSLT feature: MFCC+delta MFCC
 +
*To be done:
 +
:#compare with the traditional structure 300*[1200*1200*1200*1200*1200]*4850.
  
 
===Tencent test result===
 
===Tencent test result===
  
: AM: 70h training data
+
:AM: 70h training data
: LM: 88k LM  
+
:LM: 88k LM  
: Test case: general  
+
:Test case: general  
 +
:param=700k
  
 
{|class="wikitable"
 
{|class="wikitable"
!Feature !! GMM !!GMM-bMMI !! DNN !! DNN-MMI
+
!Feature !! GMM !!GMM-bMMI !! DNN !! DNN-MMI !! DNN structure
 
|-
 
|-
|PLP(-5,+5) [Eryu]          || 47 || 38.4  || 26.5 || 23.8  
+
|PLP(-5,+5) [Eryu]          || 47 || 38.4  || 26.5 || 23.8 ||300*1200*1200*1200*1200]*1700
 
|-
 
|-
|PLP+LDA+MLLT(-5,+5)[Jingbo] || 47 || -     || 34  
+
|PLP+LDA+MLLT(-5,+5)[Jingbo] || 47 || -|| 34 || - ||300*[1007*1007*1007*1007]*3xxx
 
|-
 
|-
 
|}
 
|}
  
* Tencent NN structure:
 
:300*[1200*1200*1200*1200]*1700, #param=700k
 
:300*[1007*1007*1007*1007]*3xxx  #param=700k
 
  
:*CSLT reproduce phone-clustered based NN
+
*To be done:
:*CSLT investigate performance of different epochs.
+
:#CSLT: reproduce phone-clustered NN (Eryu's results)
 +
:#CSLT: investigate performance of different epoches.
 +
:#Tencent: feature comparison.
 +
:#Tencent: FBank with PLP. With or without LDA.
  
  
 
===GPU & CPU merge===
 
===GPU & CPU merge===
: Investigate the possibility to merge GPU and CPU code.  
+
# Investigate the possibility to merge GPU and CPU code.  
: CUDA code merged to CPU.  
+
:Decision: CUDA code merged to CPU.  
  
 
===L-1 sparse initial training===
 
===L-1 sparse initial training===
Start to investigating.
+
*Initial trial
 +
:# L-1=1e-5, starting from 6th iteration, converged with another 3 iterations. The performance is generally worse than the case where l1=0, except one test suite.
 +
:#  L-1=1e-6, the same results obtained, means le-6 is too small to be effective.
 +
:#  L-1=1e-4, start from the first iteration. crashed. Need more investigation.
 +
*To be done
 +
:# Investigate other L-1 choice, starting from the scratch.
  
 
==Kaldi/HTK merge==
 
==Kaldi/HTK merge==
:* HTK2Kaldi: hold.
+
* HTK2Kaldi: hold.
:* Kaldi2HTK: done with implementation. Performance improved.  
+
* Kaldi2HTK: done with implementation. A bug fixed. gConst was computed in a wrong way. The current HDecode result is 14.9%; The tencent model is 11%; Kaldi decoder 7%.
 +
* Possibly the SP model issue, due to the complicated structure of silence in Kaldi.
 +
* To be done
 +
:Try other possible SP, e.g., duplicate the silence model, with a jump arck from the start to the end.
 +
:Try borrow SP from the HTK model
  
 
==Embedded progress==
 
==Embedded progress==
:* PocketSphinx migration done. Very slow.
+
*Status:
:* QA LM training, done.
+
:# QA LM training, done.
 +
:# PocketSphinx migration done, using PocketSphinx default Chinese model. After migrating to the smart phone, the test shows that the decoding (with an LM involving 90k words) is very slow (RT=7.0).
 +
 
 +
 
 +
*To be done
 +
:# substitute the LM with JSGF grammar involving 1000 words. Finish the initial test.
 +
:# Need to train a new AM.

Latest revision as of 07:14, 26 April 2013 (Fri)

Data sharing

  • LM count files are still being transferred.

DNN progress

400 hour DNN training

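{| class="wikitable"
!Test Set!! Tencent Baseline!! bMMI!! fMMI !! BN(with fMMI) !! Hybrid (DNN/HMM)
|-
|1900 || 8.4 || 7.65 || 7.35 || 6.57 || 7.27
|-
|2044 || 22.4 || 24.44 || 24.03 || 21.77 || 20.24
|-
|online1 || 35.6 || 34.66 || 34.33 || 31.44 || 30.53
|-
|online2 || 29.6 || 27.23 || 26.80 || 24.10 || 23.89
|-
|map || 24.5 || 27.54 || 27.69 || 23.79 || 22.46
|-
|notepad || 16 || 19.81 || 21.75 || 15.81 || 12.74
|-
|general || 36 || 38.52 || 38.90 || 33.61 || 31.55
|-
|speedup || 26.8 || 27.88 || 26.81 || 22.82 || 22.00
|}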
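The comparison between the Tencent baseline and the Hybrid (DNN/HMM) column can be summarized as a relative error reduction per test set. The sketch below is only an illustration: the numbers are copied from the table above, the values are assumed to be error rates in percent, and the names `RESULTS`, `relative_reduction` and `summarize` are hypothetical helpers, not part of any toolkit used in the experiments.

```python
# Error rates copied from the table above: (Tencent baseline, Hybrid DNN/HMM).
# Assumption: both columns are error rates in percent, lower is better.
RESULTS = {
    "1900": (8.4, 7.27),
    "2044": (22.4, 20.24),
    "online1": (35.6, 30.53),
    "online2": (29.6, 23.89),
    "map": (24.5, 22.46),
    "notepad": (16.0, 12.74),
    "general": (36.0, 31.55),
    "speedup": (26.8, 22.00),
}


def relative_reduction(baseline, system):
    """Relative error reduction in percent (positive means the system is better)."""
    return 100.0 * (baseline - system) / baseline


def summarize(results):
    """Map each test set to its relative reduction, rounded to one decimal."""
    return {
        name: round(relative_reduction(base, hyb), 1)
        for name, (base, hyb) in results.items()
    }


if __name__ == "__main__":
    for name, rel in summarize(RESULTS).items():
        print(f"{name:8s} {rel:5.1f}% relative reduction")
```

Note that the two systems were trained on different amounts of data (see the notes that follow), so these figures compare results, not training recipes.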
  • Note
Tencent baseline: 700h online data + 700h 863 data, HLDA+MPE, 88k lexicon.
Our results: 400-hour AM, 88k LM, ML+bMMI.
CSLT structure: 300*[1200*1200*1200*40*1200]*4850.
CSLT feature: MFCC + delta MFCC.
  • To be done:
  1. Compare with the traditional structure 300*[1200*1200*1200*1200*1200]*4850.
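To get a feel for how much smaller the bottleneck structure is than the traditional one, the layer sizes can be turned into parameter counts. This is a minimal sketch under two assumptions that the notes do not state explicitly: the notation `300*[...]*4850` is read as input size, hidden layer sizes, output size, and every layer is fully connected with one bias per output unit. The function name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes, include_biases=True):
    """Count weights (and optionally biases) of a fully connected network.

    layer_sizes lists the input size, each hidden layer size, and the
    output size, in order.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix between adjacent layers
        if include_biases:
            total += fan_out       # one bias per output unit
    return total


# Assumed reading of the two structures quoted in the notes.
BOTTLENECK = [300, 1200, 1200, 1200, 40, 1200, 4850]
TRADITIONAL = [300, 1200, 1200, 1200, 1200, 1200, 4850]

if __name__ == "__main__":
    print("bottleneck :", count_parameters(BOTTLENECK))
    print("traditional:", count_parameters(TRADITIONAL))
```

Under these assumptions the 40-unit layer replaces two 1200x1200 weight matrices with two 1200x40 ones, which is where the difference between the two counts comes from.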

Tencent test result

AM: 70h training data
LM: 88k LM
Test case: general
param=700k
{|class="wikitable"
!Feature !! GMM !!GMM-bMMI !! DNN !! DNN-MMI !! DNN structure
|-
|PLP(-5,+5) [Eryu] || 47 || 38.4 || 26.5 || 23.8 ||300*[1200*1200*1200*1200]*1700
|-
|PLP+LDA+MLLT(-5,+5)[Jingbo] || 47 || - || 34 || - ||300*[1007*1007*1007*1007]*3xxx
|}


  • To be done:
  1. CSLT: reproduce phone-clustered NN (Eryu's results)
  2. CSLT: investigate performance at different epochs.
  3. Tencent: feature comparison.
  4. Tencent: FBank with PLP. With or without LDA.


GPU & CPU merge

  1. Investigate the possibility of merging the GPU and CPU code.
Decision: CUDA code merged to CPU.

L-1 sparse initial training

  • Initial trial
  1. L-1=1e-5, starting from the 6th iteration: converged after another 3 iterations. Performance is generally worse than with L-1=0, except on one test suite.
  2. L-1=1e-6: the same results were obtained, which means 1e-6 is too small to be effective.
  3. L-1=1e-4, starting from the first iteration: training crashed. Needs more investigation.
  • To be done
  1. Investigate other L-1 choices, starting from scratch.
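For reference, one common way to apply an L-1 penalty during SGD is to take the plain gradient step and then shrink each weight toward zero by `learning_rate * l1`, clipping at zero so the penalty alone never flips a sign. The sketch below illustrates that variant only; it is not taken from the training code used in these trials, and `sgd_step_l1` and its parameters are hypothetical names.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sgd_step_l1(weights, grads, learning_rate, l1, clip_at_zero=True):
    """One SGD step on a loss with an added penalty l1 * sum(|w|).

    Illustrative variant: gradient step first, then L1 shrinkage,
    optionally clipped so the shrinkage cannot push a weight across zero.
    """
    updated = []
    for w, g in zip(weights, grads):
        w_new = w - learning_rate * g                       # plain gradient step
        shrunk = w_new - learning_rate * l1 * sign(w_new)   # L1 shrinkage
        if clip_at_zero and shrunk * w_new < 0:
            shrunk = 0.0                                    # weight becomes exactly zero
        updated.append(shrunk)
    return updated


if __name__ == "__main__":
    w = [0.5, -0.5, 1e-7]
    g = [0.0, 0.0, 0.0]
    print(sgd_step_l1(w, g, learning_rate=0.1, l1=1e-5))
```

In this formulation the per-update shrinkage is `learning_rate * l1`, so only weights whose magnitude is already close to that product are driven to exactly zero in a single step.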

Kaldi/HTK merge

  • HTK2Kaldi: on hold.
  • Kaldi2HTK: implementation done. Fixed a bug: gConst was computed incorrectly. The current HDecode result is 14.9%; the Tencent model gives 11%; the Kaldi decoder gives 7%.
  • The remaining gap is possibly an SP model issue, due to the complicated structure of silence in Kaldi.
  • To be done
Try other possible SP models, e.g., duplicate the silence model and add a jump arc from the start to the end.
Try borrowing the SP model from the HTK model.
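The notes do not say what exactly was wrong with gConst. As background, the sketch below shows the HTK definition as documented in the HTK Book for a diagonal-covariance Gaussian, gConst = n*log(2*pi) + sum(log(var_i)), and how it enters the log-likelihood. The function names are hypothetical and this is not the conversion code itself.

```python
import math


def htk_gconst(variances):
    """HTK-style gConst for a diagonal Gaussian: log((2*pi)^n * |Sigma|)."""
    n = len(variances)
    return n * math.log(2.0 * math.pi) + sum(math.log(v) for v in variances)


def log_likelihood(x, means, variances):
    """Log density of x under a diagonal Gaussian, written in terms of gConst."""
    mahalanobis = sum(
        (xi - m) ** 2 / v for xi, m, v in zip(x, means, variances)
    )
    return -0.5 * (htk_gconst(variances) + mahalanobis)


if __name__ == "__main__":
    print(htk_gconst([1.0, 1.0]))
    print(log_likelihood([0.0], [0.0], [1.0]))
```

When converting models between toolkits, the stored constant has to be recomputed in the target toolkit's convention, since each toolkit may fold different terms (for example mixture weights or mean-dependent terms) into its precomputed constants.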

Embedded progress

  • Status:
  1. QA LM training, done.
  2. PocketSphinx migration done, using the PocketSphinx default Chinese model. On the smartphone, decoding with an LM involving 90k words is very slow (RT=7.0).


  • To be done
  1. Substitute the LM with a JSGF grammar involving 1000 words. Finish the initial test.
  2. Need to train a new AM.
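As an illustration of the JSGF substitution, a flat grammar can be generated from a phrase list. This is a minimal sketch: the grammar name, rule name and example phrases are made up, the real 1000-word grammar is not described in the notes, and `build_jsgf` is a hypothetical helper.

```python
def build_jsgf(grammar_name, rule_name, phrases):
    """Build a flat JSGF grammar whose single public rule is a list of alternatives."""
    cleaned = [p.strip() for p in phrases if p.strip()]  # drop empty entries
    alternatives = " | ".join(cleaned)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


if __name__ == "__main__":
    # Hypothetical phrases, for illustration only.
    print(build_jsgf("demo", "command", ["open map", "take note"]))
```

A flat list of alternatives is the simplest possible grammar; a real one would more likely factor shared words into sub-rules to keep the search space small.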