Difference between revisions of "2013-04-26"

Latest revision as of 07:14, 26 April 2013 (Fri)

Data sharing

LM count files are still being transferred.

DNN progress

400 hour DNN training

Test Set	Tencent Baseline	bMMI	fMMI	BN(with fMMI)	Hybrid (DNN/HMM)
1900	8.4	7.65	7.35	6.57	7.27
2044	22.4	24.44	24.03	21.77	20.24
online1	35.6	34.66	34.33	31.44	30.53
online2	29.6	27.23	26.80	24.10	23.89
map	24.5	27.54	27.69	23.79	22.46
notepad	16	19.81	21.75	15.81	12.74
general	36	38.52	38.90	33.61	31.55
speedup	26.8	27.88	26.81	22.82	22.00
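The gap between the Tencent baseline and the Hybrid (DNN/HMM) column is easier to compare across test sets as a relative reduction. The sketch below assumes the figures in the table are error rates in percent (the notes do not name the metric); the helper names are illustrative only.

```python
# Error rates (percent) copied from the table above:
# (Tencent baseline, Hybrid DNN/HMM). The metric is assumed to be an error rate.
RESULTS = {
    "1900": (8.4, 7.27),
    "2044": (22.4, 20.24),
    "online1": (35.6, 30.53),
    "online2": (29.6, 23.89),
    "map": (24.5, 22.46),
    "notepad": (16.0, 12.74),
    "general": (36.0, 31.55),
    "speedup": (26.8, 22.00),
}


def relative_reduction(baseline, system):
    """Relative error reduction in percent; positive means the system is better."""
    return 100.0 * (baseline - system) / baseline


def summarize(results):
    """Map each test set to the relative reduction of the hybrid system."""
    return {name: relative_reduction(b, s) for name, (b, s) in results.items()}


if __name__ == "__main__":
    for name, value in summarize(RESULTS).items():
        print(f"{name:8s} {value:6.2f}%")
```

The same function can be pointed at any other pair of columns, for example BN(with fMMI) against the baseline.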

Note

The Tencent baseline uses 700h online data + 700h 863 data, HLDA+MPE, and an 88k lexicon.

Our results use a 400-hour AM and an 88k LM, with ML+bMMI.

The CSLT structure: 300*[1200*1200*1200*40*1200]*4850.

The CSLT feature: MFCC+delta MFCC

To be done:

Compare with the traditional structure 300*[1200*1200*1200*1200*1200]*4850.
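As a rough aid for that comparison, the two structures can be sized by counting parameters. This is a minimal sketch that assumes each `*` separates fully connected layers with one bias per unit, that 300 is the input dimension and that 4850 is the output layer size; the function names are hypothetical.

```python
def parse_structure(spec):
    """Turn a spec such as '300*[1200*1200]*4850' into a list of layer sizes."""
    cleaned = spec.replace("[", "").replace("]", "")
    return [int(part) for part in cleaned.split("*")]


def count_parameters(layers, with_bias=True):
    """Count weights (and optionally biases) of a fully connected network."""
    weights = sum(a * b for a, b in zip(layers, layers[1:]))
    biases = sum(layers[1:]) if with_bias else 0
    return weights + biases


BOTTLENECK = "300*[1200*1200*1200*40*1200]*4850"
TRADITIONAL = "300*[1200*1200*1200*1200*1200]*4850"

if __name__ == "__main__":
    for name, spec in (("bottleneck", BOTTLENECK), ("traditional", TRADITIONAL)):
        print(name, count_parameters(parse_structure(spec)))
```

Under these assumptions the 40-unit layer removes roughly 2.8 million parameters relative to the traditional structure, because two 1200*1200 weight matrices become 1200*40 and 40*1200.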

Tencent test result

AM: 70h training data

LM: 88k LM

Test case: general

param=700k

Feature	GMM	GMM-bMMI	DNN	DNN-MMI	DNN structure
PLP(-5,+5) [Eryu]	47	38.4	26.5	23.8	300*[1200*1200*1200*1200]*1700
PLP+LDA+MLLT(-5,+5) [Jingbo]	47	-	34	-	300*[1007*1007*1007*1007]*3xxx

To be done:

CSLT: reproduce phone-clustered NN (Eryu's results)
CSLT: investigate performance at different epochs.
Tencent: feature comparison.
Tencent: FBank with PLP. With or without LDA.

GPU & CPU merge

Investigate the possibility to merge GPU and CPU code.

Decision: merge the CUDA code into the CPU code.

L-1 sparse initial training

Initial trial

L-1=1e-5, starting from the 6th iteration: converged after another 3 iterations. The performance is generally worse than with L-1=0, except on one test suite.
L-1=1e-6: the same results were obtained, which suggests 1e-6 is too small to be effective.
L-1=1e-4, starting from the first iteration: training crashed. Needs more investigation.
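The notes do not say how the L-1 penalty was applied in these trials. One common option, shown here only as an illustrative sketch, is a plain SGD step followed by soft-thresholding (the proximal step for an L1 penalty), with the penalty switched on from a chosen iteration. All names and the 1-based iteration convention are assumptions.

```python
def soft_threshold(value, amount):
    """Shrink value toward zero by amount, clamping at exactly zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def l1_sgd_step(weights, grads, lr, l1):
    """One SGD step followed by the L1 proximal (soft-threshold) step."""
    return [soft_threshold(w - lr * g, lr * l1) for w, g in zip(weights, grads)]


def l1_for_iteration(iteration, l1, start_iteration=6):
    """L1 strength for a 1-based iteration: zero before start_iteration."""
    return l1 if iteration >= start_iteration else 0.0


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    weights = [0.5, -0.5, 0.05]
    grads = [0.0, 0.0, 0.0]
    # Exaggerated penalty so the shrinkage is visible in one step.
    updated = l1_sgd_step(weights, grads, lr=0.1, l1=1.0)
    print(updated, sparsity(updated))
```

With a penalty as small as 1e-6 the per-step shrinkage `lr * l1` is tiny compared with typical weight magnitudes, which is consistent with the observation that it had no visible effect.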

To be done

Investigate other L-1 values, starting from scratch.

Kaldi/HTK merge

HTK2Kaldi: hold.
Kaldi2HTK: implementation done. A bug was fixed: gConst was computed incorrectly. The current HDecode result is 14.9%; the Tencent model gives 11%; the Kaldi decoder gives 7%.
The remaining gap is possibly an SP model issue, due to the complicated structure of silence in Kaldi.
To be done

Try other possible SP models, e.g., duplicate the silence model, with a jump arc from the start to the end.

Try borrowing the SP model from the HTK model.
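For reference on the gConst fix mentioned above: the notes do not say what exactly was wrong, so the following is only a sketch of the quantity as HTK defines it for a diagonal-covariance Gaussian, ln((2π)^n |Σ|), which a converter has to recompute from the variances. The function names are hypothetical and are not part of Kaldi or HTK.

```python
import math


def htk_gconst(variances):
    """gConst for a diagonal Gaussian: n*ln(2*pi) + sum of ln(variance)."""
    n = len(variances)
    return n * math.log(2.0 * math.pi) + sum(math.log(v) for v in variances)


def log_gaussian(x, means, variances):
    """Log density of x, written so that gConst appears as a separate term."""
    gconst = htk_gconst(variances)
    mahalanobis = sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances))
    return -0.5 * (gconst + mahalanobis)


if __name__ == "__main__":
    variances = [1.0, 4.0]
    print(htk_gconst(variances))
    print(log_gaussian([0.0, 0.0], [0.0, 0.0], variances))
```

A quick sanity check for a converted model is to recompute this value for each mixture component and compare it with the GCONST entry written to the HTK model file.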

Embedded progress

Status:

QA LM training, done.
PocketSphinx migration done, using the PocketSphinx default Chinese model. After migrating to the smart phone, the test shows that decoding (with an LM involving 90k words) is very slow (RT=7.0).

To be done

Substitute the LM with a JSGF grammar involving 1000 words. Finish the initial test.
Train a new AM.
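As a sketch of what the JSGF substitution could look like, the snippet below builds a flat grammar with one public rule that lists the allowed phrases as alternatives. The grammar name, rule name and phrases are placeholders, not the actual 1000-word list.

```python
def build_jsgf(grammar_name, rule_name, phrases):
    """Build a flat JSGF grammar whose single public rule lists all phrases."""
    if not phrases:
        raise ValueError("at least one phrase is required")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


if __name__ == "__main__":
    # Placeholder phrases; the real grammar would cover the 1000-word task.
    print(build_jsgf("commands", "command", ["open map", "open notepad"]))
```

A flat list of alternatives restricts the search space far more than a 90k-word LM, which is the reason to expect faster decoding on the phone.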

@@ Line 1: / Line 1: @@
 ==Data sharing==
-* AM/lexicon/LM are shared.
 * LM count files are still in transfering.
@@ Line 7: / Line 6: @@
 ===400 hour DNN training===
 {| class="wikitable"
-!Test Set!! Tencent Baseline!! bMMI!! fMMI !! BN !! Hybrid
+!Test Set!! Tencent Baseline!! bMMI!! fMMI !! BN(with fMMI) !! Hybrid (DNN/HMM)
 |-
-|1900||8.4  || 7.65 || 7.35||6.57
+|1900||8.4  || 7.65 || 7.35||6.57 || 7.27
 |-
-|2044|| 22.4 ||24.44|| 24.03||21.77
+|2044|| 22.4 ||24.44|| 24.03||21.77 || 20.24
 |-
-|online1||35.6 ||34.66||34.33||31.44
+|online1||35.6 ||34.66||34.33||31.44 || 30.53
 |-
-|online2||29.6 ||27.23||26.80||24.10
+|online2||29.6 ||27.23||26.80||24.10 || 23.89
 |-
-|map||24.5|| 27.54||27.69||23.79
+|map||24.5|| 27.54||27.69||23.79 || 22.46
 |-
-|notepad||16|| 19.81||21.75||15.81
+|notepad||16|| 19.81||21.75||15.81 || 12.74
 |-
-|general||36|| 38.52||38.90||33.61
+|general||36|| 38.52||38.90||33.61 || 31.55
 |-
-|speedup||26.8||27.88||26.81||22.82
+|speedup||26.8||27.88||26.81||22.82 || 22.00
 |-
 |}
-*Tencent baseline is with 700h online data+ 700h 863 data, HLDA+MPE, 88k lexicon
-*Our results are with 400 hour AM, 88k LM. ML+bMMI
+*Note
+:Tencent baseline is with 700h online data+ 700h 863 data, HLDA+MPE, 88k lexicon
+:Our results are with 400 hour AM, 88k LM. ML+bMMI.
+:The CSLT structure: 300*[1200*1200*1200*40*1200]*4850.
+:The CSLT feature: MFCC+delta MFCC
+*To be done:
+:#compare with the traditional structure 300*[1200*1200*1200*1200*1200]*4850.
 ===Tencent test result===
-:  AM: 70h training data(2 day, 15 machines, 10 threads)
+:AM: 70h training data
-:  LM: 88k LM
+:LM: 88k LM
-:  Test case: general
+:Test case: general
+:param=700k
-{class="wikitable"
+{|class="wikitable"
-!Feature !! GMM-bMMI !! DNN !! DNN-MMI
+!Feature !! GMM !!GMM-bMMI !! DNN !! DNN-MMI !! DNN structure
-|PLP(-5,+5) || 38.4   || 26.5 || 23.8 ||
 |-
-|PLP+LDA+MLLT(-5,+5)  || 38.4 ||28.7 ||
+|PLP(-5,+5) [Eryu]           || 47 || 38.4   || 26.5 || 23.8 ||300*1200*1200*1200*1200]*1700
+|-
+|PLP+LDA+MLLT(-5,+5)[Jingbo] || 47 || -|| 34 || - ||300*[1007*1007*1007*1007]*3xxx
 |-
 |}
+*To be done:
+:#CSLT: reproduce phone-clustered NN (Eryu's results)
+:#CSLT: investigate performance of different epoches.
+:#Tencent: feature comparison.
+:#Tencent: FBank with PLP. With or without LDA.
 ===GPU & CPU merge===
-:  Invesigate the possibility to merge GPU and CPU code. Try to find out an easier way. (1 week)
+# Investigate the possibility to merge GPU and CPU code.
+:Decision: CUDA code merged to CPU.
 ===L-1 sparse initial training===
-:  Start to investigating.
+*Initial trial
+:#  L-1=1e-5, starting from 6th iteration, converged with another 3 iterations. The performance is generally worse than the case where l1=0, except one test suite.
+:#  L-1=1e-6, the same results obtained, means le-6 is too small to be effective.
+:#  L-1=1e-4, start from the first iteration. crashed. Need more investigation.
+*To be done
+:# Investigate other L-1 choice, starting from the scratch.
 ==Kaldi/HTK merge==
-:* HTK2Kaldi: the tool with Kaldi does not work.
+* HTK2Kaldi: hold.
-:* Kaldi2HTK: done with implementation. Testing?
+* Kaldi2HTK: done with implementation. A bug fixed. gConst was computed in a wrong way. The current HDecode result is 14.9%; The tencent model is 11%; Kaldi decoder 7%.
+* Possibly the SP model issue, due to the complicated structure of silence in Kaldi.
+* To be done
+:Try other possible SP, e.g., duplicate the silence model, with a jump arck from the start to the end.
+:Try borrow SP from the HTK model
 ==Embedded progress==
-:* Some large performance (speed) degradation with the embedded platform(1/60).
+*Status:
-:* Planning for sparse DNN.
+:# QA LM training, done.
-:* QA LM training, still failed. Mengyuan need more work on this.
+:# PocketSphinx migration done, using PocketSphinx default Chinese model. After migrating to the smart phone, the test shows that the decoding (with an LM involving 90k words) is very slow (RT=7.0).
+*To be done
+:# substitute the LM with JSGF grammar involving 1000 words. Finish the initial test.
+:# Need to train a new AM.