Difference between revisions of "2013-04-19"

Latest revision as of 05:31, 26 April 2013 (Fri)

Data sharing

AM/lexicon/LM are shared.
LM count files are still being transferred.

DNN progress

400 hour DNN training

Test Set	Tencent Baseline	bMMI	fMMI	BN(with fMMI)	Hybrid
1900	8.4	7.65	7.35	6.57	7.27
2044	22.4	24.44	24.03	21.77	20.24
online1	35.6	34.66	34.33	31.44	30.53
online2	29.6	27.23	26.80	24.10	23.89
map	24.5	27.54	27.69	23.79	22.46
notepad	16	19.81	21.75	15.81	12.74
general	36	38.52	38.90	33.61	31.55
speedup	26.8	27.88	26.81	22.82	22.00

The Tencent baseline uses 700h of online data + 700h of 863 data, HLDA+MPE, and an 88k lexicon.
Our results use a 400-hour AM and an 88k LM (ML+bMMI).
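
To make the comparison in the table easier to read, the relative change of the Hybrid system against the Tencent baseline can be computed per test set. This is a minimal sketch, assuming the table entries are error rates in percent (lower is better); the names RESULTS and relative_reduction are illustrative and not part of any toolkit.

```python
# Figures copied from the table above: (Tencent Baseline, Hybrid).
# Assumption: both columns are error rates in percent, lower is better.
RESULTS = {
    "1900": (8.4, 7.27),
    "2044": (22.4, 20.24),
    "online1": (35.6, 30.53),
    "online2": (29.6, 23.89),
    "map": (24.5, 22.46),
    "notepad": (16.0, 12.74),
    "general": (36.0, 31.55),
    "speedup": (26.8, 22.00),
}


def relative_reduction(baseline, system):
    """Relative error reduction in percent; positive means `system` is better."""
    return 100.0 * (baseline - system) / baseline


if __name__ == "__main__":
    for test_set, (baseline, hybrid) in RESULTS.items():
        print(f"{test_set:8s} {relative_reduction(baseline, hybrid):6.2f}%")
```

Note that the two systems are trained on different amounts of data (1400h vs. 400h), so the relative figures compare complete systems rather than modelling techniques alone.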

Tencent test result

AM: 70h training data (2 days, 15 machines, 10 threads)

LM: 88k LM

Test case: general

gmmi-bmmi: 38.7%

dnn-1: 28% (11-frame window, phone-based tree)

dnn-2: 34% (9-frame window, state-based tree)
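
The "frame window" above refers to how many consecutive feature frames are stacked to form one DNN input. The sketch below illustrates the idea, assuming a symmetric window (an 11-frame window is 5 frames of context on each side, a 9-frame window is 4) and edge padding by repeating the first or last frame. Both assumptions and the function name splice_frames are illustrative; the notes do not say how the actual systems build their windows.

```python
def splice_frames(frames, context):
    """Stack each frame with `context` neighbours on both sides.

    `frames` is a list of equal-length feature vectors (lists of floats).
    Indices outside the utterance are clamped, so edge frames are repeated.
    The result has the same number of frames, each (2 * context + 1) times wider.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to the valid range
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


if __name__ == "__main__":
    # 20 frames of hypothetical 40-dimensional features, 11-frame window.
    utterance = [[0.0] * 40 for _ in range(20)]
    print(len(splice_frames(utterance, 5)[0]))
```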

GPU & CPU merge

Investigate the possibility of merging the GPU and CPU code. Try to find an easier way. (1 week)

L-1 sparse initial training

Started investigating.
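
For orientation, one common way to obtain sparse weights with an L1 penalty is a gradient step followed by soft-thresholding (a proximal gradient update). The sketch below shows that generic technique only; it is an assumption about what the investigation might involve, not the method chosen here, and the names soft_threshold and l1_sgd_step are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold`; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def l1_sgd_step(weights, grads, learning_rate, l1_strength):
    """One proximal gradient step for loss + l1_strength * sum(|w|).

    First take a plain gradient step on the loss, then apply
    soft-thresholding with threshold learning_rate * l1_strength.
    """
    threshold = learning_rate * l1_strength
    return [
        soft_threshold(w - learning_rate * g, threshold)
        for w, g in zip(weights, grads)
    ]


if __name__ == "__main__":
    # Toy weights with zero gradients: only the L1 shrinkage acts.
    print(l1_sgd_step([0.5, -0.01, 0.0], [0.0, 0.0, 0.0], 0.1, 1.0))
```

Unlike adding the subgradient of the penalty to the gradient, the thresholding step sets small weights to exactly zero, which is what makes the resulting model sparse.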

Kaldi/HTK merge

HTK2Kaldi: the tool shipped with Kaldi does not work.
Kaldi2HTK: implementation done. Testing?

Embedded progress

Large performance (speed) degradation on the embedded platform (1/60).
Planning for sparse DNN.
QA LM training still fails. Mengyuan needs to do more work on this.
