2013-05-24

== Data sharing ==

* LM count files still undelivered!
 
== DNN progress ==

=== Experiments ===
* sparse DNN: sticky training (retrain the nnet while keeping the sparseness); a minimal code sketch is given after the results below.

Zero out the small weight values (test set: 1900):

{| class="wikitable"
! threshold !! 0 !! 0.01 !! 0.03 !! 0.05 !! 0.08 !! 0.1 !! 0.2 !! 0.3
|-
| shrinkage% || 0.0 || 4.3 || 12.7 || 20.9 || 32.5 || 39.5 || 66.4 || 81.6
|-
| WER without sticky || 7.55 || 7.60 || 7.62 || 7.66 || 7.72 || 7.87 || 9.46 || 53.23
|-
| WER with sticky || 7.55 || 7.57 || 7.60 || 7.60 || 7.63 || 7.64 || 8.35 || 9.51
|}
  
The conclusion is that with the L2 retraining, the DNN performance is largely recovered. The extremely sparse case (threshold 0.3) with sticky training is particularly striking, which suggests that the network can indeed be made very sparse. However, this is only on the 1900 test set; it still needs to be verified on other test sets.
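Below is a minimal numpy sketch of the pruning-plus-sticky-retraining idea described above: zero the weights whose magnitude falls below the threshold, then keep the resulting zero mask fixed during further updates. The plain SGD step and all sizes are illustrative assumptions, not the actual training setup.

<pre>
import numpy as np

def sparsify(weights, threshold):
    """Zero all weights whose magnitude is below the threshold.
    Returns the pruned weights and a binary mask of the kept entries."""
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

def sticky_update(weights, grad, mask, lr=0.01):
    """One 'sticky' SGD step: apply the gradient, then re-apply the mask
    so that the pruned weights stay exactly zero."""
    return (weights - lr * grad) * mask

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(512, 256))    # a toy hidden-layer weight matrix
W, mask = sparsify(W, threshold=0.05)
print("shrinkage %:", 100.0 * (1.0 - mask.mean()))

grad = rng.normal(scale=0.01, size=W.shape)   # stand-in for a real backprop gradient
W = sticky_update(W, grad, mask)
assert np.all(W[mask == 0] == 0)              # pruned weights remain zero
</pre>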
  
* fixed-point DNN

: ORG: WER(1900) 7.25%
: with the value mapping val = -math.log(abs(vv)/1000.0)*20: WER(1900) 7.30%

* fixed-point HCLG

: ORG: WER(1900) 7.25%
: INT 50: WER(1900) 7.30%
: INT 10: WER(1900) 7.12%

* fixed-point DNN forwarding

Based on the fixed-point FST and NN results above, together with the sparse NN results, we are working on a fast NN decoder that is suitable for embedded devices; a sketch of the value mapping is given below. The work has just started.
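A small Python sketch of the fixed-point value mapping quoted above (val = -math.log(abs(vv)/1000.0)*20): the mapped value is rounded to an integer code, and the mapping is inverted at lookup time. The rounding step and the handling of zero and sign are assumptions made for the sketch, not necessarily the scheme used in the decoder.

<pre>
import math

def quantize(vv):
    """Map a weight to (sign, integer code) using val = -math.log(abs(vv)/1000.0)*20."""
    if vv == 0.0:
        return 0, 0
    sign = 1 if vv > 0 else -1
    val = -math.log(abs(vv) / 1000.0) * 20
    return sign, int(round(val))

def dequantize(sign, code):
    """Invert the mapping: abs(vv) = 1000 * exp(-code/20)."""
    if sign == 0:
        return 0.0
    return sign * 1000.0 * math.exp(-code / 20.0)

# round-trip check on a few typical weight magnitudes
for w in (0.5, -0.03, 0.001):
    s, c = quantize(w)
    print(w, "->", c, "->", round(dequantize(s, c), 5))
</pre>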
  
 
=== Tencent exps ===

The 1000-hour DNN training experiments finished this week (two learning-rate schedules were tried: exponential decay and newbob). The performance is as follows:
{| class="wikitable"
! Test set !! old baseline !! new baseline !! DNN
|-
| 1900 || 8.4 || 6.8 || 4.3
|-
| 2044 || 22.4 || 15.7 || 12.7
|-
| online1 || 35.6 || 32.7 || 25.8
|-
| online2 || 29.6 || 27.3 || 22.1
|-
| map || 24.5 || 15.8 || 13.4
|-
| notepad || 16 || 8.1 || 5.6
|-
| general || 36 || 25.1 || 19.3
|-
| speedup || 26.8 || 14 || -
|}
  
On the decoder side, SSE, fixed-point arithmetic and other speedup strategies have been tried, but under high concurrency the real-time factor still cannot be brought below 1. Applying low-rank matrix approximations directly at test time degrades the accuracy considerably (a minimal sketch of the idea follows); to use the method at training time, the update formulas still need to be derived.

[[we probably need to rely on the sparse net solution plus fixed-point computing.]]
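A minimal sketch of the test-time low-rank approximation mentioned above: replace a trained weight matrix W with two thin factors from a truncated SVD, which cuts the matrix-vector cost when the rank is much smaller than the matrix dimensions. The matrix size and the rank are illustrative assumptions.

<pre>
import numpy as np

def low_rank_factors(W, rank):
    """Truncated SVD: W (m x n) ~= A @ B with A (m x rank) and B (rank x n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]      # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(2048, 2048))   # stands in for one trained hidden-layer weight matrix
x = rng.normal(size=2048)

A, B = low_rank_factors(W, rank=256)
y_full = W @ x                      # original forward pass: ~2048*2048 multiplications
y_lr = A @ (B @ x)                  # low-rank pass: ~2*2048*256 multiplications

print("relative error:", np.linalg.norm(y_full - y_lr) / np.linalg.norm(y_full))
</pre>

On a random matrix like this one the error is large; how well the trick works in practice depends on how quickly the singular values of the trained weight matrices decay, which is consistent with the accuracy loss observed at test time.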
Next plan:

* 6000-hour model training, plus other DNN-related techniques (sequence-discriminative training, alignment, pretraining).

Work to be verified:

1: Two pretraining strategies: RBM pretraining and discriminative pretraining (a sketch of the latter is given after this list).

[[MS suggested the latter, while the performance difference for large networks (more than 7 layers) is not significant according to the publications. For large data it deserves a try, though the RBM approach is highly costly.]]

2: After the HMM-DNN training, re-align the data with the HMM-DNN model, update the transition probabilities, and then retrain the HMM-DNN to see how the performance changes.

[[should be promising]]

3: How much improvement sequence-discriminative training adds on top of HMM-DNN.

4: Using the low-rank approach on the DNN training side.

(the low-rank approach is a bit strange to me: it is not directly related to a reasonable objective function, and the structure of the weight matrix has nothing to do with the objective.)
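As referenced in item 1, here is a minimal sketch of greedy layer-wise discriminative pretraining, written with PyTorch purely for illustration (not the toolkit used in these experiments): train a one-hidden-layer net, then repeatedly insert a new hidden layer in front of a fresh output layer and continue supervised training. The layer sizes, toy data and optimizer settings are all assumptions.

<pre>
import torch
import torch.nn as nn

def train(model, X, y, epochs=5, lr=0.1):
    """A few epochs of plain supervised training with cross-entropy."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model

# toy data standing in for (acoustic features, senone labels)
torch.manual_seed(0)
X = torch.randn(1024, 40)
y = torch.randint(0, 100, (1024,))

hidden, n_hidden_layers = 512, 4
layers = [nn.Linear(40, hidden), nn.Sigmoid()]     # first hidden layer
model = train(nn.Sequential(*layers, nn.Linear(hidden, 100)), X, y)

# discriminative pretraining: grow the net one hidden layer at a time,
# attaching a fresh output layer and retraining after every insertion
for _ in range(n_hidden_layers - 1):
    layers += [nn.Linear(hidden, hidden), nn.Sigmoid()]
    model = train(nn.Sequential(*layers, nn.Linear(hidden, 100)), X, y)

print(model)   # the final 4-hidden-layer DNN, ready for full fine-tuning
</pre>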
  
 
=== GPU & CPU merge ===

# in progress.
  
  
=== Kaldi/HTK merge ===

* HTK2Kaldi: hold.
* Kaldi2HTK: hold and second priority (pdf error problem; Kaldi monophone: 30.91%, HDecode: 41.40%).
* Workaround: use the BN features to train the HTK models, so that no Kaldi training is needed.

The above work is probably not very necessary, since Tencent will fully migrate to the hybrid DNN approach and therefore HTK will never be used.
  
 
== Embedded progress ==

*Status:
:# first embedded demo done; 1000 words take 3.2M of memory.
:# accuracy test finished. The test data involves 3 speakers recorded in a car, speaking a Chongqing dialect, over 1000 address names.
:# training an acoustic model for sphinx: the an4 training process is done, while the test seems problematic.
:# checked the reference and changed the compiling options.
:# the large-scale AM training based on the Tencent 400h data is done.
:# the random output problem is fixed.
  
 
{| class="wikitable"
! Test Set !! #utt !! PS default: ERR (RT) !! Tencent: ERR (RT)
|-
| cw  || 993 || 8.01 (0.07) || 7.61 (0.40)
|-
| hfc || 986 || 6.69 (0.07) || 5.48 (0.40)
|-
| zz  || 984 || 12.73 (0.07) || 5.91 (0.40)
|}
*To be done
:# large-scale parallel training.
:# NN-based engine (dynamic and static).
