Xinsong-beamforming-result
From cslt Wiki
Far-field SNR: 14 dB. Restaurant data recorded on 2016.4.22: the training set has 250 sentences and the test set has 11 sentences. All audio is 16 kHz, 16-bit.
AM: 10000h, 7*2048, MPE; LM: 1e-7, 5-gram
ID3: test data that were also included in the DAE training set
baseline:
---------------------------------------------------
| test random WER (%) | test ID3 WER (%) (within training set)
---------------------------------------------------
c1 | 39.74 | 36.89
---------------------------------------------------
c2 | 30.77 | 34.95
---------------------------------------------------
c3 | 34.62 | 37.38
---------------------------------------------------
c4 | 42.31 | 36.89
---------------------------------------------------
near | 21.79 | 4.37
---------------------------------------------------
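For reference, the WER figures in these tables can be read as edit distance divided by reference length, times 100. The following is a minimal sketch of that standard definition, not the scoring tool actually used for this page; the function name `word_error_rate` and whitespace tokenization are assumptions (for Chinese transcripts the units may well be characters rather than words).

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two whitespace-tokenized strings.

    Hypothetical helper for illustration; tokenization by whitespace is an
    assumption and may not match the scoring used for the tables.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the ref prefix processed so far
    # and hyp[:j]; it starts as the cost of inserting j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-token reference.
    print(word_error_rate("a b c d", "a x c"))
```

With only 11 test sentences, a single extra error moves the WER by more than a point, so small differences between rows should be read with care.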
beamforming:
---------------------------------------------------
DS_post | 26.92 | 29.13
----------------------------------------------------
SD_post | 23.08 | 26.70 (second)
----------------------------------------------------
MVDR_post | 26.92 | 28.64
----------------------------------------------------
sino beamforming|
_from_xiaoming | 26.92 | 31.07
----------------------------------------------------
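The page does not describe how the beamformers were implemented. Assuming DS_post refers to delay-and-sum followed by a post-filter, the delay-and-sum stage alone can be sketched as below. The function name `delay_and_sum`, the use of known integer sample delays, and the omission of any post-filter are all assumptions made for illustration.

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Align each channel by its integer delay (in samples) and average.

    channels: array of shape (n_channels, n_samples)
    delays:   delays[k] is how many samples channel k lags the reference;
              negative values mean the channel leads.
    Hypothetical sketch: real systems estimate the delays (for example by
    cross-correlation) and usually support fractional delays.
    """
    channels = np.asarray(channels, dtype=float)
    n_ch, n = channels.shape
    out = np.zeros(n)
    for x, d in zip(channels, delays):
        d = int(d)
        if abs(d) >= n:
            continue  # delay longer than the signal: nothing to align
        aligned = np.zeros(n)
        if d >= 0:
            aligned[: n - d] = x[d:]      # advance a lagging channel
        else:
            aligned[-d:] = x[: n + d]     # delay a leading channel
        out += aligned
    return out / n_ch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal(1600)
    lags = [0, 2, 5, 3]
    mics = np.zeros((4, s.size))
    for k, d in enumerate(lags):
        mics[k, d:] = s[: s.size - d]     # channel k is s delayed by d samples
    y = delay_and_sum(mics, lags)
    print(np.allclose(y[: s.size - max(lags)], s[: s.size - max(lags)]))
```

After alignment the speech adds coherently across the four channels while uncorrelated noise averages down, which is the effect a delay-and-sum front end relies on.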
Four-channel cnn_tdnn model: 160 fbank
----------------------------------------------------
tdnn_dae_1*1024| 38.46 | 10.19
_tdnn_lr_0.008 |
-----------------------------------------------------
cnn_tdnn_dae_ |
1*128_cnn_1*1024| 39.74 | 5.34
_tdnn_lr_0.008 |
-----------------------------------------------------
cnn_tdnn_dae_ |
1*128_cnn_1*512 | 48.72 | 7.28
_tdnn_lr_0.008 |
-----------------------------------------------------
cnn_tdnn_dae_ |
1*64_cnn_1*1024 | 39.74 | 7.28
_tdnn_lr_0.008 |
-----------------------------------------------------
cnn_tdnn_dae_ |
2*64_cnn_1*1024 | 50.00 | 25.24
_tdnn_lr_0.008 |
-----------------------------------------------------
cnn_tdnn_dae_ |
2*64_cnn_1*1024 | 44.87 | 23.30
_tdnn_lr_0.008_ |
nopooling |
-----------------------------------------------------
cnn_tdnn_dae_ |
1*32_cnn_1*1024 | 50.00 | 8.25
_tdnn_lr_0.008 |
------------------------------------------------------
cnn_tdnn_dae_ |
2*32_cnn_1*1024 | 46.15 | 33.98
_tdnn_lr_0.008 |
------------------------------------------------------
cnn_tdnn_dae_ |
2*32_cnn_1*1024 | 47.44 | 28.16
_tdnn_lr_0.008_ |
nopooling |
------------------------------------------------------
cnn_tdnn_dae_ |
1*16_cnn_1*1024 | 46.15 | 8.25
_tdnn_lr_0.008 |
-------------------------------------------------------
cnn_tdnn_dae_ |
2*16_cnn_1*1024 | 62.82 | 17.96
_tdnn_lr_0.008 |
-------------------------------------------------
cnn_tdnn_dae_ |
2*16_cnn_1*1024 | 50.00 | 22.82
_tdnn_lr_0.008_ |
nopooling |
-------------------------------------------------
cnn_tdnn_dae_ |
2*16_cnn_1*256 | 48.72 | 16.99
_tdnn_lr_0.008 |
-------------------------------------------------
cnn_tdnn_dae_ |
2*16_cnn_1*256 | 48.72 | 19.90
_tdnn_lr_0.008_ |
nopooling |
-------------------------------------------------
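The 160-dimensional input of the four-channel models is consistent with concatenating 40 fbank coefficients from each of the channels c1 to c4 frame by frame. The page does not state this explicitly, so the sketch below is an assumption; `stack_channels` is a hypothetical helper, and truncating to the shortest channel is only one way to handle unequal frame counts.

```python
import numpy as np

N_CHANNELS = 4   # c1..c4
N_FBANK = 40     # per-channel fbank dimension, as in the single-channel setup


def stack_channels(per_channel_feats):
    """Concatenate per-channel (n_frames, 40) features into (n_frames, 160).

    Hypothetical helper: channels are truncated to the shortest frame count
    so that every output frame contains all four channels.
    """
    feats = [np.asarray(f, dtype=float) for f in per_channel_feats]
    n_frames = min(f.shape[0] for f in feats)
    return np.concatenate([f[:n_frames] for f in feats], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = [rng.standard_normal((100, N_FBANK)) for _ in range(N_CHANNELS)]
    print(stack_channels(feats).shape)
```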
Single-channel cnn_tdnn model: 40 fbank
------------------------------------------------------
cnn_tdnn_mix_dae_|
1*128_cnn_1*1024 | 25.64 | 9.22
_tdnn_lr_0.008_c1 |
------------------------------------------------------
cnn_tdnn_mix_dae_|
1*128_cnn_1*1024 | 28.21 | 7.28
_tdnn_lr_0.008_c2 |
------------------------------------------------------
cnn_tdnn_mix_dae_ |
1*128_cnn_1*1024 | 20.51 | 4.37 (first)
_tdnn_lr_0.008_c3 |
------------------------------------------------------
cnn_tdnn_mix_dae_ |
1*128_cnn_1*1024 | 21.79 | 5.34
_tdnn_lr_0.008_c4 |
-------------------------------------------------------
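To compare rows across the tables, relative WER reduction against a baseline is often easier to read than absolute numbers. The sketch below uses values copied from the tables above; pairing the `_c3` model with the c3 baseline assumes the suffix denotes the channel, and `relative_wer_reduction` is a hypothetical helper name.

```python
def relative_wer_reduction(baseline_wer, system_wer):
    """Relative WER reduction in percent; positive means the system is better."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# (test random WER, test ID3 WER), copied from the tables on this page.
C3_BASELINE = (34.62, 37.38)
SYSTEMS = {
    "SD_post": (23.08, 26.70),
    "cnn_tdnn_mix_dae_1*128_cnn_1*1024_tdnn_lr_0.008_c3": (20.51, 4.37),
}

if __name__ == "__main__":
    for name, (rand_wer, id3_wer) in SYSTEMS.items():
        rand_gain = relative_wer_reduction(C3_BASELINE[0], rand_wer)
        id3_gain = relative_wer_reduction(C3_BASELINE[1], id3_wer)
        print(f"{name}: random {rand_gain:.2f}%, ID3 {id3_gain:.2f}%")
```

The ID3 column overlaps with the DAE training data, so gains there mostly reflect fit to seen utterances; the random column is the better indicator of generalization.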