Difference between revisions of "2014-11-25"

patent
Line 4: Line 4:
 
==== Environment ====
 
==== Environment ====
 
* Already buy 3 760GPU
 
* Already buy 3 760GPU
* grid-9 760GPU crashed again; random freeze after s ; try to investigate the reason
+
* grid-9 760GPU crashed again;
* GPU problems on grid-17?
+
* Change 760gpu card of grid-12 and grid-14
* disk (/work2) problem on grid-15
+
  
 
==== Sparse DNN ====
 
==== Sparse DNN ====
 
* Performance improvement found when pruned slightly
 
* Performance improvement found when pruned slightly
* need retraining for unpruned one; training loss  
+
* need retraining for unpruned one; training loss
* The result of AURORA 4 will be available soon.
+
 
* details at http://liuc.cslt.org/pages/sparse.html
 
* details at http://liuc.cslt.org/pages/sparse.html
  
Line 29: Line 27:
  
 
* Drop out
 
* Drop out
:* dataset:wsj, testset:eval92
 
        std |  dropout0.4 | dropout0.5 | dropout0.6 | dropout0.7 | dropout0.7_iter7(maxTr-Acc) | dropout0.8 | dropout0.8_iter7(maxTr-Acc)
 
    ------------------------------------------------------------------------------------------------------------------------------------
 
        4.5 |    5.39    |    4.80    |  4.75    |  4.36      |  4.39                      |    4.55    |    4.71         
 
:** Frame accuracy does not seem consistent with WER. Use the training data as cv to verify the learning ability of the model.
 
    Within a single nnet model, the top training frame accuracy does not seem consistent with the WER.
 
:** Decode test_clean_wv1 dataset. 
 
 
 
:* AURORA4 dataset
 
:* AURORA4 dataset
  
  (1) Train: train_nosiy
+
:* Use different proportion of noise data to investigate the effect of xEnt and mpe and dropout
    drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1
+
:** Problem 1) The effect of dropout in different noise proportion;
    ---------------------------------------------------------------------------------------------------------
+
          2) The effect of MPE in different noise proportion;
           std-baseline          |  9.60          |  11.41          |  11.63          |  8.64
+
           3) The effect of MPE+dropout in different noise proportion.
    ---------------------------------------------------------------------------------------------------------
+
:**http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=261
              dp-0.3             |  12.91          |  16.55          |  15.37          |  12.60
+
    ---------------------------------------------------------------------------------------------------------
+
              dp-0.4            |  11.48          |  14.43          |  13.23          |  11.04
+
    ---------------------------------------------------------------------------------------------------------
+
              dp-0.5            |  10.53          |  13.00          |  12.89          |  10.24
+
    ---------------------------------------------------------------------------------------------------------
+
              dp-0.6            |  10.02          |  12.32          |  11.81          |  9.29
+
    ---------------------------------------------------------------------------------------------------------
+
              dp-0.7            |  9.65          |  12.01          |  12.09          |  8.89
+
    ---------------------------------------------------------------------------------------------------------
+
              dp-0.8            |  9.79          |  12.01          |  11.77          |  8.91
+
    ---------------------------------------------------------------------------------------------------------
+
              dp-1.0            |  9.94          |  11.33          |  12.05          |  8.32
+
    ---------------------------------------------------------------------------------------------------------
+
      baseline_dp0.4_lr0.008    |  9.52          |  12.01          |  11.75          |  9.44
+
  ---------------------------------------------------------------------------------------------------------
+
      baseline_dp0.4_lr0.0001    |  9.92          |  14.22          |  13.59          |  10.24
+
  ---------------------------------------------------------------------------------------------------------
+
      baseline_dp0.4_lr0.00001  |  9.06          |  13.27          |  13.14          |  9.33
+
  ---------------------------------------------------------------------------------------------------------
+
      baseline_dp0.8_lr0.008    |  9.16          |  11.23          |  11.42          |  8.49
+
  ---------------------------------------------------------------------------------------------------------
+
      baseline_dp0.8_lr0.0001    |  9.22          |  11.52          |  11.77          |  8.82
+
  ---------------------------------------------------------------------------------------------------------
+
      baseline_dp0.8_lr0.00001  |  9.12          |  11.27          |  11.65          |  8.68
+
  ---------------------------------------------------------------------------------------------------------
+
        dp-0.4_follow-std-lr    |  11.33          |  14.60          |  13.50          |  10.95
+
  ---------------------------------------------------------------------------------------------------------
+
        dp-0.8_follow-std-lr    |  9.77          |  12.01          |  11.79          |  8.93
+
  ---------------------------------------------------------------------------------------------------------
+
          dp-0.4_4-2048          |  11.69          |  16.13          |  14.24          |  11.98
+
  ---------------------------------------------------------------------------------------------------------
+
          dp-0.8_4-2048          |  9.46          |  11.60          |  11.98          |  8.78
+
  ---------------------------------------------------------------------------------------------------------
+
  
:** Test with AURORA4 of 7000 (clean + noisy).
+
:** Find and test unknown noise test-data.(++)
:** Follow the standard DNN training learn-rate to avoid the different learn-rate changing time of various DNN training. Similar performance is obtained.
+
:** Find and test unknown noise test-data.(+)
+
 
:** Dropout has been applied to a normally trained XEnt nnet, e.g. wsj (learn-rate: 1e-4/1e-5). A small learn-rate seems to balance accuracy and training time.
 
:** Dropout has been applied to a normally trained XEnt nnet, e.g. wsj (learn-rate: 1e-4/1e-5). A small learn-rate seems to balance accuracy and training time.
:** Draft the dropout-DNN weight distribution. (++)
+
:** Debug the low cv frame-accuracy
 
+
* Rectification
+
:* Combine drop out and rectifier.(+)
+
:* Change the learn-rate in the middle of the training, Modify the train_nnet.sh script(Liu Chao).
+
  
 
* MaxOut
 
* MaxOut
Line 93: Line 43:
 
  1) AURORA4 -15h
 
  1) AURORA4 -15h
 
     NOTE: gs==groupsize
 
     NOTE: gs==groupsize
  (1) Train: train_clean
+
:* pretraining based maxout
 +
:** Select units in Groupsize interval, but need low learn-rate
 +
:** Force accept the first iteration. Jump out from the local-minimum
 +
 
 +
* P-norm
 +
 
 
         model/testcase(WER)    | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1  
 
         model/testcase(WER)    | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1  
 
     ---------------------------------------------------------------------------------------------------------
 
     ---------------------------------------------------------------------------------------------------------
          std-baseline         |  6.04          |  29.91          |  27.76          |  16.37
+
        nnet_std-baseline       |  6.04          |  29.91          |  27.76          |  16.37
 
     ---------------------------------------------------------------------------------------------------------
 
     ---------------------------------------------------------------------------------------------------------
          lr0.008_gs6         |                             -
+
        lr0.008-1e-7_gs6_p2    |  6.17          |  27.51          |  24.98         | 15.40
 
     ---------------------------------------------------------------------------------------------------------
 
     ---------------------------------------------------------------------------------------------------------
          lr0.008_gs10         |                             -
+
        lr0.008-1e-7_gs10_p2    |  6.40          |  28.18          |  26.60         | 15.82
 
     ---------------------------------------------------------------------------------------------------------
 
     ---------------------------------------------------------------------------------------------------------
          lr0.008_gs20         |                             -
+
        lr0.008-1e-7_gs10_p3    |  6.45          |  28.73          |  30.01         | 20.24
 
     ---------------------------------------------------------------------------------------------------------
 
     ---------------------------------------------------------------------------------------------------------
      lr0.008_l1-0.01         |                             -
+
        lr0.04-4e-3_gs6_p2      |  6.47          |  27.42          |  27.48         | 17.35
 
     ---------------------------------------------------------------------------------------------------------
 
     ---------------------------------------------------------------------------------------------------------
        lr0.008_l1-0.001        |                            -
 
    ---------------------------------------------------------------------------------------------------------
 
      lr0.008_l1-0.0001        |                            -
 
    ---------------------------------------------------------------------------------------------------------
 
    lr0.008_l1-0.000001        |                            -
 
    ---------------------------------------------------------------------------------------------------------
 
        lr0.008_l2-0.01        |                            -
 
    ---------------------------------------------------------------------------------------------------------
 
            lr0.006_gs10        |                            -
 
    ---------------------------------------------------------------------------------------------------------
 
            lr0.004_gs10        |                            -
 
    ---------------------------------------------------------------------------------------------------------
 
          lr0.002_gs10        |  6.21          |  28.48          |  27.30          |  16.37
 
    ---------------------------------------------------------------------------------------------------------
 
          lr0.001_gs1          |                            -
 
    ---------------------------------------------------------------------------------------------------------
 
          lr0.001_gs2          |                            -
 
    ---------------------------------------------------------------------------------------------------------
 
          lr0.001_gs4          |                            -
 
    ---------------------------------------------------------------------------------------------------------
 
          lr0.001_gs6          |  6.04          |  25.17          |  24.31          |  14.19
 
    ---------------------------------------------------------------------------------------------------------
 
          lr0.001_gs8          |  5.85          |  25.72          |  24.35          |  14.28
 
    ---------------------------------------------------------------------------------------------------------
 
          lr0.001_gs10        |  6.23          |  27.04          |  25.51          |  14.22
 
    ---------------------------------------------------------------------------------------------------------
 
          lr0.001_gs15        |  5.94          |  30.10          |  27.53          |  19.00
 
    ---------------------------------------------------------------------------------------------------------
 
          lr0.001_gs20        |  6.32          |  28.10          |  26.47          |  16.98
 
    ---------------------------------------------------------------------------------------------------------
 
*: pretraining based maxout
 
* P-norm
 
 
  
 
* Convolutive network (+)
 
* Convolutive network (+)
Line 157: Line 79:
 
  -----------------------------------------------------------------------------------------------------------------------
 
  -----------------------------------------------------------------------------------------------------------------------
  
:* READ paper
+
---------------------------------------------------------------------------------------------------------------------------------------------
 +
| %WER | Dnnhiddenlayers | hid-dim | pooling | CNN_unit |cnn_init_opts
 +
---------------------------------------------------------------------------------------------------------------------------------------------
 +
cnn_nonlda_std | 5.73 | 4 | 1200    | 3 |  |"--patch-dim1 8" input_dim ~ patch-dim1
 +
---------------------------------------------------------------------------------------------------------------------------------------------
 +
cnn_nonlda_cnnunit_384 | 5.85 | 4 | 1200    | 3 | 384 |"--patch-dim1 8 --num-filters2 384"  
 +
---------------------------------------------------------------------------------------------------------------------------------------------
 +
cnn_nonlda_cnnunit_220 | ----------    | 4 | 1200    | 3 | 220 |"--patch-dim1 8 --num-filters2 220"  
 +
---------------------------------------------------------------------------------------------------------------------------------------------
 +
 
 +
* MSE
 +
(1) AURORA4 (train_clean)
 +
    drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1
 +
    ---------------------------------------------------------------------------------------------------------
 +
          std-baseline_xent    |  6.04          |  29.91          |  27.76          |  16.37
 +
    ---------------------------------------------------------------------------------------------------------
 +
          std-baseline_mse      |  6.05          |  31.30          |  30.03          |  15.77
 +
    ---------------------------------------------------------------------------------------------------------
 +
 
 +
* DAE (Deep Auto-Encoder)
 +
  (1) train_clean
 +
    drop-retention/testcase(WER)| test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1
 +
  ---------------------------------------------------------------------------------------------------------
 +
      std-xEnt-sigmoid-baseline| 6.04            |    29.91        |  27.76        | 16.37
 +
  ---------------------------------------------------------------------------------------------------------
 +
      std+dae_cmvn_noFT_2-1200 | 7.10            |    15.33        |  16.58        | 9.23
 +
  ---------------------------------------------------------------------------------------------------------
 +
    std+dae_cmvn_splice5_2-100  | 8.19            |    15.21        |  15.25        | 9.31
 +
  ---------------------------------------------------------------------------------------------------------
  
 
====Denoising & Farfield ASR====
 
====Denoising & Farfield ASR====
Line 165: Line 115:
 
====VAD====
 
====VAD====
 
* Frame energy feature extraction, done
 
* Frame energy feature extraction, done
* Harmonics and Teager energy features under investigation
+
* Harmonics and Teager energy features under investigation (+)
* Previous results to be organized for a paper  
+
* Previous results to be organized for a paper  
 +
* MPE model VAD test
  
 
====Speech rate training====
 
====Speech rate training====
* Data ready on tencent set; some errors on speech rate dependent model.
+
* Data ready on tencent set; some errors on speech rate dependent model
* Retrain new model
+
* Retrain new model(+)
  
 
====Scoring====
 
====Scoring====
Line 176: Line 127:
 
* harmonics based timbre comparison: frequency based feature is better
 
* harmonics based timbre comparison: frequency based feature is better
 
* GMM based timbre comparison is done. Similar to speaker recognition
 
* GMM based timbre comparison is done. Similar to speaker recognition
* TODO: Code check-in and technical report.
+
* TODO: Code check-in and '''technical report'''
  
 
====Confidence====
 
====Confidence====
Line 185: Line 136:
 
===Speaker ID===
 
===Speaker ID===
 
* Preparing GMM-based server.
 
* Preparing GMM-based server.
* EER ~ 11.2% (GMM-based system)
+
* EER ~ 4% (GMM-based system)--Text independent
 +
* EER ~ 6%(1s) / 0.5%(5s) (GMM-based system)--Text dependent
 
* test different number of components; fast i-vector computing
 
* test different number of components; fast i-vector computing
  
Line 191: Line 143:
 
* GMM-based language is ready.
 
* GMM-based language is ready.
 
* Delivered to Jietong
 
* Delivered to Jietong
 +
* Prepare the test-case
  
===Emotion detection===
+
===Voice Conversion===
 
+
* Yiye is reading materials
* Sinovoice is implementing the server
+
  
  

Revision as of 08:54, 24 November 2014 (Mon)

Speech Processing

AM development

Environment

  • Three 760 GPUs have been purchased
  • The 760 GPU on grid-9 crashed again
  • Replace the 760 GPU cards in grid-12 and grid-14

Sparse DNN

RNN AM

  • The initial nnet does not seem to work well; it needs pre-training, or a lower learning rate should be tested.
  • For AURORA 4 (1 h/epoch), model training is done.
  • Using AURORA 4 short sentences with a smaller number of targets.(+)
  • Adjusting the learning rate.(+)
  • Trying the Microsoft toolkit.(+)
  • details at http://liuc.cslt.org/pages/rnn.html

A new nnet training scheduler

Drop out & Rectification & convolutive network

  • Drop out
  • AURORA4 dataset
  • Use different proportions of noisy data to investigate the effect of xEnt, MPE and dropout
    • Problem 1) The effect of dropout at different noise proportions;
          2) The effect of MPE at different noise proportions;
          3) The effect of MPE+dropout at different noise proportions.
    • Find and test on test data with unknown noise.(++)
    • Dropout has been applied to a normally trained XEnt nnet, e.g. wsj (learn-rate: 1e-4/1e-5). A small learn-rate seems to balance accuracy and training time.
    • Debug the low cv frame-accuracy
  • MaxOut
  • 6min/epoch
1) AURORA4 -15h
   NOTE: gs==groupsize
  • pretraining based maxout
    • Select units in Groupsize interval, but need low learn-rate
    • Force accept the first iteration. Jump out from the local-minimum
  • P-norm
        model/testcase(WER)    | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
       nnet_std-baseline       |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
       lr0.008-1e-7_gs6_p2     |  6.17           |  27.51           |  24.98          |  15.40 
   ---------------------------------------------------------------------------------------------------------
       lr0.008-1e-7_gs10_p2    |  6.40           |  28.18           |  26.60          |  15.82 
   ---------------------------------------------------------------------------------------------------------
       lr0.008-1e-7_gs10_p3    |  6.45           |  28.73           |  30.01          |  20.24 
   ---------------------------------------------------------------------------------------------------------
       lr0.04-4e-3_gs6_p2      |  6.47           |  27.42           |  27.48          |  17.35 
   ---------------------------------------------------------------------------------------------------------
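The `gs` and `p` suffixes in the model names above refer to the group size and, presumably, the order of the norm. The sketch below shows how a p-norm unit and a maxout unit each collapse a group of activations into one output. It is only an illustration in plain Python: the function names `group_pnorm` and `group_maxout` are hypothetical and this is not the toolkit code used for the experiments.

```python
def group_pnorm(activations, group_size, p=2.0):
    """Collapse each consecutive group of `group_size` inputs into one p-norm value."""
    if len(activations) % group_size != 0:
        raise ValueError("input length must be a multiple of group_size")
    outputs = []
    for start in range(0, len(activations), group_size):
        group = activations[start:start + group_size]
        # p-norm of the group: (sum_i |x_i|^p)^(1/p)
        outputs.append(sum(abs(x) ** p for x in group) ** (1.0 / p))
    return outputs


def group_maxout(activations, group_size):
    """Collapse each consecutive group of `group_size` inputs into its maximum."""
    if len(activations) % group_size != 0:
        raise ValueError("input length must be a multiple of group_size")
    return [
        max(activations[start:start + group_size])
        for start in range(0, len(activations), group_size)
    ]


if __name__ == "__main__":
    hidden = [3.0, 4.0, 0.0, -2.0]          # four activations, two groups of two
    print(group_pnorm(hidden, group_size=2, p=2.0))
    print(group_maxout(hidden, group_size=2))
```

With a larger group size the layer keeps fewer outputs per input unit, which is one reason the group size is swept in the table.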
  • Convolutive network (+)
  • AURORA 4
                 |  wer | hid-layers | hid-dim | delta-order | splice | lda-dim | learn-rate	| pooling | TBA
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_baseline| 6.70 |     4      | 1200	|      0      |    4   |   198   |   0.008	|   3     |patch-dim1 7 
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1000_3  | 6.61 |     4      | 1000	|      0      |    4   |   198   |   0.008	|   3     |patch-dim1 7 
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1400_3  | 6.61 |     4      | 1400	|      0      |    4   |   198   |   0.008	|   3     |patch-dim1 7 
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1200_4  | 6.91 |     4      | 1200	|      0      |    4   |   198   |   0.008	|   4     |patch-dim1 6 
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1200_2  | -    |     4      | 1200	|      0      |    4   |   198   |   0.008	|   2     |patch-dim1 8 
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1200_3  | 6.66 |     5      | 1200	|      0      |    4   |   198   |   0.008	|   3     |patch-dim1 7 
-----------------------------------------------------------------------------------------------------------------------

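-----------------------------------------------------------------------------------------------------------------------
                        | %WER | Dnnhiddenlayers | hid-dim | pooling | CNN_unit | cnn_init_opts
-----------------------------------------------------------------------------------------------------------------------
 cnn_nonlda_std         | 5.73 |        4        |  1200   |    3    |          | "--patch-dim1 8" input_dim ~ patch-dim1
-----------------------------------------------------------------------------------------------------------------------
 cnn_nonlda_cnnunit_384 | 5.85 |        4        |  1200   |    3    |   384    | "--patch-dim1 8 --num-filters2 384"
-----------------------------------------------------------------------------------------------------------------------
 cnn_nonlda_cnnunit_220 |  -   |        4        |  1200   |    3    |   220    | "--patch-dim1 8 --num-filters2 220"
-----------------------------------------------------------------------------------------------------------------------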


  • MSE
(1) AURORA4 (train_clean)
       model/testcase(WER)      | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline_xent     |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
          std-baseline_mse      |  6.05           |  31.30           |  30.03          |  15.77
   ---------------------------------------------------------------------------------------------------------
  • DAE (Deep Auto-Encoder)
 (1) train_clean
       model/testcase(WER)     | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
  ---------------------------------------------------------------------------------------------------------
      std-xEnt-sigmoid-baseline| 6.04            |    29.91         |   27.76         | 16.37
  ---------------------------------------------------------------------------------------------------------
      std+dae_cmvn_noFT_2-1200 | 7.10            |    15.33         |   16.58         | 9.23
  ---------------------------------------------------------------------------------------------------------
   std+dae_cmvn_splice5_2-100  | 8.19            |    15.21         |   15.25         | 9.31
  ---------------------------------------------------------------------------------------------------------
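To make the DAE numbers easier to read, the relative WER change against the sigmoid xEnt baseline can be computed per test set. The following is a small sketch using only the WER values of the `std-xEnt-sigmoid-baseline` and `std+dae_cmvn_noFT_2-1200` rows; the helper name `relative_wer_reduction` is introduced here for illustration.

```python
# WER (%) copied from the DAE table: baseline row and std+dae_cmvn_noFT_2-1200 row.
baseline = {
    "test_clean_wv1": 6.04,
    "test_airport_wv1": 29.91,
    "test_babble_wv1": 27.76,
    "test_car_wv1": 16.37,
}
dae = {
    "test_clean_wv1": 7.10,
    "test_airport_wv1": 15.33,
    "test_babble_wv1": 16.58,
    "test_car_wv1": 9.23,
}


def relative_wer_reduction(base_wer, new_wer):
    """Relative WER reduction in percent; negative means the new system is worse."""
    return 100.0 * (base_wer - new_wer) / base_wer


if __name__ == "__main__":
    for testcase, base_wer in baseline.items():
        change = relative_wer_reduction(base_wer, dae[testcase])
        print(f"{testcase}: {change:+.1f}% relative")
```

The sign convention makes the trade-off visible: the noisy test sets improve substantially, while the clean test set gets worse.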

Denoising & Farfield ASR

  • ICASSP paper submitted.
  • HOLD

VAD

  • Frame energy feature extraction, done
  • Harmonics and Teager energy features under investigation (+)
  • Previous results to be organized for a paper
  • MPE model VAD test

Speech rate training

  • Data ready on tencent set; some errors on speech rate dependent model
  • Retrain new model(+)

Scoring

  • Timbre comparison done.
  • Harmonics-based timbre comparison: the frequency-based feature is better
  • GMM-based timbre comparison is done; the approach is similar to speaker recognition
  • TODO: code check-in and technical report

Confidence

  • Reproduce the experiments on the Fisher dataset.
  • Use the Fisher DNN model to decode the all-wsj dataset
  • Preparing scoring for puqiang data

Speaker ID

  • Preparing GMM-based server.
  • EER ~ 4% (GMM-based system)--Text independent
  • EER ~ 6%(1s) / 0.5%(5s) (GMM-based system)--Text dependent
  • test different number of components; fast i-vector computing
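For reference, the EER figures above are the operating point where the false-acceptance rate equals the false-rejection rate. A minimal sketch of how such a value can be estimated from trial scores follows; the function name `equal_error_rate` and the toy scores are assumptions for illustration, not the scoring code used for the reported numbers.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping the decision threshold over all observed scores.

    A trial is accepted when its score is >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    best = None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.4]      # toy same-speaker trial scores
    impostors = [0.6, 0.5, 0.3, 0.2]    # toy different-speaker trial scores
    eer, threshold = equal_error_rate(targets, impostors)
    print(f"EER = {100 * eer:.1f}% at threshold {threshold}")
```

With few trials FAR and FRR rarely cross exactly, so the sketch reports their average at the closest point.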

Language ID

  • GMM-based language ID is ready.
  • Delivered to Jietong
  • Prepare the test-case

Voice Conversion

  • Yiye is reading materials


Text Processing

LM development

Domain specific LM

  • domain LM (need to discuss with xiaoxi)
  • embedded language model (this week)
  • train some more LMs with Zhenlong (dianzishu, sogou, bbs chosen) ("need result").
  • keep on training the sogou2T LM (14/16 on 3rd iteration). (this week)
  • new dict.
  • hand over this work to hanzhenglong and provide a simple document (this week)

tag LM

different weight
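 method       | tag-jsgf                              | corpus | weight | wer   | ser   | add_wer
 -------------------------------------------------------------------------------------------------
 experiment 3 | 500 (490 less frequent and 10 unseen) | 500    | 0.1    | 16.72 | 77.92 | -
              |                                       |        | 0.3    | 15.42 | 71.25 | -
              |                                       |        | 0.5    | 15.40 | 69.58 | -
              |                                       |        | 0.7    | 15.28 | 68.75 | -
              |                                       |        | 0.8    | 15.38 | 68.33 | -
              |                                       |        | 1      | 15.98 | 69.17 | -
              |                                       |        | 2      | 19.08 | 70.83 | -
 -------------------------------------------------------------------------------------------------
 experiment 4 | 100 (90 less frequent and 10 unseen)  | 100    | 0.008  | 15.28 | 69.58 | -
              |                                       |        | 0.02   | 14.84 | 69.58 | -
              |                                       |        | 0.05   | 15.11 | 69.58 | -
              |                                       |        | 0.1    | 15.30 | 69.75 | -
              |                                       |        | 0.3    | 16.01 | 70.42 | -
 -------------------------------------------------------------------------------------------------
 experiment 5 | 500                                   | 100    | 0.01   | 17.57 | 78.75 | -
              |                                       |        | 0.05   | 16.84 | 77.08 | -
              |                                       |        | 0.08   | 16.59 | 76.25 | -
              |                                       |        | 0.15   | 16.76 | 75.42 | -
 -------------------------------------------------------------------------------------------------
 experiment 6 | 1280                                  | 500    | 0.1    | 17.42 | 77.92 | -
              |                                       |        | 0.5    | 15.20 | 69.17 | -
              |                                       |        | 0.8    | 15.30 | 68.33 | -
              |                                       |        | 1      | 15.69 | 69.58 | -
 -------------------------------------------------------------------------------------------------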
  • conclusion:
 1. Comparing experiment 3 with experiment 5:
    the jsgf file is the same, but the number of tags in the corpus is different. When more tags
 are added to the corpus, the optimal weight is larger.
 2. Comparing experiment 3 with experiment 6:
    the number of tags in the corpus is the same, but the jsgf size is different. Different jsgf sizes
 have the same optimal weight.
  • need to do
  • the tag probability should be tested with the weight added (hanzhenglong), then handed over to hanzhenglong (this week)
  • make a summary of tag-LM and write a journal paper (wxx and yuanb) (two weeks).
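The optimal weight for each experiment can be read off the weight sweep by taking the row with the lowest WER. The short sketch below does this for the (weight, WER) pairs listed in the tag LM results; the dictionary layout and the helper name `best_weight` are introduced here only for illustration.

```python
# (weight, WER %) pairs copied from the tag LM weight sweep.
results = {
    "experiment 3": [(0.1, 16.72), (0.3, 15.42), (0.5, 15.40), (0.7, 15.28),
                     (0.8, 15.38), (1, 15.98), (2, 19.08)],
    "experiment 4": [(0.008, 15.28), (0.02, 14.84), (0.05, 15.11),
                     (0.1, 15.30), (0.3, 16.01)],
    "experiment 5": [(0.01, 17.57), (0.05, 16.84), (0.08, 16.59), (0.15, 16.76)],
    "experiment 6": [(0.1, 17.42), (0.5, 15.20), (0.8, 15.30), (1, 15.69)],
}


def best_weight(rows):
    """Return the (weight, wer) pair with the lowest WER."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, rows in results.items():
        weight, wer = best_weight(rows)
        print(f"{name}: best weight {weight} (WER {wer:.2f}%)")
```

Since the sweeps are coarse and the WER curve is fairly flat near its minimum (for example between weights 0.5 and 0.8 in experiments 3 and 6), the selected weights are best treated as approximate optima.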

RNN LM

  • rnn
  • test the WER of RNNLM on Chinese data from jietong-data (this week)
  • check how the rnnlm code initializes and updates the learning rate.
  • generate the n-gram model from rnnlm and test the ppl with text of different sizes. (this week)
  • lstm+rnn
  • check how the lstm-rnnlm code initializes and updates the learning rate.

Word2Vector

W2V based doc classification

  • Initial results with variational Bayesian GMM obtained. Performance is not as good as the conventional GMM. (hold)
  • Non-linear inter-language transform: English-Spanish-Czech: wv model training done, transform model under investigation

Knowledge vector

  • Knowledge vector started
  • begin to code

Character to word

  • Character to word conversion(hold)
  • prepare the task: word similarity
  • prepare the dict.

Translation

  • v5.0 demo released
  • cut the dict and use new segment-tool

QA

detail: [1]

Spelling mistakes

  • retrain the ngram model(caoli)

improve fuzzy match

  • add synonym similarity using the MERT-4 method (hold)

improve lucene search

  • use the MERT-4 method to find good weights for multiple features such as IDF, NER, baidu_weight, keyword, etc. (liurong, this month)

Multi-Scene Recognition

  • hand over to duxk (this week)

XiaoI framework

  • give a report on the XiaoI framework
  • the new intern will install SEMPRE

patent

  • GA method to improve the QA (this week)