Difference between revisions of "2014-11-10"

From cslt Wiki
 
Revision as of 09:56, 10 November 2014 (Mon)

Speech Processing

AM development

Environment

  • Three 760 GPUs have already been bought
  • The 760 GPU on grid-9 crashed again

Sparse DNN

  • Performance improvement found when pruned slightly
  • Waiting for result of AURORA 4
  • HOLD

RNN AM

  • The initial nnet does not seem to work well; it needs to be pre-trained, or a lower learn-rate should be tested.
  • For AURORA 4: 1h/epoch, model training done.
  • Using AURORA 4 short sentences with a smaller number of targets.
  • Adjusting the learning rate.
  • Trying the Microsoft toolkit.

Noise training

  • Paper has been submitted.

Drop out & Rectification & convolutive network

  • Drop out
  • dataset:wsj, testset:eval92
       std |  dropout0.4 | dropout0.5 | dropout0.6 | dropout0.7 | dropout0.7_iter7(maxTr-Acc) | dropout0.8 | dropout0.8_iter7(maxTr-Acc)
    ------------------------------------------------------------------------------------------------------------------------------------ 
       4.5 |     5.39    |    4.80    |   4.75     |  4.36      |  4.39                       |    4.55    |    4.71           
    • Frame accuracy does not seem consistent with WER. Use the training data as the cv set to verify the learning ability of the model.
   Within a single nnet model, the top training frame accuracy also does not seem consistent with the WER.
    • Decode the test_clean_wv1 dataset.
  • AURORA4 dataset
  (1) Train: train_noisy
   drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline          |  9.60           |  11.41           |  11.63          |  8.64
   ---------------------------------------------------------------------------------------------------------
             dp-0.3             |  12.91          |  16.55           |  15.37          |  12.60
   ---------------------------------------------------------------------------------------------------------
             dp-0.4             |  11.48          |  14.43           |  13.23          |  11.04
   ---------------------------------------------------------------------------------------------------------
             dp-0.5             |  10.53          |  13.00           |  12.89          |  10.24
   ---------------------------------------------------------------------------------------------------------
             dp-0.6             |  10.02          |  12.32           |  11.81          |  9.29
   ---------------------------------------------------------------------------------------------------------
             dp-0.7             |  9.65           |  12.01           |  12.09          |  8.89
   ---------------------------------------------------------------------------------------------------------
             dp-0.8             |  9.79           |  12.01           |  11.77          |  8.91
   ---------------------------------------------------------------------------------------------------------
             dp-1.0             |  9.94           |  11.33           |  12.05          |  8.32
   ---------------------------------------------------------------------------------------------------------
     baseline_dp0.4_lr0.008     |  9.52           |  12.01           |  11.75          |  9.44
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.4_lr0.0001    |  9.92           |  14.22           |  13.59          |  10.24
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.4_lr0.00001   |  9.06           |  13.27           |  13.14          |  9.33
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.8_lr0.008     |  9.16           |  11.23           |  11.42          |  8.49
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.8_lr0.0001    |  9.22           |  11.52           |  11.77          |  8.82
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.8_lr0.00001   |  9.12           |  11.27           |  11.65          |  8.68
  ---------------------------------------------------------------------------------------------------------
       dp-0.4_follow-std-lr     |  11.33          |  14.60           |  13.50          |  10.95
  ---------------------------------------------------------------------------------------------------------
       dp-0.8_follow-std-lr     |  9.77           |  12.01           |  11.79          |  8.93
  ---------------------------------------------------------------------------------------------------------
         dp-0.4_4-2048          |  11.69          |  16.13           |  14.24          |  11.98
  ---------------------------------------------------------------------------------------------------------
         dp-0.8_4-2048          |  9.46           |  11.60           |  11.98          |  8.78
  ---------------------------------------------------------------------------------------------------------
    • Test with AURORA4 of 7000 (clean + noisy).
    • Follow the standard DNN training learn-rate schedule, so that the learn-rate does not change at different times across the various DNN trainings. Similar performance is obtained.
    • Find and test on test data with unknown noise. (+)
    • Dropout has been applied to normally trained XEnt NNETs, e.g. wsj (learn-rate: 1e-4/1e-5). A small learn-rate seems to balance accuracy and training time.
    • Draft the dropout-DNN weight distribution. (++)
  • Rectification
  • 1) AURORA 4 -15h
 (1) Train: train_clean
     learn-rate/testcase(WER)  | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline         |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
          lr0.00001            |  8.30           |  43.85           |  46.42          |  29.80
   ---------------------------------------------------------------------------------------------------------
          lr0.0001             |  6.57           |  31.11           |  30.65          |  19.65
   ---------------------------------------------------------------------------------------------------------
          lr0.0006             |  6.19           |  29.23           |  28.45          |  17.31
   ---------------------------------------------------------------------------------------------------------
          lr0.0008             |  6.17           |  28.10           |  27.46          |  14.97
   ---------------------------------------------------------------------------------------------------------
          lr0.001              |  6.28           |  30.01           |  30.26          |  20.81
   ---------------------------------------------------------------------------------------------------------
          lr0.003              |  6.44           |  32.01           |  32.24          |  17.82
   ---------------------------------------------------------------------------------------------------------
          lr0.005              |  6.47           |  33.49           |  34.75          |  18.15
   ---------------------------------------------------------------------------------------------------------
          lr0.007              |  6.72           |  35.85           |  39.72          |  18.03
   ---------------------------------------------------------------------------------------------------------
        lr-0.001_l1-0.001      |  83.19          |  98.57           |  98.84          |  97.77
   ---------------------------------------------------------------------------------------------------------
        lr-0.001_l1-0.0001     |  7.58           |  32.94           |  34.29          |  23.42
   ---------------------------------------------------------------------------------------------------------
       lr-0.001_l1-0.00001     |  6.21           |  29.15           |  28.24          |  19.50
   ---------------------------------------------------------------------------------------------------------
       lr-0.001_l1-0.000001    |  6.30           |  31.91           |  29.23          |  21.52
   ---------------------------------------------------------------------------------------------------------
  • Combine drop out and rectifier.
  • Change the learn-rate in the middle of training; modify the train_nnet.sh script (Liu Chao).
  • MaxOut
  • 6min/epoch
1) AURORA4 -15h
   NOTE: gs==groupsize
 (1) Train: train_clean
        model/testcase(WER)    | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline         |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
          lr0.008_gs6          |                             - 
   ---------------------------------------------------------------------------------------------------------
         lr0.008_gs10          |                             - 
   ---------------------------------------------------------------------------------------------------------
         lr0.008_gs20          |                             - 
   ---------------------------------------------------------------------------------------------------------
      lr0.008_l1-0.01          |                             - 
   ---------------------------------------------------------------------------------------------------------
       lr0.008_l1-0.001        |                             - 
   ---------------------------------------------------------------------------------------------------------
      lr0.008_l1-0.0001        |                             - 
   ---------------------------------------------------------------------------------------------------------
    lr0.008_l1-0.000001        |                             - 
   ---------------------------------------------------------------------------------------------------------
        lr0.008_l2-0.01        |                             - 
   ---------------------------------------------------------------------------------------------------------
           lr0.006_gs10        |                             - 
   ---------------------------------------------------------------------------------------------------------
           lr0.004_gs10        |                             - 
   ---------------------------------------------------------------------------------------------------------
          lr0.002_gs10         |  6.21           |  28.48           |  27.30          |  16.37
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs1          |                             -
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs2          |                             -
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs4          |                             -
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs6          |  6.04           |  25.17           |  24.31          |  14.19
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs8          |  5.85           |  25.72           |  24.35          |  14.28
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs10         |  6.23           |  27.04           |  25.51          |  14.22
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs15         |  5.94           |  30.10           |  27.53          |  19.00
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs20         |  6.32           |  28.10           |  26.47          |  16.98
   ---------------------------------------------------------------------------------------------------------
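
The group size in the MaxOut table can be illustrated with a minimal sketch. It assumes that gs is the number of consecutive linear units pooled into one maxout output and that groups do not overlap; the function name `maxout` is hypothetical and the toolkit's actual component may differ.

```python
def maxout(activations, group_size):
    """Max-pool consecutive, non-overlapping groups of linear activations.

    Assumption: `group_size` plays the role of "gs" in the table above.
    """
    if group_size <= 0 or len(activations) % group_size != 0:
        raise ValueError("layer width must be a positive multiple of group_size")
    return [
        max(activations[i:i + group_size])
        for i in range(0, len(activations), group_size)
    ]


hidden = [0.2, -1.0, 0.7, 0.1, 0.4, -0.3]
print(maxout(hidden, 2))  # [0.2, 0.7, 0.4]
print(maxout(hidden, 3))  # [0.7, 0.4]
```

With a fixed number of linear units, a larger group size leaves fewer outputs per layer, which is one reason the results may vary with gs.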
  • P-norm
  • Convolutive network (+)
  • AURORA 4
                   |  wer | hid-layers | hid-dim | delta-order | splice | lda-dim | learn-rate | pooling | TBA
-----------------------------------------------------------------------------------------------------------------------
  cnn_std_baseline | 6.70 |     4      |  1200   |      0      |    4   |   198   |   0.008    |    3    | patch-dim1 7
-----------------------------------------------------------------------------------------------------------------------
  cnn_std_1000_3   | 6.61 |     4      |  1000   |      0      |    4   |   198   |   0.008    |    3    | patch-dim1 7
-----------------------------------------------------------------------------------------------------------------------
  cnn_std_1400_3   | 6.61 |     4      |  1400   |      0      |    4   |   198   |   0.008    |    3    | patch-dim1 7
-----------------------------------------------------------------------------------------------------------------------
  cnn_std_1200_4   | 6.91 |     4      |  1200   |      0      |    4   |   198   |   0.008    |    4    | patch-dim1 6
-----------------------------------------------------------------------------------------------------------------------
  cnn_std_1200_2   | -    |     4      |  1200   |      0      |    4   |   198   |   0.008    |    2    | patch-dim1 8
-----------------------------------------------------------------------------------------------------------------------
  cnn_std_1200_3   | 6.66 |     5      |  1200   |      0      |    4   |   198   |   0.008    |    3    | patch-dim1 7
-----------------------------------------------------------------------------------------------------------------------
  • READ paper
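
The "drop-retention" values in the dropout tables above (dp-0.3 to dp-1.0) can be read as the probability of keeping a hidden unit during training. The sketch below assumes that reading and assumes inverted scaling (kept units are divided by the retention probability); the experiments may instead have rescaled at test time. The name `dropout` and the explicit `rng` argument are illustrative only.

```python
import random


def dropout(activations, retention, rng):
    """Keep each unit with probability `retention`, zero it otherwise.

    Assumption: inverted dropout, so kept units are scaled by 1/retention
    and no rescaling is needed at decoding time.
    """
    if not 0.0 < retention <= 1.0:
        raise ValueError("retention must be in (0, 1]")
    return [
        a / retention if rng.random() < retention else 0.0
        for a in activations
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layer = [0.5, 1.0, 1.5, 2.0]
print(dropout(layer, 0.8, rng))  # some units zeroed, the rest scaled by 1/0.8
print(dropout(layer, 1.0, rng))  # retention 1.0 keeps every unit unchanged
```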

Denoising & Farfield ASR

  • ICASSP paper submitted.
  • HOLD

VAD

  • Frame energy feature extraction, done
  • Harmonics and Teager energy features under investigation
  • Previous results to be organized for a paper

Speech rate training

  • 100h randomly selected from the 1000h tec dataset
  • Baseline and ROS NNet training done; decoding will start soon
  • The ROS model seems superior to the normal one on faster speech

low resource language AM training

  • HOLD
  • Uyghur language model has been released to JT. Done.

Scoring

  • Timbre comparison under testing

Confidence

  • Reproduce the experiments on fisher dataset.
  • Use the fisher DNN model to decode all-wsj dataset
  • preparing scoring for puqiang data

Speaker ID

  • Preparing GMM-based server.
  • EER ~ 11.2% (GMM-based system)
  • test different number of components; fast i-vector computing
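
For reference, the EER figure above is the operating point where the false-accept and false-reject rates meet. A minimal sketch of estimating it from trial scores follows; the score lists are made up for illustration, and taking the threshold with the smallest gap between the two rates is one simple convention among several.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where |FAR - FRR| is smallest (an assumption;
    interpolation between thresholds is another common choice).
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores: higher means "more likely the claimed speaker".
targets = [0.9, 0.8, 0.7, 0.3]
impostors = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, impostors))  # 0.25
```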

Language ID

  • GMM-based language ID system is ready.
  • Delivered to Jietong

Emotion detection

  • Sinovoice is implementing the server


Text Processing

LM development

Domain specific LM

  • domain lm
  • weibo LM with pruning 0 10 10 20 20: testing done. weibo LM with pruning 0 10 8 8 8: under testing. weibo LM without pruning: 4/8 done.
  • merge the weibo, baiduhi and baiduzhidao LMs and test (this week)
  • confirm the size of alpa with xiaomin for business applications (like e-13)
  • get the general test data from miaomin; this test set may be collected online.
  • new dict.
  • Tested the earlier vocabularies on 6000.txt by perplexity (ppl).
               old150K      new166K      new150K
   baiduzhidao     394          369          333
   baiduhi         217          190          188
  • Built new 100K,150K,200K vocabulary
  • Fixed some bugs in the sogou dict spider.
  • new toolkit: find a method to update the new dict; it can get a new word list from sogou and word information from baidu. (two weeks)
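
The ppl figures in the vocabulary comparison above are perplexities. As a reminder of how such a number is derived, here is a minimal sketch that assumes the LM returns a base-10 log probability per token; which tokens are counted (sentence ends, OOVs) is a convention that is not stated here, so the absolute values depend on it.

```python
def perplexity(log10_probs):
    """Perplexity from per-token base-10 log probabilities.

    ppl = 10 ** (-(1/N) * sum(log10 p_i)); lower is better.
    """
    if not log10_probs:
        raise ValueError("need at least one token")
    avg_log10 = sum(log10_probs) / len(log10_probs)
    return 10 ** (-avg_log10)


# Hypothetical scores: three tokens, each with probability 0.01.
print(perplexity([-2.0, -2.0, -2.0]))  # 100.0
```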

tag LM

  • set new test
  • result


RNN LM

  • rnn
  • RNNLM=>ALPA make a report
  • test RNNLM on Chinese data from jietong-data
  • check the rnnlm code.
  • lstm+rnn
  • check the lstm-rnnlm code

Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
  • Non-linear inter-language transform: English-Spanish-Czech: wv model training done, transform model under investigation
  • SSA-based local linear mapping still running.
  • k-means classes change to 2.
  • Knowledge vector started
  • format the data
  • yuanbin will continue this work with help of xingchao.
  • Character to word conversion
  • prepare the task: word similarity
  • prepare the dict.
  • Google word vector train
  • some ideas will be discussed in the weekly report.

Translation

  • v4.0 demo released
  • cut the dict and use new segment-tool

QA

  • lucene Optimization
  • rewrite the method for selecting the 50 standard questions so that they do not share the same template. (this week)
  • test boosting the keyword weight and extracting synonyms. (this week)
  • check the word segmentation for templates. (this week)
  • the min-segment method improves the accuracy (0.61->0.66).
  • check the query method for getting lucene information, and rewrite the scoring method along the lines of the idf value.
  • test
  • test the different idf values from baidu and sogou in fuzzymatch. (this week)
  • need to check the other 10% of errors. (this week)
  • spell check
  • simple demo done.
  • the new intern will install SEMPRE
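
The idf-based scoring mentioned in the QA items can be sketched as follows. This is only an illustration of the idea, not the lucene scoring code: the smoothed idf formula, the function names and the toy questions are all assumptions.

```python
import math
from collections import Counter


def idf_table(documents):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log((n + 1) / (count + 1)) + 1.0 for term, count in df.items()}


def score(query_terms, doc_terms, idf):
    """Sum the idf of the query terms that also occur in the document."""
    doc = set(doc_terms)
    return sum(idf.get(term, 0.0) for term in set(query_terms) if term in doc)


# Hypothetical, already segmented standard questions.
questions = [["reset", "password"], ["change", "password"], ["password"]]
idf = idf_table(questions)
print(score(["reset", "password"], questions[0], idf))
print(score(["reset", "password"], questions[2], idf))
```

Rare terms such as "reset" get a higher weight than terms that occur in every question, so the first question scores higher for this query than the third.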