Difference between revisions of "2014-11-25"

From cslt Wiki
Lr (talk | contribs)
Text Processing
Line 155: Line 155:
  
 
====Domain specific LM====
 
====Domain specific LM====
* domain lm(need to discuss with xiaoxi)
+
* domain lm
:* embedded language model('''this week''')
+
:* embedded language model done
:* train some more LMs with Zhenlong (dianzishu sogou bbs chosen)("need result").
+
:* train some more LMs with Zhenlong (dianzishu, sogou, bbs chosen); put results on cvss.
:* keep on training sogou2T lm(14/16 on 3rd iteration).('''this week''')
+
:* small count done and ready to merge it.
  
 
*  new dict.
 
*  new dict.
:* handover of this work to hanzhenglong, give a simple document('''this week''')
+
:* dongxu helps hanzhenglong set up the new dict_2.0.
 +
:* hanzhenglong needs to report the results every day on cvss
  
 
====tag LM====
 
====tag LM====
 
+
* summary done
* different weight [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=lr&step=view_request&cvssid=304 2014-Nov-23,Monday]
{| border="2px"
|+ different weight
|-
! method
! tag-jsgf !! corpus !! weight !! WER (%) !! SER (%) !! add_wer
|-
! experiment 3
| 500 (490 less frequent and 10 unseen) || 500 || 0.1 || 16.72 || 77.92 || -
|-
!
| || || 0.3 || 15.42 || 71.25 || -
|-
!
| || || 0.5 || 15.40 || 69.58 || -
|-
!
| || || 0.7 || 15.28 || 68.75 || -
|-
!
| || || 0.8 || 15.38 || 68.33 || -
|-
!
| || || 1 || 15.98 || 69.17 || -
|-
!
| || || 2 || 19.08 || 70.83 || -
|-
! experiment 4
| 100 (90 less frequent and 10 unseen) || 100 || 0.008 || 15.28 || 69.58 || -
|-
!
| || || 0.02 || 14.84 || 69.58 || -
|-
!
| || || 0.05 || 15.11 || 69.58 || -
|-
!
| || || 0.1 || 15.30 || 69.75 || -
|-
!
| || || 0.3 || 16.01 || 70.42 || -
|-
! experiment 5
| 500 || 100 || 0.01 || 17.57 || 78.75 || -
|-
!
| || || 0.05 || 16.84 || 77.08 || -
|-
!
| || || 0.08 || 16.59 || 76.25 || -
|-
!
| || || 0.15 || 16.76 || 75.42 || -
|-
! experiment 6
| 1280 || 500 || 0.1 || 17.42 || 77.92 || -
|-
!
| || || 0.5 || 15.20 || 69.17 || -
|-
!
| || || 0.8 || 15.30 || 68.33 || -
|-
!
| || || 1 || 15.69 || 69.58 || -
|}
:* conclusion:
  1. Compare experiment 3 with experiment 5: the jsgf file is the same, but the number of tags in the corpus is different (500 vs. 100). When more tags are added to the corpus, the optimal weight is larger (best WER at weight 0.7 vs. 0.08).
  2. Compare experiment 3 with experiment 6: the number of tags in the corpus is the same, but the jsgf size is different (500 vs. 1280). The optimal weight is about the same (best WER at 0.7 vs. 0.5; best SER at 0.8 in both).
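
The optimal weight per experiment can be read off the "different weight" table mechanically. A minimal Python sketch, assuming "optimal" means lowest WER; the WER values are copied from the table, and the names `wer_by_weight` and `best_weight` are illustrative only:

```python
# WER (%) for each tested weight, copied from the "different weight" table.
wer_by_weight = {
    "experiment 3": {0.1: 16.72, 0.3: 15.42, 0.5: 15.40, 0.7: 15.28,
                     0.8: 15.38, 1: 15.98, 2: 19.08},
    "experiment 4": {0.008: 15.28, 0.02: 14.84, 0.05: 15.11,
                     0.1: 15.30, 0.3: 16.01},
    "experiment 5": {0.01: 17.57, 0.05: 16.84, 0.08: 16.59, 0.15: 16.76},
    "experiment 6": {0.1: 17.42, 0.5: 15.20, 0.8: 15.30, 1: 15.69},
}


def best_weight(results):
    """Return the (weight, WER) pair with the lowest WER."""
    return min(results.items(), key=lambda item: item[1])


best = {name: best_weight(results) for name, results in wer_by_weight.items()}
for name, (weight, wer) in best.items():
    print(f"{name}: weight={weight}, WER={wer}")
```

Each sweep uses a different weight grid, so the minima are only comparable up to the grid spacing.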
+
 
* need to do
 
* need to do
:* tag Probability should test add the weight(hanzhenglong) and handover to hanzhenglong ('''this week''')
+
:* tag Probability should test add the weight(hanzhenglong) and handover to hanzhenglong (hold)
:* make a summary about tag-lm and '''journal paper'''(wxx and yuanb)('''two weeks''').
+
:* make a summary about tag-lm and '''journal paper'''(wxx and yuanb)('''this week''').
  
 
====RNN LM====
 
====RNN LM====
 
*rnn
 
*rnn
 
:* test wer RNNLM on Chinese data from jietong-data('''this week''')
 
:* test wer RNNLM on Chinese data from jietong-data('''this week''')
:* check the rnnlm code about how to Initialize and update learning rate.
+
:* generate the ngram model from rnnlm and test the ppl with different size txt.[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Jt-chinese#sampling_data_from_rnnlm]
:* generate the ngram model from rnnlm and test the ppl with different size txt.('''this week''')
+
 
*lstm+rnn
 
*lstm+rnn
:* check the lstm-rnnlm code about how to Initialize and update learning rate.
+
:* check the lstm-rnnlm code about how to Initialize and update learning rate.(hold)
  
 
===Word2Vector===
 
===Word2Vector===
Line 261: Line 185:
 
====Knowledge vector====
 
====Knowledge vector====
 
* Knowledge vector started
 
* Knowledge vector started
:* begin to code
+
:* generate the structured data from wiki
 
====Character to word====
 
====Character to word====
 
* Character to word conversion(hold)
 
* Character to word conversion(hold)
Line 273: Line 197:
  
 
===QA===
 
===QA===
detail:[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Hulan-2014-11-06]
+
detail:
 
====Spell mistake====
 
====Spell mistake====
 
* retrain the ngram model('''caoli''')
 
* retrain the ngram model('''caoli''')
Line 281: Line 205:
 
:* use the MERT-4 method to tune the weights of multiple features such as IDF, NER, baidu_weight, keyword, etc.('''liurong this month''')
 
:* use the MERT-4 method to tune the weights of multiple features such as IDF, NER, baidu_weight, keyword, etc.('''liurong this month''')
 
====Multi-Scene Recognition====
 
====Multi-Scene Recognition====
* handover to duxk('''this week''')
+
* done
 
====XiaoI framework====
 
====XiaoI framework====
 
* give a report about xiaoI framework
 
* give a report about xiaoI framework
 
* new intern will install SEMPRE
 
* new intern will install SEMPRE
 
====patent====
 
====patent====
* GA-method improve the QA('''this week''')
+
* done

Revision as of 07:01, 1 December 2014 (Monday)

Speech Processing

AM development

Environment

  • Three 760 GPUs have been bought
  • grid-9 760 GPU crashed again
  • Change the 760 GPU cards of grid-12 and grid-14

Sparse DNN

RNN AM

  • The initial nnet does not seem to work well; it needs pre-training, or a lower learn-rate should be tested.
  • For AURORA 4 (1h/epoch), model training is done.
  • Using AURORA 4 short sentences with a smaller number of targets.(+)
  • Adjusting the learning rate.(+)
  • Trying the Microsoft toolkit.(+)
  • details at http://liuc.cslt.org/pages/rnn.html

A new nnet training scheduler

Drop out & Rectification & convolutive network

  • Drop out
  • AURORA4 dataset
  • Use different proportions of noise data to investigate the effects of xEnt, MPE and dropout
    • Problem 1) The effect of dropout at different noise proportions;
          2) The effect of MPE at different noise proportions;
          3) The effect of MPE+dropout at different noise proportions.
    • Find and test unknown noise test-data.(++)
    • Dropout has been applied to a normally trained xEnt nnet, e.g. wsj (learn-rate: 1e-4/1e-5). A small learn-rate seems to balance accuracy and training time.
    • Debug the low cv frame-accuracy
  • MaxOut
  • 6min/epoch
1) AURORA4 -15h
   NOTE: gs==groupsize
  • pretraining based maxout
    • Select units in Groupsize interval, but need low learn-rate
    • Force accept the first iteration. Jump out from the local-minimum
  • P-norm
   ---------------------------------------------------------------------------------------------------------
        model/testcase(WER)    | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
       nnet_std-baseline       |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
       lr0.008-1e-7_gs6_p2     |  6.17           |  27.51           |  24.98          |  15.40 
   ---------------------------------------------------------------------------------------------------------
       lr0.008-1e-7_gs10_p2    |  6.40           |  28.18           |  26.60          |  15.82 
   ---------------------------------------------------------------------------------------------------------
       lr0.008-1e-7_gs10_p3    |  6.45           |  28.73           |  30.01          |  20.24 
   ---------------------------------------------------------------------------------------------------------
       lr0.04-4e-3_gs6_p2      |  6.47           |  27.42           |  27.48          |  17.35 
   ---------------------------------------------------------------------------------------------------------
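
The model names in the P-norm table carry a group size (gs) and, presumably, the norm order (p). A minimal Python sketch of the two group nonlinearities, assuming the usual definitions (maxout takes the maximum of each group, p-norm takes the p-norm of each group); the function names are illustrative and not taken from the training toolkit:

```python
def pnorm(activations, group_size, p=2.0):
    """Reduce each consecutive group of `group_size` inputs to its p-norm."""
    if len(activations) % group_size != 0:
        raise ValueError("input length must be a multiple of group_size")
    outputs = []
    for start in range(0, len(activations), group_size):
        group = activations[start:start + group_size]
        outputs.append(sum(abs(x) ** p for x in group) ** (1.0 / p))
    return outputs


def maxout(activations, group_size):
    """Reduce each consecutive group of `group_size` inputs to its maximum."""
    if len(activations) % group_size != 0:
        raise ValueError("input length must be a multiple of group_size")
    return [max(activations[start:start + group_size])
            for start in range(0, len(activations), group_size)]


print(pnorm([3.0, 4.0, 0.0, 0.0], group_size=2, p=2.0))
print(maxout([3.0, 4.0, -1.0, -2.0], group_size=2))
```

Under these definitions a model such as lr0.008-1e-7_gs6_p2 would reduce every 6 hidden units to one output with p = 2.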
  • Convolutive network (+)
  • AURORA 4
 1)
-----------------------------------------------------------------------------------------------------------------------
                  | wer  | hid-layers | hid-dim | delta-order | splice | lda-dim | learn-rate | pooling | TBA
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_baseline | 6.70 |     4      |  1200   |      0      |   4    |   198   |   0.008    |    3    | patch-dim1 7
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1000_3   | 6.61 |     4      |  1000   |      0      |   4    |   198   |   0.008    |    3    | patch-dim1 7
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1400_3   | 6.61 |     4      |  1400   |      0      |   4    |   198   |   0.008    |    3    | patch-dim1 7
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1200_4   | 6.91 |     4      |  1200   |      0      |   4    |   198   |   0.008    |    4    | patch-dim1 6
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1200_2   |  -   |     4      |  1200   |      0      |   4    |   198   |   0.008    |    2    | patch-dim1 8
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1200_3   | 6.66 |     5      |  1200   |      0      |   4    |   198   |   0.008    |    3    | patch-dim1 7
-----------------------------------------------------------------------------------------------------------------------
 2)
-----------------------------------------------------------------------------------------------------------------------
                        | %WER | DNN hidden layers | hid-dim | pooling | CNN_unit | cnn_init_opts
-----------------------------------------------------------------------------------------------------------------------
 cnn_nonlda_std         | 5.73 |         4         |  1200   |    3    |          | "--patch-dim1 8" input_dim ~ patch-dim1
-----------------------------------------------------------------------------------------------------------------------
 cnn_nonlda_cnnunit_384 | 5.85 |         4         |  1200   |    3    |   384    | "--patch-dim1 8 --num-filters2 384"
-----------------------------------------------------------------------------------------------------------------------
 cnn_nonlda_cnnunit_220 |  -   |         4         |  1200   |    3    |   220    | "--patch-dim1 8 --num-filters2 220"
-----------------------------------------------------------------------------------------------------------------------
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

MSE

(1) AURORA4 (train_clean)
   drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline_xent     |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
          std-baseline_mse      |  6.05           |  31.30           |  30.03          |  15.77
   ---------------------------------------------------------------------------------------------------------
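
For reference, the two training criteria compared in the MSE table differ only in how the softmax output is scored against the target. A minimal Python sketch for a single frame, assuming a one-hot target; the function names are illustrative and not taken from the training toolkit:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """xEnt criterion: negative log-probability of the target class."""
    return -math.log(probs[target_index])


def mean_squared_error(probs, target_index):
    """MSE criterion: mean squared distance to the one-hot target."""
    return sum((p - (1.0 if i == target_index else 0.0)) ** 2
               for i, p in enumerate(probs)) / len(probs)


probs = softmax([2.0, 0.5, -1.0])
print(cross_entropy(probs, 0), mean_squared_error(probs, 0))
```

Cross-entropy depends only on the probability given to the target class, while MSE is bounded and penalises every output dimension.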

DAE (Deep Auto-Encoder)

 (1) train_clean
   drop-retention/testcase(WER)| test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
  ---------------------------------------------------------------------------------------------------------
      std-xEnt-sigmoid-baseline| 6.04            |    29.91         |   27.76         | 16.37
  ---------------------------------------------------------------------------------------------------------
      std+dae_cmvn_noFT_2-1200 | 7.10            |    15.33         |   16.58         | 9.23
  ---------------------------------------------------------------------------------------------------------
   std+dae_cmvn_splice5_2-100  | 8.19            |    15.21         |   15.25         | 9.31
  ---------------------------------------------------------------------------------------------------------
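
The DAE table is easier to read as relative WER change against the xEnt baseline. A minimal Python sketch, with the WER values copied from the baseline row and the std+dae_cmvn_noFT_2-1200 row; `relative_wer_reduction` is an illustrative helper, not part of any toolkit:

```python
# WER (%) copied from the DAE table (train_clean).
baseline = {"test_clean_wv1": 6.04, "test_airport_wv1": 29.91,
            "test_babble_wv1": 27.76, "test_car_wv1": 16.37}
dae_noft = {"test_clean_wv1": 7.10, "test_airport_wv1": 15.33,
            "test_babble_wv1": 16.58, "test_car_wv1": 9.23}


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction; positive means the new system is better."""
    return (baseline_wer - new_wer) / baseline_wer


reduction = {case: relative_wer_reduction(baseline[case], dae_noft[case])
             for case in baseline}
for case, value in reduction.items():
    print(f"{case}: {100.0 * value:+.1f}%")
```

A negative value marks a test case where the DAE front-end is worse than the baseline, which is the clean condition here.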

Denoising & Farfield ASR

  • ICASSP paper submitted.
  • HOLD

VAD

  • Frame energy feature extraction, done
  • Harmonics and Teager energy features under investigation (+)
  • Previous results to be organized for a paper
  • MPE model VAD test
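
The two energy features named above can be sketched in a few lines of Python. This assumes the textbook definitions (log of the summed squared samples per frame, and the discrete Teager energy operator x[n]^2 - x[n-1]*x[n+1]); frame length and shift are hypothetical parameters, not the settings used in the experiments:

```python
import math


def frame_log_energy(samples, frame_length, frame_shift):
    """Log energy of each full frame; a small floor avoids log(0)."""
    energies = []
    for start in range(0, len(samples) - frame_length + 1, frame_shift):
        frame = samples[start:start + frame_length]
        energies.append(math.log(sum(x * x for x in frame) + 1e-10))
    return energies


def teager_energy(samples):
    """Discrete Teager energy operator for the interior samples."""
    return [samples[n] ** 2 - samples[n - 1] * samples[n + 1]
            for n in range(1, len(samples) - 1)]


signal = [0.0, 0.1, 0.4, -0.3, 0.2, 0.0, 0.0, 0.0]
print(frame_log_energy(signal, frame_length=4, frame_shift=2))
print(teager_energy(signal))
```

A simple energy-based VAD would then threshold the per-frame values; the threshold itself has to be tuned on data.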

Speech rate training

  • Data ready on tencent set; some errors on speech rate dependent model
  • Retrain new model(+)

Scoring

  • Timbre comparison done.
  • harmonics based timbre comparison: frequency based feature is better
  • GMM based timbre comparison is done. Similar to speaker recognition
  • TODO: Code checkin and technical report

Confidence

  • Reproduce the experiments on fisher dataset.
  • Use the fisher DNN model to decode all-wsj dataset
  • preparing scoring for puqiang data

Speaker ID

  • Preparing GMM-based server.
  • EER ~ 4% (GMM-based system)--Text independent
  • EER ~ 6%(1s) / 0.5%(5s) (GMM-based system)--Text dependent
  • test different number of components; fast i-vector computing
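
For reference, the EER figures above are the operating point where false rejections and false acceptances are equally frequent. A minimal Python sketch that approximates it by sweeping the threshold over the observed scores; this is a generic illustration with made-up scores, not the scoring script used for the reported numbers:

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by trying every observed score as the threshold."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is >= threshold.
        false_reject = (sum(s < threshold for s in target_scores)
                        / len(target_scores))
        false_accept = (sum(s >= threshold for s in nontarget_scores)
                        / len(nontarget_scores))
        gap = abs(false_reject - false_accept)
        if gap < best_gap:
            best_gap, eer = gap, (false_reject + false_accept) / 2.0
    return eer


# Hypothetical scores for illustration only.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
```

With few trials the two error rates may never be exactly equal, so the sketch reports their average at the closest threshold.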

Language ID

  • GMM-based language ID is ready.
  • Delivered to Jietong
  • Prepare the test-case

Voice Conversion

  • Yiye is reading materials


Text Processing

LM development

Domain specific LM

  • domain lm
  • embedded language model done
  • train some more LMs with Zhenlong (dianzishu, sogou, bbs chosen); put results on cvss.
  • small count done and ready to be merged.
  • new dict.
  • dongxu helps hanzhenglong set up the new dict_2.0.
  • hanzhenglong needs to report the results every day on cvss

tag LM

  • summary done
  • need to do
  • test the tag probability with the weight added (hanzhenglong), then hand over to hanzhenglong (hold)
  • make a summary about tag-lm and journal paper(wxx and yuanb)(this week).

RNN LM

  • rnn
  • test wer RNNLM on Chinese data from jietong-data(this week)
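  • generate the ngram model from rnnlm and test the ppl with text of different sizes; details at http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Jt-chinese#sampling_data_from_rnnlm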
  • lstm+rnn
  • check the lstm-rnnlm code about how to Initialize and update learning rate.(hold)
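
The ppl mentioned in the rnn items is the exponential of the average negative log-probability per word. A minimal Python sketch, assuming natural-log word probabilities have already been produced by whichever model is being tested; `perplexity` is an illustrative helper, not a toolkit call:

```python
import math


def perplexity(log_probs):
    """Perplexity from a list of natural-log word probabilities."""
    if not log_probs:
        raise ValueError("need at least one word probability")
    return math.exp(-sum(log_probs) / len(log_probs))


# Hypothetical per-word probabilities for illustration only.
word_probs = [0.25, 0.25, 0.25, 0.25]
print(perplexity([math.log(p) for p in word_probs]))
```

Toolkits differ in whether end-of-sentence tokens and OOV words are counted, so ppl values are only comparable when computed the same way.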

Word2Vector

W2V based doc classification

  • Initial results with variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.(hold)
  • Non-linear inter-language transform: English-Spanish-Czech: wv model training done, transform model under investigation

Knowledge vector

  • Knowledge vector started
  • generate the structured data from wiki

Character to word

  • Character to word conversion(hold)
  • prepare the task: word similarity
  • prepare the dict.

Translation

  • v5.0 demo released
  • cut the dict and use new segment-tool

QA

detail:

Spell mistake

  • retrain the ngram model(caoli)

improve fuzzy match

  • add Synonyms similarity using MERT-4 method(hold)

improve lucene search

  • use the MERT-4 method to tune the weights of multiple features such as IDF, NER, baidu_weight, keyword, etc.(liurong this month)

Multi-Scene Recognition

  • done

XiaoI framework

  • give a report about xiaoI framework
  • new intern will install SEMPRE

patent

  • done