2014-09-29

From cslt Wiki
Latest revision as of 05:49, 29 September 2014 (Mon)

Speech Processing

AM development

Sparse DNN

  • Performance improvement found when the network is pruned slightly (see the pruning sketch after this list)
  • Experiments show that
  • Suggest using TIMIT / AURORA 4 for training
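
As a rough illustration of the "slight pruning" idea, here is a minimal sketch of magnitude-based weight pruning for one layer; the numpy representation and the quantile threshold are assumptions, not the group's actual recipe.

    import numpy as np

    def prune_layer(W, sparsity=0.10):
        """Zero out the smallest-magnitude fraction of weights in W.

        A small sparsity (e.g. 10%) mimics the "slight pruning" that
        reportedly improved performance; heavier pruning may hurt."""
        threshold = np.quantile(np.abs(W), sparsity)
        mask = np.abs(W) >= threshold
        return W * mask, mask

    # Example: prune a random 1024 x 1024 hidden layer by 10%.
    W = np.random.randn(1024, 1024).astype(np.float32)
    W_pruned, mask = prune_layer(W, sparsity=0.10)
    print("nonzero fraction:", mask.mean())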

Noise training

  • First draft of the noisy training journal paper
  • Check the abnormal behavior with large sigma (Yinshi, Liuchao); the noise-injection step is sketched below
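
The report does not spell out what sigma controls; the sketch below assumes it is the standard deviation used to draw a per-utterance noise gain, which is one common noisy-training setup, so a very large sigma would produce extreme SNRs and could plausibly cause the abnormal behavior.

    import numpy as np

    def noisy_training_utterance(clean, noise, sigma=0.1, rng=None):
        """Corrupt a clean waveform with a randomly scaled noise segment.

        Assumption: sigma is the spread of the per-utterance noise gain;
        `noise` must be longer than `clean`."""
        rng = rng or np.random.default_rng()
        gain = abs(rng.normal(0.0, sigma))              # random noise level
        start = rng.integers(0, len(noise) - len(clean))
        segment = noise[start:start + len(clean)]
        return clean + gain * segment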

Dropout & Rectification & Convolutive network

  • Dropout
      • No performance improvement found yet (see the dropout sketch after this list).
      • [1]
  • Rectification
      • The dropout NA problem was caused by large-magnitude weights
  • Convolutive network
      1. Test more configurations
      • Zhiyong will work on CNN
  • Recurrent neural network
      • Investigate CURRENNT for AM
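
For reference, a minimal sketch of inverted dropout plus the max-norm weight clipping that is often paired with it; the rates and constants are standard defaults, not necessarily the recipe used here, and the max-norm step is only a plausible guard against the large-magnitude-weights problem noted above.

    import numpy as np

    def dropout(h, rate=0.5, train=True, rng=None):
        """Inverted dropout: zero units with probability `rate` and
        rescale survivors so the expected activation is unchanged."""
        if not train or rate == 0.0:
            return h
        rng = rng or np.random.default_rng()
        mask = rng.random(h.shape) >= rate
        return h * mask / (1.0 - rate)

    def max_norm(W, c=3.0):
        """Clip each column's L2 norm to c, keeping weight magnitudes
        bounded (a common companion to dropout)."""
        norms = np.linalg.norm(W, axis=0, keepdims=True)
        return W * np.minimum(1.0, c / np.maximum(norms, 1e-12))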


Denoising & Farfield ASR

  • Lasso-based de-reverberation is done with the REVERBERATION toolkit (one possible formulation is sketched below)
  • Started composing the experiment section for the SL paper.
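
The report does not give the Lasso formulation; one common reading is sparse deconvolution, where the dry signal is assumed sparse and recovered by an L1-regularized least-squares fit against a known (or separately estimated) room impulse response. A hypothetical sketch with scikit-learn:

    import numpy as np
    from scipy.linalg import toeplitz
    from sklearn.linear_model import Lasso

    def lasso_dereverb(y, rir, alpha=0.01):
        """Recover a dry signal x from reverberant y = rir * x by solving
        min ||y - H x||^2 + alpha * ||x||_1, where H is the convolution
        matrix of the room impulse response (assumed known here)."""
        n = len(y) - len(rir) + 1
        col = np.r_[rir, np.zeros(n - 1)]
        row = np.r_[rir[0], np.zeros(n - 1)]
        H = toeplitz(col, row)              # (len(y), n) convolution matrix
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(H, y)
        return model.coef_                  # sparse estimate of the dry signal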

VAD

  • Problems found at the beginning of speech (the first 0-0.02 s?); see the VAD sketch after this list
  • Noise model training done. Under testing.
  • Need to investigate the performance reduction in babble noise. Call Jia.
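
As context for the onset issue, a toy frame-energy VAD; the 25 ms / 10 ms framing and the threshold are assumptions. With a 10 ms hop, 0-0.02 s covers only the first frame or two, where there is no left context and smoothing logic can easily clip speech onsets.

    import numpy as np

    def energy_vad(x, sr=16000, frame_ms=25, hop_ms=10, thresh_db=-40.0):
        """Toy energy VAD: one boolean decision per frame.

        The first frames (0-0.02 s) have no left context, so any
        hangover/smoothing logic tends to mislabel onsets there."""
        frame = int(sr * frame_ms / 1000)
        hop = int(sr * hop_ms / 1000)
        decisions = []
        for start in range(0, len(x) - frame + 1, hop):
            seg = x[start:start + frame]
            e_db = 10 * np.log10(np.mean(seg ** 2) + 1e-12)
            decisions.append(e_db > thresh_db)
        return np.array(decisions)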


Speech rate training

  • Some interesting results with the simple speech-rate change algorithm were obtained on the WSJ db

[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=268]

  • The ROS model seems superior to the normal one on faster speech
  • Need to check the distribution of ROS on WSJ
  • Suggest extracting speech data at different ROS values and constructing a new test set
  • Suggest using the Tencent training data
  • Suggest removing silence when computing ROS (see the sketch after this list)
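
A minimal sketch of rate-of-speech (ROS) computed from a phone alignment with silence excluded, as suggested; the (phone, start, end) alignment format and the silence labels are assumptions.

    def compute_ros(alignment, silence_labels=("sil", "sp")):
        """ROS = spoken phones per second of non-silence speech.

        `alignment` is a list of (phone, start_sec, end_sec) tuples;
        silence is excluded from both the count and the duration."""
        phones = 0
        speech_dur = 0.0
        for phone, start, end in alignment:
            if phone in silence_labels:
                continue
            phones += 1
            speech_dur += end - start
        return phones / speech_dur if speech_dur > 0 else 0.0

    # Example: 3 phones over 0.45 s of speech -> ROS = 6.67 phones/s.
    ali = [("sil", 0.0, 0.3), ("k", 0.3, 0.45), ("ae", 0.45, 0.6), ("t", 0.6, 0.75)]
    print(round(compute_ros(ali), 2))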

Low-resource language AM training

  • Results on CVSS [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=274]
  • Use the Chinese NN as the initial NN and replace the last layer (sketched below)
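
A schematic of the cross-lingual initialization described above: keep the Chinese-trained hidden layers and draw a fresh output layer sized for the target language's senone set. The list-of-(W, b) bookkeeping is an assumption for illustration.

    import numpy as np

    def init_from_chinese_nn(chinese_layers, n_target_senones, rng=None):
        """Reuse all hidden layers of the Chinese-trained DNN and replace
        the softmax layer with a randomly initialized one for the
        low-resource language."""
        rng = rng or np.random.default_rng()
        hidden = [(W.copy(), b.copy()) for W, b in chinese_layers[:-1]]
        last_dim = hidden[-1][0].shape[1]      # width of the final hidden layer
        W_out = rng.normal(0.0, 0.01, (last_dim, n_target_senones))
        b_out = np.zeros(n_target_senones)
        return hidden + [(W_out, b_out)]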

Scoring

  • Global scoring done.
  • Pitch & rhythm done; needs testing
  • Harmonics on hold


Confidence

  • Experiments done; more data needed
  • Basic confidence using lattice-based posterior + DNN posterior + ROS is done (a fusion sketch follows this list)
  • 23% detection error achieved by the balanced model
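
The report does not say how the three cues are combined; a common choice is a small logistic-regression fuser over per-word features. A sketch with scikit-learn, in which the toy numbers and the choice of classifier are assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Per-word features: [lattice posterior, DNN posterior, ROS].
    X_train = np.array([
        [0.95, 0.90, 5.1],   # correctly recognized word
        [0.40, 0.35, 9.2],   # recognition error
        [0.88, 0.92, 4.7],
        [0.20, 0.30, 8.5],
    ])
    y_train = np.array([1, 0, 1, 0])  # 1 = correct, 0 = error

    # class_weight="balanced" mirrors the "balanced model" above.
    fuser = LogisticRegression(class_weight="balanced").fit(X_train, y_train)
    print(fuser.predict_proba([[0.70, 0.65, 6.0]])[0, 1])   # confidence score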

Speaker ID

  • Add VAD to the system
  • GMM-based test program delivered
  • GMM registration (enrollment) program done; a MAP-adaptation sketch follows this list
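
GMM speaker registration is typically MAP adaptation of a universal background model to the enrollee's features; whether this program does exactly that is an assumption. A minimal mean-only MAP sketch for a diagonal-covariance UBM:

    import numpy as np

    def map_adapt_means(ubm_means, ubm_weights, ubm_covars, feats, r=16.0):
        """Mean-only MAP adaptation (Reynolds-style).

        feats: (T, D) enrollment features; r: relevance factor.
        Returns speaker-adapted means of shape (M, D)."""
        diff = feats[:, None, :] - ubm_means[None, :, :]          # (T, M, D)
        log_p = -0.5 * np.sum(diff ** 2 / ubm_covars
                              + np.log(2 * np.pi * ubm_covars), axis=2)
        log_p += np.log(ubm_weights)
        post = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)                   # (T, M)

        n = post.sum(axis=0)                                      # soft counts
        f = post.T @ feats                                        # (M, D) stats
        alpha = (n / (n + r))[:, None]
        return alpha * (f / np.maximum(n, 1e-8)[:, None]) + (1 - alpha) * ubm_means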

Emotion detection

  • Zhang Weiwei is learning the code
  • Sinovoice is implementing the server


Text Processing

LM development

Domain specific LM

  • n-gram generation is ongoing
  • Memory check and baidu_hi done

NUM tag LM:

  • Maxi's work is released.
  • Yuanbin continues the tag LM work.
  • Add NER to the tag LM.
  • Boost specific words like "wifi" if the TAG model does not work for a particular word (a tagging sketch follows this list).
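
A toy illustration of the tag-LM idea: numbers in the training text are replaced by a class token before n-gram counting and expanded again at decode time. The regex and token name are assumptions.

    import re

    NUM_RE = re.compile(r"\d+(?:\.\d+)?")

    def tag_numbers(sentence, tag="<NUM>"):
        """Replace literal numbers with a class token so the LM pools
        their counts across all numeric values."""
        return NUM_RE.sub(tag, sentence)

    print(tag_numbers("call 10086 and wait 15 minutes"))
    # -> "call <NUM> and wait <NUM> minutes"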


Word2Vector

W2V-based doc classification

  • Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
  • Non-linear inter-language transform (English-Spanish-Czech): wv model training done, transform model under investigation
      • SSA-based local linear mapping is still running
      • Number of k-means classes changed to 2
  • Knowledge vector work started
      • Documents obtained from wiki
      • Formulas obtained
  • Character-to-word conversion
      • Read more papers
      • Prepare to train
  • Google word vector training (see the sampling sketch after this list)
      • Improve the sampling method
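
"Improve the sampling method" is not specified; in word2vec training the usual knobs are frequent-word subsampling and the negative-sampling distribution. A sketch of both, with the standard constants (t = 1e-5, 3/4 power) as assumptions:

    import numpy as np

    def keep_prob(freq, t=1e-5):
        """word2vec-style subsampling: probability of keeping a word with
        corpus frequency `freq`, discarding very frequent words."""
        return min(1.0, np.sqrt(t / freq) + t / freq)

    def negative_sampling_dist(counts, power=0.75):
        """Unigram distribution raised to the 3/4 power, the usual choice
        for drawing negative samples."""
        p = np.array(counts, dtype=float) ** power
        return p / p.sum()

    print(round(keep_prob(0.01), 3))            # frequent word, mostly dropped
    print(negative_sampling_dist([100, 10, 1]))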

RNN LM

  • Prepare the WSJ database
  • Trained model: 10000 x 4 + 320 + 10000
  • Better performance obtained (4.16 -> 3.47)
  • Gigaword sampling for Chinese data (a one-pass sampling sketch follows this list)
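
Sampling a manageable training subset from a corpus as large as Gigaword is often done in a single pass with reservoir sampling; whether that matches the group's procedure is an assumption, and the file name below is hypothetical.

    import random

    def reservoir_sample(lines, k, seed=0):
        """Uniformly sample k lines from an iterable of unknown length
        in one pass, without loading the corpus into memory."""
        rng = random.Random(seed)
        sample = []
        for i, line in enumerate(lines):
            if i < k:
                sample.append(line)
            else:
                j = rng.randrange(i + 1)
                if j < k:
                    sample[j] = line
        return sample

    # Usage (hypothetical file):
    # with open("gigaword_zh.txt") as f:
    #     subset = reservoir_sample(f, 1_000_000)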

Translation

  • v3.0 demo released
      • Still slow
      • Cut vocabulary entries that are not important (see the sketch after this list)
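
"Not important" is unspecified; a common proxy is corpus frequency. A toy cut that keeps the top-k words and maps the rest to an <unk> token (both the cutoff and the token name are assumptions):

    from collections import Counter

    def cut_vocabulary(sentences, k=50000, unk="<unk>"):
        """Keep the k most frequent words; map everything else to <unk>,
        shrinking the vocabulary and speeding up lookups."""
        counts = Counter(w for s in sentences for w in s.split())
        keep = {w for w, _ in counts.most_common(k)}
        return [" ".join(w if w in keep else unk for w in s.split())
                for s in sentences]

    print(cut_vocabulary(["a a b c", "a b d"], k=2))
    # -> ['a a b <unk>', 'a b <unk>']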

QA

  • liangshan_v1 performance: 74.3%
  • The new framework and the GA method are done
  • Add the SEMPRE tool to the framework