Difference between revisions of "ASR:2015-04-27"

From cslt Wiki
(Created page with "==Speech Processing == === AM development === ==== Environment ==== * grid-11 often shut down automatically, too slow computation speed. * New grid-13 added, using...")
 
Zxw (talk | contribs)
Speech Processing
 
Line 3: Line 3:
  
 
==== Environment ====
 
* grid-11 often shuts down automatically, and its computation speed is too slow.
+
* To update the wiki environment information --done
* New grid-13 added, using gpu970
+
* To update the wiki environment information
+
  
 
==== RNN AM====
 
 +
* hold  -- Chao Liu
 
* details at http://liuc.cslt.org/pages/rnnam.html
 
 
* Test monophone on RNN using dark-knowledge
 
Line 16: Line 15:
 
* investigate alpha parameter in time domain and frequency domain
 
* ALPHA>=0, using data generated by reverber toolkit
 
* consider theta
+
* consider theta  
 +
* make spectrum features with Kaldi
  
 
====RNN-DAE(Deep based Auto-Encode-RNN)====
 
Line 27: Line 27:
  
 
===Ivector&Dvector based ASR===
 
:* Cluster the speakers into speaker classes, then use the distance or the posterior probability as the metric
+
:* Cluster the speakers into speaker classes, then use the distance or the posterior probability as the metric -- Tian Lan
:* Directly use the dark-knowledge strategy for ivector training.
+
:* Directly use the dark-knowledge strategy for ivector training -- Tian Lan
 
:* http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=340
 
:* A smaller ivector dimension gives better performance
 
:* Augmenting the hidden layer works better than augmenting the input layer
 
:* train on wsj (testbase: dev93+eval92)
 
  
 
===Dark knowledge===
 
:* Ensemble
+
:* Ensemble --Zhiyong Zhang
 
::*http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=264 --Zhiyong Zhang
 
 
:* adaptation for chinglish under investigation  --Mengyuan Zhao
 
::* Try to greatly improve the chinglish performance
+
:* chinglish adaptation task: best performance is obtained from retraining; dark knowledge helps adapt the model; try to tune parameters layer by layer and change cv --Mengyuan Zhao
 
:* unsupervised training with wsj contributes to aurora4 model --Xiangyu Zeng
 
 
::* test large database with AMIDA
 

Latest revision as of 08:00, 29 April 2015 (Wed)

Speech Processing

AM development

Environment

  • To update the wiki environment information --done

RNN AM

Mic-Array

  • Change the prediction from fbank to spectrum features
  • investigate alpha parameter in time domain and frequency domain
  • ALPHA>=0, using data generated by reverber toolkit
  • consider theta
  • make spectrum features with Kaldi

RNN-DAE(Deep based Auto-Encode-RNN)

Speaker ID

Ivector&Dvector based ASR

Dark knowledge

  • Ensemble --Zhiyong Zhang
  • adaptation for chinglish under investigation --Mengyuan Zhao
  • chinglish adaptation task: best performance is obtained from retraining; dark knowledge helps adapt the model; try to tune parameters layer by layer and change cv --Mengyuan Zhao
  • unsupervised training with wsj contributes to aurora4 model --Xiangyu Zeng
  • test large database with AMIDA

bilingual recognition