<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://index.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=2014-08-29</id>
		<title>2014-08-29 - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=2014-08-29"/>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=2014-08-29&amp;action=history"/>
		<updated>2026-04-16T20:39:33Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php?title=2014-08-29&amp;diff=10824&amp;oldid=prev</id>
		<title>Cslt: Created page with "==Resource Building==  == Leftover questions==  * Investigating LOUDS FST.  * CLG embedded decoder plus online compiler. * DNN-GMM co-training * NN LM  == AM developmen..."</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=2014-08-29&amp;diff=10824&amp;oldid=prev"/>
				<updated>2014-08-29T02:14:01Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with "==Resource Building==  == Leftover questions==  * Investigating LOUDS FST.  * CLG embedded decoder plus online compiler. * DNN-GMM co-training * NN LM  == AM developmen..."&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;==Resource Building==&lt;br /&gt;
&lt;br /&gt;
== Leftover questions==&lt;br /&gt;
&lt;br /&gt;
* Investigating LOUDS FST. &lt;br /&gt;
* CLG embedded decoder plus online compiler.&lt;br /&gt;
* DNN-GMM co-training&lt;br /&gt;
* NN LM&lt;br /&gt;
&lt;br /&gt;
== AM development ==&lt;br /&gt;
&lt;br /&gt;
=== Sparse DNN ===&lt;br /&gt;
* Sparse DNN on WSJ did not obtain further improvement&lt;br /&gt;
&lt;br /&gt;
===Noise training===&lt;br /&gt;
&lt;br /&gt;
:* Error found in data setting. Re-run the test with gamma=20,30&lt;br /&gt;
:* Re-run test with gamma=1,0.1&lt;br /&gt;
:* Noisy training journal paper almost done.&lt;br /&gt;
&lt;br /&gt;
==Drop out &amp;amp; Rectification &amp;amp; convolutive network==&lt;br /&gt;
&lt;br /&gt;
* After changing the learning rate to 0.001, the training process can be started: &lt;br /&gt;
*# Changed the dropout probability from 0.5 to 0.2: frame accuracy is improved, but WER seems problematic.&lt;br /&gt;
*# Experimented with learning rates 1 and 8; no results yet (NA) &lt;br /&gt;
&lt;br /&gt;
* Rectification&lt;br /&gt;
# Rectification by itself failed when weights grew large.&lt;br /&gt;
# Adding an L1 penalty enables training, but performance is very poor.&lt;br /&gt;
# Try capping the rectifier output at a maximum value&lt;br /&gt;
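&lt;br /&gt;
A minimal sketch of the capped-rectifier idea in the last item, assuming a fixed cap (the value 6.0 is an arbitrary choice, not from these experiments):&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch of capping the rectifier output at a fixed maximum, which bounds
# activations when weights grow large (the cap value 6.0 is arbitrary).
import numpy as np

def capped_relu(x, cap=6.0):
    return np.minimum(np.maximum(x, 0.0), cap)

print(capped_relu(np.array([-2.0, 3.0, 9.0])))   # [0. 3. 6.]
```
&lt;/pre&gt;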
&lt;br /&gt;
* Convolutive network&lt;br /&gt;
# Test more configurations &lt;br /&gt;
&lt;br /&gt;
===Denoising &amp;amp; Farfield ASR===&lt;br /&gt;
&lt;br /&gt;
* Lasso-based dereverberation obtained reasonable results&lt;br /&gt;
:# Found some suspicious problems with frequency-dependent Lasso.&lt;br /&gt;
:# Proposed full-frequency Lasso &amp;amp; full frequency-temporal Lasso.&lt;br /&gt;
:# Good performance was obtained with frequency-dependent Lasso&lt;br /&gt;
:* Near data: 10.79 -&amp;gt; 10.35 (lambda=0.05)&lt;br /&gt;
:* Far data: 40.53 -&amp;gt; 35.65 (lambda=0.15)&lt;br /&gt;
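&lt;br /&gt;
As a rough illustration of the frequency-dependent Lasso above, a per-bin sketch; the exact formulation, tap count, and lambda value are assumptions, and the Lasso is solved with plain ISTA rather than any particular toolkit:&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch only: per-frequency-bin Lasso dereverberation. The model assumed
# here (late reverberation of frame t as a sparse combination of the D
# preceding frames) and all hyper-parameters are illustrative guesses.
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def dereverb_bin(y, taps=10, lam=0.05, iters=200):
    """Subtract estimated late reverberation from one frequency bin."""
    # Regressors: the magnitude envelope delayed by 1..taps frames.
    X = np.stack([np.roll(y, d) for d in range(1, taps + 1)], axis=1)
    X[:taps] = 0.0                      # zero the wrapped-around history
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-8)
    w = np.zeros(taps)
    for _ in range(iters):              # ISTA: gradient step + shrinkage
        w = soft_threshold(w - step * (X.T @ (X @ w - y)), step * lam)
    return np.maximum(y - X @ w, 0.0)   # spectral subtraction, floored at 0

rng = np.random.default_rng(0)
y = np.abs(rng.standard_normal(200))    # stand-in magnitude envelope
clean = dereverb_bin(y)
```
&lt;/pre&gt;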
&lt;br /&gt;
===VAD===&lt;br /&gt;
&lt;br /&gt;
* Some discrepancy between CSLT results &amp;amp; Puqiang results&lt;br /&gt;
[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&amp;amp;step=view_request&amp;amp;cvssid=207]&lt;br /&gt;
&lt;br /&gt;
:* check if the label is really problematic&lt;br /&gt;
:* check if short-time spike noise is the major problem (can be solved by spike filtering)&lt;br /&gt;
:* check if low-energy babble noise caused mismatch (can be solved by global energy detection)&lt;br /&gt;
:* test noise data trained model&lt;br /&gt;
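&lt;br /&gt;
The spike-filtering idea in the checklist could look like the following sketch; the window length and the majority-vote rule are assumptions:&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch of the spike-filtering idea: majority-vote smoothing of per-frame
# VAD decisions with a short median filter (window length is a guess).
import numpy as np

def median_smooth(decisions, win=5):
    pad = win // 2
    d = np.pad(np.asarray(decisions, dtype=float), pad, mode="edge")
    return [int(round(np.median(d[i:i + win]))) for i in range(len(decisions))]

# A 1-frame spike is removed; short gaps are filled.
print(median_smooth([0, 0, 0, 1, 0, 0, 1, 1, 1, 1]))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```
&lt;/pre&gt;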
&lt;br /&gt;
===Speech rate training===&lt;br /&gt;
&lt;br /&gt;
* Some interesting results with the simple speech-rate change algorithm were obtained on the WSJ db&lt;br /&gt;
[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&amp;amp;step=view_request&amp;amp;cvssid=268]&lt;br /&gt;
&lt;br /&gt;
* The ROS model seems superior to the normal one on faster speech&lt;br /&gt;
* Need to check the distribution of ROS on WSJ&lt;br /&gt;
* Suggest extracting speech data of different ROS to construct a new test set&lt;br /&gt;
* Suggest using the Tencent training data&lt;br /&gt;
* Suggest removing silence when computing ROS&lt;br /&gt;
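&lt;br /&gt;
A sketch of ROS computation with silence removed, as suggested above; the alignment format (phone, start, end) and the silence labels are hypothetical:&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch: rate of speech (ROS) from a phone alignment, with silence removed
# as suggested above. The alignment format and silence labels are hypothetical.
def rate_of_speech(alignment, silence_labels=("sil", "sp")):
    """Phones per second, counting only non-silence segments."""
    speech = [(p, s, e) for p, s, e in alignment if p not in silence_labels]
    duration = sum(e - s for _, s, e in speech)
    return len(speech) / duration if duration else 0.0

ali = [("sil", 0.0, 0.5), ("n", 0.5, 0.6), ("iy", 0.6, 0.8),
       ("d", 0.8, 0.9), ("sil", 0.9, 1.4)]
print(rate_of_speech(ali))   # about 7.5 phones per second
```
&lt;/pre&gt;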
&lt;br /&gt;
&lt;br /&gt;
===Scoring===&lt;br /&gt;
&lt;br /&gt;
* hold&lt;br /&gt;
&lt;br /&gt;
===Confidence===&lt;br /&gt;
&lt;br /&gt;
* Implement a tool for data labeling&lt;br /&gt;
* Finished extraction of two features: DNN posterior + lattice posterior&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==LM development==&lt;br /&gt;
&lt;br /&gt;
===Domain specific LM===&lt;br /&gt;
&lt;br /&gt;
====G determinization problem re-opened====&lt;br /&gt;
&lt;br /&gt;
====NUM tag LM====&lt;br /&gt;
&lt;br /&gt;
27h JS test:  20.16 vs 20.19&lt;br /&gt;
2h  JS test:  17.48 vs 17.49&lt;br /&gt;
&lt;br /&gt;
* Analysis of the tag LM's properties: (1) a random NUM should obtain better performance; (2) other words are not seriously impacted.&lt;br /&gt;
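&lt;br /&gt;
The NUM-tag preprocessing presumably maps numeric tokens to a single class tag before LM training; a sketch, where the tag name and number pattern are assumptions:&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch of the assumed NUM-tag preprocessing: numeric tokens are mapped to
# one class tag before n-gram counting, so all numbers share one distribution.
import re

NUM_RE = re.compile(r"^[0-9]+([.,][0-9]+)?$")

def tag_numbers(tokens, tag="NUM"):
    return [tag if NUM_RE.match(t) else t for t in tokens]

print(tag_numbers("call 10086 at 9.30 tomorrow".split()))
# ['call', 'NUM', 'at', 'NUM', 'tomorrow']
```
&lt;/pre&gt;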
&lt;br /&gt;
&lt;br /&gt;
==Word2Vector==&lt;br /&gt;
&lt;br /&gt;
===W2V based doc classification===&lt;br /&gt;
&lt;br /&gt;
* Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.&lt;br /&gt;
* Interest group set up; reading scheduled every Thursday&lt;br /&gt;
* Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; transform model under investigation&lt;br /&gt;
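&lt;br /&gt;
For reference, the linear baseline of such an inter-language transform can be sketched as a least-squares mapping between paired word vectors; the data below is synthetic and the dimensions arbitrary (the non-linear variant under investigation is not shown):&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch of the linear baseline for the inter-language transform: fit a
# matrix W mapping source word vectors onto their translations' vectors by
# least squares. Data here is synthetic; dimensions are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
d_src, d_tgt, n_pairs = 50, 40, 500
X = rng.standard_normal((n_pairs, d_src))   # source-language embeddings
W_true = rng.standard_normal((d_src, d_tgt))
Y = X @ W_true                              # target-language embeddings

W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # minimise the squared error
mapped = X @ W                              # source projected into target space
```
&lt;/pre&gt;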
&lt;br /&gt;
&lt;br /&gt;
==RNN LM==&lt;br /&gt;
&lt;br /&gt;
* New toolkit from Thomas obtained.&lt;br /&gt;
* Prepare WSJ database, re-test RNN.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Speaker ID==&lt;br /&gt;
&lt;br /&gt;
* Second model done&lt;br /&gt;
&lt;br /&gt;
==Emotion detection==&lt;br /&gt;
&lt;br /&gt;
* Initial performance obtained&lt;br /&gt;
[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&amp;amp;step=view_request&amp;amp;cvssid=271]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Translation==&lt;br /&gt;
* Failed due to running out of memory &lt;br /&gt;
* Restarted the training due to some errors on the grid&lt;/div&gt;</summary>
		<author><name>Cslt</name></author>	</entry>

	</feed>