2014-08-01
From cslt Wiki
Resource Building
Leftover questions
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
- NN LM
AM development
Sparse DNN
- The WJS sparse DNN performs slightly better than the non-sparse one when the network is large
- Pre-training does work for DNN training (for 4-, 5-, and 6-layer networks)
Noise training
- Journal paper writing is ongoing
Multilingual ASR
- Combining native English speakers with Chinglish speakers gave better performance.
Drop out & convolutional network
- After changing the learning rate to 0.001, the training process can be started.
- Frame Accuracy goes to : (with/without drop probability normalization)
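For reference, a minimal sketch of what "with/without drop probability normalization" could mean. It assumes the normalization is the common inverted-dropout scaling, where surviving activations are multiplied by 1/(1 - p); the report does not confirm that this is the scheme used. The function name `dropout` and its parameters are illustrative only.

```python
import random


def dropout(activations, drop_prob, normalize=True, rng=None):
    """Zero each activation independently with probability drop_prob.

    Assumption: "drop probability normalization" means scaling the
    surviving activations by 1 / (1 - drop_prob), so that the expected
    value of each unit is the same as without dropout.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - drop_prob) if normalize else 1.0
    # A unit is kept when the random draw is at or above drop_prob.
    return [a * scale if rng.random() >= drop_prob else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [1.0] * 10
    print(dropout(hidden, 0.5, normalize=True, rng=rng))
    print(dropout(hidden, 0.5, normalize=False, rng=rng))
```

Without the scaling, the mean activation shrinks by a factor of (1 - p) during training, so the weights have to be rescaled at test time instead.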
Denoising & Farfield ASR
- Tuning the late-response lag and response time parameters gave a performance improvement with Lasso.
Simulation results:
Baseline:
model/test | far_evl92 | near_evl92
clean_ce | 59.38 | 19.25
mpe_clean_ce | 40.46 | 12.94
Lasso with optimal parameters (lambda=0.05, delta=5, N=10):
model/test | far_evl92 | near_evl92
clean_ce | 54.63 | 15.75
mpe_clean_ce | 36.58 | 11.64
Real data results:
model/test | far_evl92 | near_evl92
clean_ce | 94.86 | 63.48
mpe_clean_ce | 92.29 | 58.37
Dereverberated recording:
model/test | far_evl92 | near_evl92
clean_ce | 94.91 | 61.03
mpe_clean_ce | 91.28 | 54.16
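The tables above are easier to compare as relative changes. A small sketch that computes them, assuming the scores are error rates in percent (lower is better) and that the first real-data table is the unprocessed recording; neither is stated explicitly in these notes. The names `relative_reduction` and `compare` are illustrative.

```python
# Scores copied from the tables above as (far_evl92, near_evl92).
# Assumption: they are error rates in percent, so lower is better.
SIM_BASELINE = {"clean_ce": (59.38, 19.25), "mpe_clean_ce": (40.46, 12.94)}
SIM_LASSO = {"clean_ce": (54.63, 15.75), "mpe_clean_ce": (36.58, 11.64)}
# Assumption: the first real-data table is the unprocessed recording.
REAL_ORIGINAL = {"clean_ce": (94.86, 63.48), "mpe_clean_ce": (92.29, 58.37)}
REAL_DEREVERB = {"clean_ce": (94.91, 61.03), "mpe_clean_ce": (91.28, 54.16)}


def relative_reduction(before, after):
    """Relative reduction in percent; negative means the score got worse."""
    return 100.0 * (before - after) / before


def compare(before_table, after_table):
    """Return {model: (far reduction, near reduction)} for two result tables."""
    return {
        model: tuple(
            relative_reduction(b, a)
            for b, a in zip(before_table[model], after_table[model])
        )
        for model in before_table
    }


if __name__ == "__main__":
    pairs = [
        ("Simulation, baseline vs. Lasso", SIM_BASELINE, SIM_LASSO),
        ("Real data, original vs. dereverberated", REAL_ORIGINAL, REAL_DEREVERB),
    ]
    for name, before, after in pairs:
        print(name)
        for model, (far, near) in compare(before, after).items():
            print(f"  {model}: far {far:+.2f}%, near {near:+.2f}%")
```

On these numbers the gain on simulated data is clear in all four cells, while on real data the far-field clean_ce score does not improve.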
- Adaptation is in progress
VAD
- Waiting for testing results
Scoring
- Refined the acoustic model with the AMIDA database. The problem was solved by using both WSJ and AMIDA.
Confidence
- Getting familiar with Kaldi
- Need to extract lattice and DNN features
Embedded decoder
- Chatting LM released (80k)
- Training two smaller networks (500x4+600, 400x4+500): ongoing
- Need to upload the new client code onto git (+)
- Build a new graph with MPE3 am and chatting LM.
LM development
Domain specific LM
Domain specific LM construction
TAG LM
- TAG obtained better performance
Chatting LM
- First version released (80k lexicon)
- Preparing the second release (120k lexicon)
- Test on Xiaotang long
Word2Vector
W2V based doc classification
- Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
Speaker ID
- Full-data SRE trial goes into the final stage
- results will be ready soon
Translation
- Collecting more data (Xinhua parallel text, Bible, named entities) for the second version
- check possible parameters to control phrase pair lexicon