2014-09-29
From cslt Wiki
Speech Processing
AM development
Sparse DNN
- Performance improves when the network is pruned slightly
- Experiments show that
- Suggest using TIMIT / AURORA 4 for training
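The notes do not say which pruning criterion is used. The sketch below assumes simple magnitude pruning, where the smallest-magnitude fraction of weights is set to zero; the function name and the list-of-rows matrix layout are illustrative only.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out (roughly) the smallest-magnitude fraction `sparsity` of the
    entries of a weight matrix stored as a list of rows. Returns a new matrix."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)  # number of weights to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    # Ties at the threshold may prune slightly more than k weights.
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


if __name__ == "__main__":
    W = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.2]]
    print(prune_by_magnitude(W, 0.34))  # light pruning: two smallest weights removed
```

"Pruned slightly" corresponds to a small `sparsity` value here; the pruned network would normally be fine-tuned afterwards.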
Noise training
- First draft of the noisy training journal paper
- Check abnormal behavior with large sigma (Yinshi, Liuchao)
Dropout & rectification & convolutional network
- Dropout
- No performance improvement found yet.
- [1]
- Rectification
- The NA problem in dropout training was caused by large weight magnitudes
- Convolutional network
- Test more configurations
- Zhiyong will work on CNN
- Recurrent neural network
- Investigate CURRENNT for AM
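The notes do not specify the dropout variant in use. As a reference point, here is a minimal sketch of the common "inverted dropout" formulation (an assumption, not necessarily the team's implementation): each unit is zeroed with probability `p` during training and survivors are rescaled by `1/(1-p)`, so no rescaling is needed at test time.

```python
import random


def dropout(activations, p, training=True, rng=None):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed with probability p and the survivors
    are scaled by 1/(1-p), keeping the expected activation unchanged.
    At test time the input is returned unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]
```

Because surviving activations are scaled up by `1/(1-p)`, large weights get amplified further during training, which is one reason weight magnitudes are worth monitoring alongside dropout.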
Denoising & Farfield ASR
- Lasso-based de-reverberation is done with the REVERBERATION toolkit
- Start composing the experiment section of the SL paper.
VAD
- Problems found at the beginning of speech segments (0-0.02 s?)
- Noise model training done. Under testing.
- Need to investigate the performance reduction in babble noise. Call Jia.
Speech rate training
- Some interesting results were obtained on the WSJ database with the simple speech-rate change algorithm
- The ROS model seems superior to the normal model on faster speech
- Need to check the ROS distribution on WSJ
- Suggest extracting speech data with different ROS to construct a new test set
- Suggest using Tencent training data
- Suggest removing silence when computing ROS
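A minimal sketch of rate-of-speech (ROS) computation with silence removed, assuming ROS is measured in phones per second from a forced alignment. The segment format and the silence labels `sil`/`sp` are hypothetical; the notes do not define the unit or the silence inventory.

```python
def rate_of_speech(segments, silence_labels=("sil", "sp")):
    """Rate of speech in phones per second, excluding silence.

    `segments` is a list of (label, start_sec, end_sec) tuples, e.g. from a
    forced alignment. Silence segments count toward neither the phone count
    nor the speech duration."""
    speech = [(lab, s, e) for lab, s, e in segments if lab not in silence_labels]
    duration = sum(e - s for _, s, e in speech)
    if duration <= 0:
        return 0.0
    return len(speech) / duration


if __name__ == "__main__":
    segs = [("sil", 0.0, 0.5), ("a", 0.5, 0.6), ("b", 0.6, 0.8),
            ("c", 0.8, 1.0), ("sil", 1.0, 2.0)]
    print(rate_of_speech(segs))  # about 6 phones/s over 0.5 s of speech
```

Counting the 1.5 s of leading and trailing silence would give 1.5 phones/s instead, which is why silence removal matters when binning utterances by ROS.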
Low-resource language AM training
- Results on CVSS[3]
- Use the Chinese NN as the initial NN and replace the last layer
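The transfer step above (keep the Chinese hidden layers, swap the output layer) can be sketched as follows. This assumes a plain feed-forward net stored as `(W, b)` pairs with `W` as a list of rows (out_dim x in_dim); the function name, Gaussian initialisation, and `scale` are illustrative choices, not taken from the notes.

```python
import random


def replace_output_layer(layers, num_targets, scale=0.01, seed=0):
    """Return a copy of `layers` with all hidden layers kept and the output
    layer re-initialised for `num_targets` units (e.g. the target language's
    senones). Each layer is a (W, b) pair, W stored as out_dim x in_dim rows."""
    rng = random.Random(seed)
    hidden = [([row[:] for row in W], b[:]) for W, b in layers[:-1]]
    in_dim = len(layers[-1][0][0])  # input size of the old output layer
    W_new = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
             for _ in range(num_targets)]
    b_new = [0.0] * num_targets
    return hidden + [(W_new, b_new)]
```

The whole network (or only the new layer at first) is then fine-tuned on the low-resource data.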
Scoring
- Global scoring done.
- Pitch & rhythm done, need testing
- Harmonics on hold
Confidence
- Experiments done, need more data
- Basic confidence using lattice-based posterior + DNN posterior + ROS done
- 23% detection error achieved by balanced model
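The notes do not say how the three features are combined. One common option is a logistic-regression-style combiner; the minimal sketch below assumes that, with hypothetical hand-set weights (real weights would be learned on held-out words labelled correct/incorrect), plus the detection-error measure as the fraction of wrong accept/reject decisions.

```python
import math

# Hypothetical weights for illustration only; not the team's trained model.
WEIGHTS = {"lattice_post": 3.0, "dnn_post": 2.0, "ros": -0.5}
BIAS = -2.0


def confidence(features, weights=WEIGHTS, bias=BIAS):
    """Logistic combination of per-word features into a score in (0, 1)."""
    z = bias + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def detection_error(scores, labels, threshold=0.5):
    """Fraction of words whose accept/reject decision disagrees with the
    reference label (1 = correctly recognised, 0 = error)."""
    wrong = sum((s >= threshold) != bool(y) for s, y in zip(scores, labels))
    return wrong / len(labels)
```

With unbalanced data (far more correct than incorrect words), a combiner can reach low error by accepting everything, which is one reason to train on a balanced set.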
Speaker ID
- Add VAD to system
- GMM-based test program delivered
- GMM registration program done
Emotion detection
- Zhang Weiwei is learning the code
- Sinovoice is implementing the server
Text Processing
LM development
Domain-specific LM
- N-gram generation is ongoing
- Look at the memory and baidu_hi: done
NUM tag LM
- Maxi's work is released.
- Yuanbin continues the tag LM work.
- Add NER to the tag LM.
- Boost specific words such as "wifi" if the TAG model does not work for a particular word.
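A tag LM replaces members of a class (here, numbers) with a single tag during training, so the LM models P(tag | history) and a separate in-class model supplies P(word | tag). A minimal preprocessing sketch, assuming Arabic-digit tokens only and a hypothetical tag name `<NUM>`; Chinese numerals and NER tags would need their own rules.

```python
import re

NUM_TAG = "<NUM>"  # hypothetical tag name
_NUMBER = re.compile(r"^\d+(?:[.,]\d+)*$")  # e.g. 110, 3.5, 1,000


def tag_numbers(tokens, tag=NUM_TAG):
    """Replace purely numeric tokens with a class tag before LM training."""
    return [tag if _NUMBER.match(tok) else tok for tok in tokens]


if __name__ == "__main__":
    print(tag_numbers(["call", "110", "at", "3.5", "wifi"]))
```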
Word2Vector
W2V-based doc classification
- Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM.
- Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done, transform model under investigation
- SSA-based local linear mapping still running.
- Number of k-means classes changed to 2.
- Knowledge vector started
- Documents obtained from wiki
- Formula obtained
- Character to word conversion
- Read more papers
- Preparing to train
- Google word vector training
- Improve the sampling method
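The notes do not say what the two k-means classes are used for (document vectors, or partitions for the local linear mapping). For reference, a minimal plain k-means sketch, assuming Euclidean distance and deterministic initialisation from the first `k` points (both illustrative choices):

```python
import math


def kmeans(points, k=2, iters=20):
    """Plain k-means on a list of equal-length vectors.

    Centroids are initialised with the first k points for determinism.
    Returns (assignments, centroids)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of assigned points (keep old centroid if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign, centroids
```

In practice k-means++ or several random restarts are usually preferred over first-k initialisation.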
RNN LM
- Prepare WSJ database
- Trained model 10000 x 4 + 320 + 10000
- Better performance obtained (4.16-3.47)
- Gigaword sampling for Chinese data
Translation
- v3.0 demo released
- Still slow
- Cut unimportant vocabulary
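The notes do not define "not important". Assuming importance is approximated by corpus frequency, a minimal sketch keeps the `max_size` most frequent words and maps the rest to an `<unk>` token (the token name is a hypothetical choice), which shrinks the search space and speeds up decoding.

```python
from collections import Counter


def prune_vocabulary(sentences, max_size, unk="<unk>"):
    """Keep the `max_size` most frequent words; map all others to `unk`.
    Returns (kept_vocabulary, pruned_sentences)."""
    counts = Counter(w for s in sentences for w in s)
    keep = {w for w, _ in counts.most_common(max_size)}
    pruned = [[w if w in keep else unk for w in s] for s in sentences]
    return keep, pruned


if __name__ == "__main__":
    vocab, out = prune_vocabulary([["a", "b", "a"], ["c", "a", "b"]], 2)
    print(sorted(vocab), out)
```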
QA
- liangshan_v1 performance 74.3%.
- New framework and GA method is done
- Add the SEMPRE tool to the framework