Difference between revisions of "Hulan-2013-10-18"
From cslt Wiki
Latest revision as of 02:58, 18 October 2013 (Fri)
ASR
ASR Kernel development
TTS
- Full-lab training is ready. Trained the first full-lab system with 16k/pseudo-48k data.
- Re-recorded 48k data using F00 (500 sentences) and retrained the model. The signal quality sounds better, but the pitch sounds a bit strange. The parameter settings need more investigation.
Next week:
- Check the signal parameters and solve the pitch problem.
- Prepare the large-data training with the all-F 863 data.
- Prepare the large-data training with online novels.
Dialog system
- The search system was migrated to the customs domain, with a significant performance reduction:
Customs:
n   TF      TFIDF
1   0.496   0.485
2   0.619   0.615
3   0.676   0.673
4   0.713   0.715
5   0.740   0.738

Agriculture:
n   TF      TFIDF
1   0.75    0.8
2   0.85    0.883
3   0.867   0.917
4   0.867   0.95
5   0.95    0.967
- Two problems:
- Lack of semantic clustering.
- Limited training data for IDF.
- Next week
- Analyse the QA database to extract useful domain-dependent data.
- Analyse the data to expand the keywords and phrases.
- Analyse the data to obtain better IDF estimates.
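The report does not say how the TF and TFIDF scores are computed, so the following is only a minimal sketch of the general idea, assuming a bag-of-words search where each document is scored by summing the frequencies of the query terms, optionally weighted by a smoothed IDF. The function names `idf_table`, `score`, and `rank` are hypothetical and are not taken from the lab's system.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed IDF per term: log((1 + N) / (1 + df)) + 1 (an assumed formula)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def score(query, doc, idf=None):
    """Sum of query-term frequencies in doc; weighted by IDF when idf is given."""
    tf = Counter(doc)
    total = 0.0
    for term in query:
        weight = 1.0 if idf is None else idf.get(term, 0.0)
        total += tf[term] * weight
    return total


def rank(query, docs, idf=None):
    """Document indices ordered from best to worst score (ties keep input order)."""
    return sorted(range(len(docs)),
                  key=lambda i: score(query, docs[i], idf),
                  reverse=True)


if __name__ == "__main__":
    # Toy documents (invented for illustration only).
    docs = [["form", "form"], ["tariff", "form"], ["form", "rate"], ["form", "export"]]
    query = ["tariff", "form"]
    print("TF ranking:   ", rank(query, docs))
    print("TFIDF ranking:", rank(query, docs, idf_table(docs)))
```

In this toy case "form" appears in every document, so its IDF weight is low and the document containing the rarer term "tariff" moves to the top under TFIDF. With only a handful of documents the document frequencies are very coarse, which is one way to read the "limited training data for IDF" problem noted above.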
Summary system
- Became familiar with the dragon system. Combed through the system and extracted the summary-only code.
- Sentence-based summarization is done, but it still needs to be migrated to Chinese.
- Started to build the TextRank-based keyword extraction; the LexRank code is being rewritten to handle word-level similarity matrices.
Next week:
- Prepare a test data set of 100 articles.
- Finish TextRank.
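As a rough illustration of TextRank-style keyword extraction over a word-level similarity matrix, the sketch below builds an undirected word graph and runs a PageRank-style iteration on it. It is not the lab's rewritten LexRank code: using co-occurrence counts within a sliding window as the similarity, and the names `cooccurrence_graph`, `textrank`, and `top_keywords`, are assumptions made for this example.

```python
from collections import defaultdict


def cooccurrence_graph(words, window=2):
    """Undirected weighted graph; an edge weight counts how often two
    distinct words occur within `window` positions of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            v = words[j]
            if v != w:
                graph[w][v] += 1.0
                graph[v][w] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration over the weighted word graph."""
    nodes = list(graph)
    if not nodes:
        return {}
    scores = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            # Each neighbour m passes on a share of its score
            # proportional to the weight of the edge m-n.
            incoming = sum(scores[m] * graph[m][n] / out_weight[m]
                           for m in graph[n])
            new_scores[n] = (1.0 - damping) / len(nodes) + damping * incoming
        scores = new_scores
    return scores


def top_keywords(words, k=3, window=2):
    """The k highest-scoring words of a tokenized text."""
    scores = textrank(cooccurrence_graph(words, window))
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    tokens = ["a", "summary", "b", "summary", "c", "summary", "d"]
    print(top_keywords(tokens, k=1))
```

The input is assumed to be already tokenized, so migrating to Chinese would additionally require a word segmentation step before the graph is built. A fixed iteration count is used instead of a convergence test to keep the sketch short.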
Template matching
- Started working on our own implementation; some requirements have not been considered yet.
- Decide by next Tuesday whether to use a standard FSM toolkit.