Hulan-2013-09-27

ASR

ASR Kernel development

ASR group weekly report: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/2013-09-27

TTS

  • Lab format learned.
  • All the details of the label format are clear.
  • Construct label files from the word/pinyin/phone transcription. Use the csep word-segmentation tool to obtain these transcriptions from the original text (see the sketch after this list).
  • Monophone and triphone Chinese prototype systems are ready. 500 sentences from the 863 data are used for training. A trivial question set was used for decision-tree clustering. 16 kHz signals with a 256-point FFT. A GV (global variance) model is used.
  • The synthesized voice sounds odd.
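
A minimal sketch of how context-dependent label lines could be built from a phone sequence. The quinphone LL^L-C+R=RR pattern follows the common HTS label convention; the actual label specification, the extra prosodic context fields, and the csep output format are assumptions here, not the lab spec used in this work.

def make_context_labels(phones):
    # phones: phone symbols for one sentence, e.g. derived from the
    # word/pinyin/phone transcription produced by csep word segmentation.
    padded = ["sil"] + phones + ["sil"]            # pad with silence at both ends
    labels = []
    for i in range(1, len(padded) - 1):
        ll = padded[i - 2] if i >= 2 else "x"      # left-left context ("x" = none)
        l = padded[i - 1]                          # left context
        c = padded[i]                              # current phone
        r = padded[i + 1]                          # right context
        rr = padded[i + 2] if i + 2 < len(padded) else "x"
        labels.append(f"{ll}^{l}-{c}+{r}={rr}")
    return labels

# "ni hao" -> a hypothetical pinyin-to-phone split
print("\n".join(make_context_labels(["n", "i", "h", "ao"])))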

Next week:

  • Keep collecting context-dependent labels.


Dialog system

  • MH and RY helped compose 60 questions, which are being used for testing.
  • Conducting the initial experiment:
  1. Using 9k-dimensional TF/IDF features, compose a feature vector for each query and each answer. Match the TF/IDF vector of a new query against the stored query and answer vectors, and add the cosine similarity scores of the query match and the answer match directly (a sketch follows this list).
  • CER: 8.3%
  • Query time: very slow
  2. Remove 506 stop words. No significant change in CER or query time.
  3. Fast match: list only the words that appear in the query and answer as features, and match them in order to speed up the score calculation.
  4. Hierarchical matching: first split all the answers into 11 top-level categories, then split them into 1030 second-level categories, scoring with the query+answer TF/IDF (see the second sketch after this list).
  • CER: top-level category 6.7%, second-level category 18.3%
  • Query time: 6/sec
  5. Keep the two best top-level categories to try to reduce top-level errors:
  • CER: no errors with two top-level categories kept; second-level category results are still in progress.
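
A minimal sketch of the TF/IDF + cosine matching in step 1, using scikit-learn. The knowledge base, the whitespace tokenisation, the 9k vocabulary cap, and the stop-word handling are placeholder assumptions, not the actual system.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Knowledge base: parallel lists of (already word-segmented) queries and answers.
kb_queries = ["今天 天气 怎么样", "你 叫 什么 名字"]
kb_answers = ["今天 天气 晴", "我 叫 小 机器人"]

# ~9k-dim TF/IDF space; a stop-word list (step 2) could be passed via stop_words=[...].
vectorizer = TfidfVectorizer(max_features=9000, token_pattern=r"(?u)\S+")
vectorizer.fit(kb_queries + kb_answers)        # shared vocabulary for queries and answers
Q = vectorizer.transform(kb_queries)           # query-side TF/IDF vectors
A = vectorizer.transform(kb_answers)           # answer-side TF/IDF vectors

def score(new_query):
    # Add the cosine score against the stored query and against the stored answer.
    v = vectorizer.transform([new_query])
    return (cosine_similarity(v, Q) + cosine_similarity(v, A)).ravel()

best = score("明天 天气 怎么样").argmax()
print(kb_answers[best])                        # -> "今天 天气 晴"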

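The hierarchical matching in step 4 can be sketched on top of the previous example. The category assignments below are made up (the real system has 11 top-level and 1030 second-level categories); this only shows the two-stage restriction, and it reuses vectorizer, Q, A, and score() from the sketch above.

import numpy as np

# Hypothetical (top-level, second-level) category labels, one pair per knowledge-base entry.
top_of = np.array([0, 1])
second_of = np.array([3, 7])

def hierarchical_match(new_query):
    s = score(new_query)                       # per-entry query+answer cosine score

    # Stage 1: score each top-level category by its best entry and keep the winner
    # (keeping the two best, as in step 5, would just take the top two here).
    best_top = max(set(top_of.tolist()), key=lambda t: s[top_of == t].max())

    # Stage 2: search only the entries under that top-level category.
    idx = np.flatnonzero(top_of == best_top)
    best = idx[s[idx].argmax()]
    return best, second_of[best]

print(hierarchical_match("明天 天气 怎么样"))    # best entry index and its second-level category
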
Next week:

  • Reverse (inverted) index-based fast match: ongoing (see the sketch below).
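
A minimal sketch of what a reverse (inverted) index based fast match could look like: only entries that share at least one word with the new query are scored, instead of scanning the whole knowledge base. The tokenisation and data are placeholders, not the system's actual implementation.

from collections import defaultdict

def build_inverted_index(docs):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index[word].add(doc_id)
    return index

def candidates(index, query):
    # Union of entries sharing at least one word with the query; only these
    # need a full TF/IDF cosine score.
    hits = set()
    for word in query.split():
        hits |= index.get(word, set())
    return hits

docs = ["今天 天气 晴", "我 叫 小 机器人"]
index = build_inverted_index(docs)
print(candidates(index, "明天 天气 怎么样"))    # -> {0}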