Hulan-2013-09-27
From cslt Wiki
ASR
ASR Kernel development
TTS
- EST format learned.
- Check details of each option.
Dialog system
- MH and RY helped compose 60 questions, which are being used for testing.
- Conducting the initial experiment:
- Using 9k-dimensional TF/IDF, compose a feature vector for each stored query and each stored answer. Match the TF/IDF vector of a new query against the TF/IDF vectors of the stored query and of its answer, then add the two cosine scores (query match and answer match) directly.
- CER: 8.3%
- Query time: very slow
- Removed 506 stop words. No significant change in CER or query time.
- Fast match: use only the words that actually occur in the query and answer as features, and match them in order to speed up the score calculation.
- Hierarchical matching: first split all the answers into 11 top-level categories, then split those into 1030 second-level categories. Scoring uses the query+answer TF/IDF score.
- CER: top-category: 6.7%, second-level category: 18.3%
- Query time: 6/sec
- Keep the two best top-level categories to try to reduce top-level errors:
- CER: with two top-level categories kept, no top-level errors. Second-level category: still ongoing.
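The query+answer scoring used in these experiments can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiments: the whitespace tokenizer, the smoothed IDF formula, the toy question/answer pairs, and all function names are assumptions made for the example. Vectors are stored sparsely (only the words that occur), in the spirit of the fast match described above.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: simple whitespace tokenization; the real system may segment differently.
    return text.lower().split()

def build_idf(docs):
    # Smoothed inverse document frequency over all stored queries and answers.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(text, idf):
    # Sparse vector: only terms present in the text (and known to the vocabulary).
    tf = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def cosine(a, b):
    if not a or not b:
        return 0.0
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)

def score_pairs(new_query, pairs, idf):
    # Score = cosine(new query, stored query) + cosine(new query, stored answer).
    q = tfidf(new_query, idf)
    return [cosine(q, tfidf(sq, idf)) + cosine(q, tfidf(sa, idf)) for sq, sa in pairs]

# Hypothetical toy data, for illustration only.
pairs = [
    ("how do i reset my password", "open settings and choose reset password"),
    ("what are the opening hours", "we are open from nine to five"),
]
idf = build_idf([text for pair in pairs for text in pair])
scores = score_pairs("reset password please", pairs, idf)
best = max(range(len(pairs)), key=scores.__getitem__)
```

Here `best` is the index of the stored pair with the highest summed score. In a hierarchical setup the same scoring could be applied first to category-level vectors and then only within the best one or two categories.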
Next week:
- Inverted-index-based fast match, ongoing.
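Since this work is still in progress, the following is only a sketch of what an index-based fast match might look like, not the planned implementation. It assumes each stored item is already a sparse term-weight dictionary; the function names and toy vectors are hypothetical. The idea is that a query only touches the posting lists of its own terms, so items sharing no word with the query are never scored.

```python
import math
from collections import defaultdict

def build_inverted_index(doc_vectors):
    # Map each term to a posting list of (doc_id, length-normalized weight).
    index = defaultdict(list)
    for doc_id, vec in enumerate(doc_vectors):
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm == 0.0:
            continue  # skip empty or all-zero vectors
        for term, w in vec.items():
            index[term].append((doc_id, w / norm))
    return index

def search(query_vec, index, top_k=3):
    # Accumulate cosine scores only for documents that share a term with the query.
    qnorm = math.sqrt(sum(w * w for w in query_vec.values()))
    if qnorm == 0.0:
        return []
    acc = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, dw in index.get(term, ()):
            acc[doc_id] += (qw / qnorm) * dw
    return sorted(acc.items(), key=lambda kv: -kv[1])[:top_k]

# Hypothetical toy vectors, for illustration only.
docs = [
    {"reset": 1.0, "password": 1.0},
    {"opening": 1.0, "hours": 1.0},
    {"password": 1.0, "hours": 1.0},
]
index = build_inverted_index(docs)
results = search({"reset": 1.0, "password": 1.0}, index)
```

With these toy vectors, `results` ranks document 0 first and document 2 second, while document 1 is never visited because it shares no term with the query.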