Latest revision as of 02:58, 18 October 2013 (Fri)

ASR

ASR Kernel development

[ASR group weekly report]

TTS

Full-lab training is ready. Trained the first full-lab system with 16k/pseudo-48k data.
Re-recorded 48k data using F00 (500 sentences) and retrained the model. The signal quality sounds better, but the pitch quality is a bit strange. The parameter settings need more investigation.

Next week:

Check the signal parameters and solve the pitch problem.
Prepare large-data training with the all-F 863 data.
Prepare large-data training with the online novel data.

Dialog system

The search system was migrated to the customs domain, with a significant drop in performance compared with the agriculture domain:

Customs:
n	TF	TFIDF
1	0.496	0.485
2	0.619	0.615
3	0.676	0.673
4	0.713	0.715
5	0.740	0.738

Agriculture:
n	TF	TFIDF
1	0.75	0.8
2	0.85	0.883
3	0.867	0.917
4	0.867	0.95
5	0.95	0.967

Two problems:

Lack of semantic clustering.
Limited training data for IDF.
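
The two weighting schemes compared in the tables can be sketched as follows. This is a minimal illustration, not the search system's code: the whitespace tokenizer, the cosine ranking, the add-one smoothed IDF, and all function names (`build_idf`, `vectorize`, `rank`) are assumptions made for the example. It shows why a small domain corpus matters: every IDF weight is estimated from document counts, so with few documents the weights are coarse and many query terms fall back to the unseen-term default.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real word segmenter)."""
    return text.lower().split()


def build_idf(docs):
    """Estimate smoothed IDF weights: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    unseen = math.log(1 + n_docs) + 1.0  # weight for terms absent from the corpus
    return idf, unseen


def vectorize(text, idf=None, unseen=1.0):
    """Sparse term vector: raw TF, or TF * IDF when an IDF table is given."""
    tf = Counter(tokenize(text))
    if idf is None:
        return dict(tf)
    return {t: c * idf.get(t, unseen) for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs, idf=None, unseen=1.0):
    """Return document indices sorted by decreasing similarity to the query."""
    q = vectorize(query, idf, unseen)
    scored = [(cosine(q, vectorize(d, idf, unseen)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    docs = [
        "what is the tax rate for imported cars",
        "what is the process for export license",
        "what is the limit for carrying cash",
    ]
    idf, unseen = build_idf(docs)
    print(rank("export license process", docs))               # TF only
    print(rank("export license process", docs, idf, unseen))  # TF-IDF
```

Passing `idf=None` gives the TF-only ranking, so both columns of the comparison come from the same code path and differ only in term weighting.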

Next week:

Analyse the QA database to extract useful domain-dependent data.
Analyse the data to expand the keywords and phrases.
Analyse the data to obtain better IDF estimates.

Summary system

Became familiar with the dragon system. Combed through the system and extracted the summary-only code.
Sentence-based summarization is done, but it still needs to be migrated to Chinese.
Started building TextRank-based keyword extraction by rewriting the LexRank code to handle word-level similarity matrices.
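
A rough sketch of word-level TextRank is given below. It is not the rewritten LexRank code: the co-occurrence window used as the word-level similarity measure, the damping factor of 0.85, and the function names are all assumptions chosen for illustration. The ranking step is the same weighted PageRank iteration that LexRank applies to sentences, here run over a word graph.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected weighted graph linking words at most `window` positions apart."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=50, tol=1e-6):
    """Iterate the weighted PageRank update until scores stop changing."""
    nodes = list(graph)
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # Each neighbour m passes on its score in proportion to edge weight.
            incoming = sum(graph[m][n] / out_weight[m] * score[m] for m in graph[n])
            new_score[n] = (1.0 - damping) + damping * incoming
        delta = max(abs(new_score[n] - score[n]) for n in nodes)
        score = new_score
        if delta < tol:
            break
    return score


def keywords(tokens, top_k=3, window=2):
    """Return the `top_k` highest-scoring words, ties broken alphabetically."""
    scores = textrank(cooccurrence_graph(tokens, window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    text = "pitch model pitch signal pitch data"
    print(keywords(text.split(), top_k=1, window=1))
```

For Chinese input the token list would come from a word segmenter rather than `split()`, and stop words would normally be filtered before building the graph.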

Next week:

Prepare a test data set of 100 articles.
Finish TextRank.

Template matching

Started work on our own implementation; some requirements have not yet been considered.
Decide by next Tuesday whether to use a standard FSM toolkit.
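
For comparison with a toolkit-based approach, a hand-coded matcher can be as small as the sketch below. The template syntax (`<name>` for a one-token slot), the example sentences, and the function names are hypothetical, since the report does not describe the template format. Each template compiles to a linear finite-state machine in which state i advances to state i + 1 when the next token is accepted.

```python
def compile_template(template):
    """Compile a template string into a linear list of FSM arcs.

    Each arc is ("lit", word) for a literal token or ("slot", name) for a
    one-token wildcard written as <name> in the template.
    """
    arcs = []
    for tok in template.split():
        if tok.startswith("<") and tok.endswith(">"):
            arcs.append(("slot", tok[1:-1]))
        else:
            arcs.append(("lit", tok.lower()))
    return arcs


def match(arcs, sentence):
    """Run the FSM over the sentence; return slot bindings, or None on failure."""
    tokens = sentence.lower().split()
    if len(tokens) != len(arcs):
        return None  # the final state must be reached exactly at end of input
    slots = {}
    state = 0
    for tok in tokens:
        kind, value = arcs[state]
        if kind == "lit" and tok != value:
            return None  # no arc accepts this token
        if kind == "slot":
            slots[value] = tok
        state += 1
    return slots


if __name__ == "__main__":
    fsm = compile_template("what is the tariff on <item>")
    print(match(fsm, "What is the tariff on rice"))
    print(match(fsm, "What is the weather today"))
```

A successful match on a template without slots returns an empty dict, so callers should test the result with `is not None`. Optional words, alternatives, and multi-token slots need branching states, which is where a standard FSM toolkit becomes attractive.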

@@ Line 3: / Line 3: @@
 ==ASR Kernel development==
-[[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/2013-10-11  ASR group weekly report]]
+[[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/2013-10-18  ASR group weekly report]]
 ==TTS==
-* CD lab files done. Refining the script.
+* full-lab training is ready. Trained the first full-lab system with 16k/pseduo 48k data.
-* Training toolkit is cleaned up. Now no alignment is required. Parallel training is done.
+* re-recording 48k data using F00 (500 sentences) and retrain the model. The quality of the signal sounds better, while the quality of pitch is a bit strange. Need more investigation on parameter settings.
-* Tried syllable based system instead of phones.
-* Collected an online-novel reading.
 Next week:
-* Refine the script
+* Check the signal parameters and solve the problem of pitch.
-* Clean up the online reading.
+* Prepare the large data training with both all-F 863 data.
+* Prepare the large data training with online novel.
 =Dialog system=
@@ Line 21: / Line 21: @@
 * The search system migrated to the custom domain, with significant performance reduction
+<pre>
    Customs:
 n	TF	TFIDF
@@ Line 36: / Line 37: @@
 	0.867	0.95
 	0.95	0.967
+</pre>
 * Two problems:
@@ Line 45: / Line 47: @@
 # Analyse the data to expand the key words & phrases
 # Analyse the data to attain better IDF.
+=Summary system=
+* Be familiar with the dragon system. Combing the system and extract the summary-only code.
+* Sentence based summary done. But request to migrate to Chinese.
+* Start to build the textrank-based keyword extraction. Re-write the Lexrank code to handle word level similarity matrices.
+Next week:
+* Test data set: 100 articles
+* TextRank Done
+==Template matching==
+* Start to work on the self coding, while some requests have not been considered.
+* consider if to use the standard FSM toolkit by next Tuesday.

Difference between revisions of "Hulan-2013-10-18"

