Hulan-2013-10-18

Latest revision as of 02:58, 18 October 2013 (Friday)

ASR

ASR Kernel development

ASR group weekly report: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/2013-10-18

TTS

  • Full-lab training is ready. Trained the first full-lab system with 16k/pseudo-48k data.
  • Re-recorded 48k data using F00 (500 sentences) and retrained the model. The signal quality sounds better, but the pitch sounds a bit strange. The parameter settings need more investigation.


Next week:

  • Check the signal parameters and solve the pitch problem.
  • Prepare the large-data training with the all-F 863 data.
  • Prepare the large-data training with the online novel.

Dialog system

  • The search system was migrated to the customs domain, with a significant reduction in performance:
  Customs:
  n    TF       TFIDF
  1    0.496    0.485
  2    0.619    0.615
  3    0.676    0.673
  4    0.713    0.715
  5    0.740    0.738

  Agriculture:
  n    TF       TFIDF
  1    0.75     0.8
  2    0.85     0.883
  3    0.867    0.917
  4    0.867    0.95
  5    0.95     0.967
  • Two problems:
  1. Lack of semantic clustering.
  2. Limited training data for IDF.
  • Next week:
  1. Analyse the QA database to extract useful domain-dependent data.
  2. Analyse the data to expand the key words & phrases.
  3. Analyse the data to attain better IDF.
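The TF and TFIDF columns above compare two ways of weighting query terms. The following is only a minimal sketch of that difference, not the search system's actual code: the tokenizer, the smoothed IDF formula, the function names and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization; a real system would use a word segmenter.
    return text.lower().split()


def idf_table(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)); a word found in every document gets 0."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {word: math.log((1 + n) / (1 + count)) for word, count in df.items()}


def score(query, doc, idf=None):
    """Sum of term frequencies of the query words, optionally weighted by IDF."""
    tf = Counter(tokenize(doc))
    total = 0.0
    for word in tokenize(query):
        weight = 1.0 if idf is None else idf.get(word, 0.0)
        total += tf[word] * weight
    return total


def top_n(query, docs, n, idf=None):
    """Indices of the n best-scoring documents, best first."""
    order = sorted(range(len(docs)),
                   key=lambda i: score(query, docs[i], idf),
                   reverse=True)
    return order[:n]


# Hypothetical toy collection, not taken from the QA database.
docs = [
    "the form the office the hours",
    "the import tariff",
    "the export licence",
]
idf = idf_table(docs)
print(top_n("the tariff", docs, 1))        # plain TF ranking
print(top_n("the tariff", docs, 1, idf))   # TF-IDF ranking
```

With plain TF the frequent word "the" dominates the score, while IDF suppresses it. The quality of that suppression depends on how much in-domain text the IDF table is estimated from, which is the second problem listed above.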

Summary system

  • Became familiar with the dragon system. Combing through the system to extract the summary-only code.
  • Sentence-based summary done, but it needs to be migrated to Chinese.
  • Started building the TextRank-based keyword extraction. Rewriting the LexRank code to handle word-level similarity matrices.
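As a rough illustration of TextRank-style keyword extraction, the sketch below ranks words with a PageRank-like iteration over a word graph. It is not the rewritten LexRank code: it assumes an unweighted co-occurrence graph instead of a word-level similarity matrix, and the function names, window size, damping factor and sample word list are illustrative choices.

```python
from collections import defaultdict


def cooccurrence_graph(words, window=2):
    """Undirected graph linking each word to the words that follow it within `window`."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                graph[word].add(words[j])
                graph[words[j]].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration; every node has at least one neighbour."""
    nodes = list(graph)
    rank = {word: 1.0 / len(nodes) for word in nodes}
    for _ in range(iterations):
        new_rank = {}
        for word in nodes:
            incoming = sum(rank[v] / len(graph[v]) for v in graph[word])
            new_rank[word] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


def keywords(words, k=3, window=2):
    """The k highest-ranked words."""
    rank = textrank(cooccurrence_graph(words, window))
    return sorted(rank, key=rank.get, reverse=True)[:k]


# Hypothetical input; a real run would first segment and filter the article text.
sample = "speech model training speech data speech model".split()
print(keywords(sample, k=2, window=1))
```

Replacing the set-based neighbours with weighted edges taken from a similarity matrix would bring this closer to the word-level LexRank variant described above.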

Next week:

  • Test data set: 100 articles
  • Finish TextRank


Template matching

  • Started work on our own implementation (self coding), though some requests have not been considered yet.
  • Decide by next Tuesday whether to use the standard FSM toolkit.
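For comparison with an FSM toolkit, a hand-coded matcher can be viewed as a linear finite-state machine with one state per template token. The sketch below is purely illustrative: the template syntax with `<slot>` markers, the one-token-per-slot restriction and the function names are hypothetical and do not describe the group's implementation.

```python
def compile_template(template):
    """Turn a template such as 'what is the tariff for <item>' into a state list."""
    states = []
    for token in template.split():
        if token.startswith("<") and token.endswith(">"):
            states.append(("slot", token[1:-1]))      # accepts any single token
        else:
            states.append(("literal", token.lower())) # accepts exactly this token
    return states


def match(states, sentence):
    """Walk the states over the sentence; return the slot values, or None on failure."""
    tokens = sentence.lower().split()
    if len(tokens) != len(states):
        return None
    slots = {}
    for (kind, value), token in zip(states, tokens):
        if kind == "literal":
            if token != value:
                return None
        else:
            slots[value] = token
    return slots


template = compile_template("what is the tariff for <item>")
print(match(template, "What is the tariff for tea"))
print(match(template, "what is the price for tea"))
```

Optional words, alternatives and multi-token slots would need branching states, which is where a standard FSM toolkit is likely to be more convenient than hand-written code.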