Differences between revisions of "Asr-nsfc-weekly-2016-11-28"

Latest revision as of 10:32, 5 December 2016 (Monday)

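Date	People	Last Week	This Week
2016.11.28
	Tsinghua	Found problems in CodeMap that introduce errors when converting the corpus. Tried to reduce the language model's perplexity (ppl) on the test-set corpus.	Continue the language-model work.
	Xinjiang Univ.	Recording of Kazak utterances is finished.	Check and prepare the Kazak AM text corpora. Revise the Kazak LM corpora and some supporting programs.
	Minzu Univ.	Selected about 6,000 sentences as the formal pronunciation text. Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary.	Proofread the Lhasa Tibetan pronunciation dictionary. Enter the Mongolian dictionary.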

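Date	People	Last Week	This Week
2016.11.28
	Tsinghua	Identified issues such as training-set mismatch and corpus domain mismatch.	Wait for further updates to the data and corpora, then retrain the acoustic and language models.
	Xinjiang Univ.	Additional Kazak acoustic rules were discussed with experts, and new rules were determined. Lost Kazak utterances were extracted, and preparations for make-up recording are done.	The Kazak irregular acoustic dictionary will be prepared manually again. Recording of Kazak utterances will continue.
	Minzu Univ.	Selected about 50,000 written-language pronunciation text items each for Tibetan and Mongolian. Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary.	Compute triphone statistics on the Tibetan written-language pronunciation text and select about 6,000 sentences as the formal pronunciation text. Proofread the Lhasa Tibetan pronunciation dictionary. Enter the Mongolian dictionary.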

@@ Line 1 / Line 1 @@
 {| class="wikitable"
 !Date!!People !! Last Week !! This Week
@@ Line 8 / Line 6 @@
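 |Tsinghua
 ||
-*  Identified issues such as training-set mismatch and corpus domain mismatch
+* Found problems in CodeMap that introduce errors when converting the corpus.
+* Tried to reduce the language model's perplexity (ppl) on the test-set corpus
 ||
-*  Wait for further updates to the data and corpora, then retrain the acoustic and language models
+* Continue the language-model work.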
 |-
 |-
 |Xinjiang Univ.
 ||
-*
+*  Recording of Kazak utterances is finished.
 ||
-*
+*  Check and prepare the Kazak AM text corpora.
+*  Revise the Kazak LM corpora and some supporting programs.
 |-
 |-
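 |Minzu Univ.
 ||
-*
+*  Selected about 6,000 sentences as the formal pronunciation text
+*  Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary
 ||
-*
+*  Proofread the Lhasa Tibetan pronunciation dictionary
+*  Enter the Mongolian dictionary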
 |-
 |}
 ----
 {| class="wikitable"
 !Date!!People !! Last Week !! This Week
 |-
-| rowspan="5"|2016.11.21
+| rowspan="5"|2016.11.28
 |-
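 |Tsinghua
 ||
-*  Kazak acoustic model (TDNN) training finished
+*  Identified issues such as training-set mismatch and corpus domain mismatch
-*  Trained a language model on the Kazak corpus and used it for decoding [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=tangzy&step=view_request&cvssid=576]: how well the corpus matches the training-set domain remains to be checked
 ||
-*  Obtain a suitable corpus to train the language model and complete the baseline
+*  Wait for further updates to the data and corpora, then retrain the acoustic and language models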
 |-
 |-
 |Xinjiang Univ.
 ||
-* The Kazak vocabulary extraction and acoustic dictionary building tools are finished and can be shared along with the Uyghur tools.
+* Additional Kazak acoustic rules were discussed with experts, and new rules were determined.
-* There are still never-ending problems with Kazak acoustic rules and spelling; we spent a lot of time on corrections.
+* Lost Kazak utterances were extracted, and preparations for make-up recording are done.
-* We built corpus-compilation tools for the character and acoustic layers for Uyghur and Kazak. We hope to have a common structure for every language.
 ||
-* We plan to establish a relatively complete syllable structure and use it for spell checking and correcting various corpora.
+* The Kazak irregular acoustic dictionary will be prepared manually again.
-* We assigned students to build a bidirectional text-numeric conversion tool.
+* Recording of Kazak utterances will continue.
-* A third-layer morphological analyzer tool is designed for multilingual use, first for Uyghur and Kazak, mainly for sub-word analysis.
 |-
 |-
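 |Minzu Univ.
 ||
-*  Finalized the spoken-language pronunciation text
+*  Selected about 50,000 written-language pronunciation text items each for Tibetan and Mongolian
-*  Proofread 300 entries of the Lhasa Tibetan pronunciation dictionary
+*  Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary
 ||
-*  Select the written-language pronunciation text
+*  Compute triphone statistics on the Tibetan written-language pronunciation text and select about 6,000 sentences as the formal pronunciation text
 *  Proofread the Lhasa Tibetan pronunciation dictionary
 *  Enter the Mongolian dictionary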
 |-
 |}
