Difference between revisions of "Asr-nsfc-weekly-2016-11-28"

From cslt Wiki
 
(5 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
 
 
{| class="wikitable"
 
{| class="wikitable"
 
!Date!!People !! Last Week !! This Week
 
!Date!!People !! Last Week !! This Week
Line 8: Line 6:
 
|Tsinghua (清华)
 
|Tsinghua (清华)
 
||  
 
||  
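* Identified problems such as training-set mismatch and corpus domain mismatch.
+
* Found problems in CodeMap that introduce errors when converting the corpus.
 +
* Tried to reduce the perplexity (ppl) of the test-set text under the language model.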
 
||  
 
||  
* Wait for further updates to the data and corpus, then retrain the acoustic and language models.
+
* Continue work on the language model.
 
|-
 
|-
 
|-
 
|-
 
|Xinjiang University (新大)
 
|Xinjiang University (新大)
 
||  
 
||  
*  
+
* Recording of Kazak utterances is finished.
 
||  
 
||  
*  
+
* Check and prepare the Kazak AM text corpora.
 +
* Revise the Kazak LM corpora and some supporting programs.
 
|-
 
|-
 
|-
 
|-
 
|Minzu University (民大)
 
|Minzu University (民大)
 
||  
 
||  
*   
+
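Selected about 6,000 sentences as the formal pronunciation text
 +
*  Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary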
 
||  
 
||  
*   
+
* Proofread the Lhasa Tibetan pronunciation dictionary
 +
*  Enter the Mongolian dictionary
 
|-
 
|-
 
|}
 
|}
  
 
----
 
----
 
 
 
 
{| class="wikitable"
 
{| class="wikitable"
 
!Date!!People !! Last Week !! This Week
 
!Date!!People !! Last Week !! This Week
 
|-
 
|-
| rowspan="5"|2016.11.21
+
| rowspan="5"|2016.11.28
 
|-
 
|-
 
|Tsinghua (清华)
 
|Tsinghua (清华)
 
||  
 
||  
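Kazak acoustic model (TDNN) training is finished
+
Identified problems such as training-set mismatch and corpus domain mismatch
*  Trained a language model on the Kazak corpus and used it for decoding [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=tangzy&step=view_request&cvssid=576]: how well the corpus domain matches the training set still needs to be checked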
+
 
||  
 
||  
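Obtain a suitable corpus to train the language model and complete the baseline
+
Wait for further updates to the data and corpus, then retrain the acoustic and language models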
 
|-
 
|-
 
|-
 
|-
 
|Xinjiang University (新大)
 
|Xinjiang University (新大)
 
||  
 
||  
* The Kazak vocabulary extraction and acoustic dictionary building tools are finished and can be shared along with the Uyghur tools.
+
* Additional Kazak acoustic rules were discussed with experts, and new rules were determined.
* Problems with Kazak acoustic rules and spelling keep coming up; we spent a lot of time correcting them.
+
* Lost Kazak utterances were extracted, and preparations for the make-up recordings are done.
* We built corpus compilation tools for the character and acoustic layers for Uyghur and Kazak. We hope to have a common structure for every language.
+
 
||  
 
||  
* We plan to work out a relatively complete syllable structure and use it for spell checking and correcting various corpora.
+
* The Kazak irregular acoustic dictionary is going to be prepared manually again.
* We appointed students to build a bidirectional text-numeric conversion tool.
+
* Recording of Kazak utterances will continue.
* A third layer, a morphological analyzer tool, is designed for multilingual use, first for Uyghur and Kazak, mainly for sub-word analysis.
+
 
|-
 
|-
 
|-
 
|-
 
|Minzu University (民大)
 
|Minzu University (民大)
 
||  
 
||  
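Finalized the spoken-language pronunciation text
+
Selected about 50,000 written-language pronunciation sentences each for Tibetan and Mongolian
Proofread 300 entries of the Lhasa Tibetan pronunciation dictionary
+
Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary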
 
||  
 
||  
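Select the written-language pronunciation text
+
Run the triphone computation on the Tibetan written-language pronunciation text and select about 6,000 sentences as the formal pronunciation text
 
*  Proofread the Lhasa Tibetan pronunciation dictionary
 
*  Proofread the Lhasa Tibetan pronunciation dictionary
 
*  Enter the Mongolian dictionary
 
*  Enter the Mongolian dictionary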
 
|-
 
|-
 
|}
 
|}

Latest revision as of 10:32, 5 December 2016 (Mon)

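{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="3"|2016.11.28
|Tsinghua (清华)
||
* Found problems in CodeMap that introduce errors when converting the corpus.
* Tried to reduce the perplexity (ppl) of the test-set text under the language model.
||
* Continue work on the language model.
|-
|Xinjiang University (新大)
||
* Recording of Kazak utterances is finished.
||
* Check and prepare the Kazak AM text corpora.
* Revise the Kazak LM corpora and some supporting programs.
|-
|Minzu University (民大)
||
* Selected about 6,000 sentences as the formal pronunciation text
* Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary
||
* Proofread the Lhasa Tibetan pronunciation dictionary
* Enter the Mongolian dictionary
|}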

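{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="3"|2016.11.28
|Tsinghua (清华)
||
* Identified problems such as training-set mismatch and corpus domain mismatch
||
* Wait for further updates to the data and corpus, then retrain the acoustic and language models
|-
|Xinjiang University (新大)
||
* Additional Kazak acoustic rules were discussed with experts, and new rules were determined.
* Lost Kazak utterances were extracted, and preparations for the make-up recordings are done.
||
* The Kazak irregular acoustic dictionary is going to be prepared manually again.
* Recording of Kazak utterances will continue.
|-
|Minzu University (民大)
||
* Selected about 50,000 written-language pronunciation sentences each for Tibetan and Mongolian
* Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary
||
* Run the triphone computation on the Tibetan written-language pronunciation text and select about 6,000 sentences as the formal pronunciation text
* Proofread the Lhasa Tibetan pronunciation dictionary
* Enter the Mongolian dictionary
|}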