cslt Wiki - User contributions [zh-cn]

Asr-nsfc-weekly-2018-01-09

2018-01-09T05:47:20Z

Mijiti：

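{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2018.01.09
|-
|Tsinghua University
||
* i-vector-based language classification: training has finished and testing is in progress
* M2ASR annual summary report
||
* Finish i-vector-based language classification
* Finish DNN i-vector training
|-
|-
|Xinjiang University
||
* Year-end summary work
* Prepare supplementary recordings for Kazak
||
* Improve the corpus
* Improve the auxiliary tools
|-
|-
|Northwest University for Nationalities
||
* Final exams; no progress
||
* Expand the Tibetan dictionary
* Expand the Mongolian dictionary
|-
|}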
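The i-vector language classification listed in the table above can be illustrated with a minimal sketch. The report does not say which scoring back-end is used, so cosine scoring against per-language mean i-vectors is assumed here; the toy vectors, language labels, and function names are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_language_means(ivectors, labels):
    """Average the training i-vectors of each language (assumed back-end)."""
    sums, counts = {}, defaultdict(int)
    for vec, lang in zip(ivectors, labels):
        if lang not in sums:
            sums[lang] = [0.0] * len(vec)
        sums[lang] = [s + v for s, v in zip(sums[lang], vec)]
        counts[lang] += 1
    return {lang: [s / counts[lang] for s in total] for lang, total in sums.items()}


def classify(ivector, means):
    """Return the language whose mean i-vector is most similar to the input."""
    return max(means, key=lambda lang: cosine(ivector, means[lang]))


# Toy 3-dimensional "i-vectors"; real ones typically have hundreds of dimensions.
train_vecs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 1.0, 0.2], [0.1, 0.8, 0.1]]
train_labs = ["kazak", "kazak", "uyghur", "uyghur"]
means = train_language_means(train_vecs, train_labs)
predicted = classify([0.8, 0.3, 0.0], means)
```

In practice the i-vectors would come from a trained extractor, and length normalization or LDA is often applied before scoring; neither step is shown here.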
-------------------------------------------------------------------------------

Asr-nsfc-weekly-2018-01-09

2018-01-09T05:36:54Z

Mijiti：

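{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2018.01.09
|-
|Tsinghua University
||
* i-vector-based language classification: training has finished and testing is in progress
* M2ASR annual summary report
||
* Finish i-vector-based language classification
* Finish DNN i-vector training
|-
|-
|Xinjiang University
||
* Year-end summary work
* Prepare supplementary recordings for Kazak
||
*
|-
|-
|Northwest University for Nationalities
||
* Final exams; no progress
||
* Expand the Tibetan dictionary
* Expand the Mongolian dictionary
|-
|}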
-------------------------------------------------------------------------------

Asr-nsfc-weekly-2017-09-25

2017-09-25T05:46:54Z

Mijiti：

{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2017.09.25
|-
|Tsinghua University
||
* Apply fake LID to the Gsoftmax model (in progress)
||
* Finish fake LID decoding
|-
|-
|Xinjiang University
||
* An acoustic normalization tool and parallel phonemes for three languages have been prepared and are used for spell checking and acoustic dictionary building.
||
* Correct problems in the Kazak speech corpora.
* Improve the quality of the Kirghiz text corpus, especially spelling mistakes.
* Work on a multilingual morpheme segmenter tool for the three languages.
|-
|-
|Northwest University for Nationalities
||
*
||
*
|-
|}
---------------------------------------------------------------------------------------------
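{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2017.09.18
|-
|Tsinghua University
||
* Group-based softmax model training is complete, and performance improved accordingly [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl here]
||
* LID + group-based softmax
|-
|-
|Xinjiang University
||
*
||
*
|-
|-
|Northwest University for Nationalities
||
* The mobile phone recording app has basic functionality and needs further improvement
* Mongolian dictionary proofreading (in progress)
||
* Improve the mobile phone recording app
* Calibrate some of the national-standard (国标) phonetic symbols in the Lhasa Tibetan dictionary
* Mongolian dictionary proofreading
|-
|}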

Asr-nsfc-weekly-2017-09-25

2017-09-25T05:46:06Z

Mijiti：

{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2017.09.18
|-
|Tsinghua University
||
* Apply fake LID to the Gsoftmax model (in progress)
||
* Finish fake LID decoding
|-
|-
|Xinjiang University
||
* An acoustic normalization tool and parallel phonemes for three languages have been prepared and are used for spell checking and acoustic dictionary building.
||
* Correct problems in the Kazak speech corpora.
* Improve the quality of the Kirghiz text corpus, especially spelling mistakes.
* Work on a multilingual morpheme segmenter tool for the three languages.
|-
|-
|Northwest University for Nationalities
||
*
||
*
|-
|}

Asr-nsfc-weekly-2017-09-25

2017-09-25T05:45:31Z

Mijiti：

{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2017.09.18
|-
|Tsinghua University
||
* Apply fake LID to the Gsoftmax model (in progress)
||
* Finish fake LID decoding
|-
|-
|Xinjiang University
||
* An acoustic normalization tool and parallel phonemes for three languages have been prepared and are used for spell checking and acoustic dictionary building.
||
* Correct problems in the Kazak speech corpora.
* Improve the quality of the Kirghiz text, especially spelling mistakes.
* Work on a multilingual morpheme segmenter tool for the three languages.
|-
|-
|Northwest University for Nationalities
||
*
||
*
|-
|}

AP17:OLR-special session

2017-05-02T11:20:40Z

Mijiti：/* Potential Papers */

==Title==

Minor- and Multilingual speech and language processing

==Organizers==

'''Dong Wang: Tsinghua University (wangdong99@mails.tsinghua.edu.cn)'''

Dr. Dong Wang received his PhD from the University of Edinburgh and has worked at Oracle, IBM, and Nuance. He is now an assistant professor at the Center for Speech and Language Technologies (CSLT) at Tsinghua University. Dr. Wang's research interests cover speech processing, language processing, and financial processing. He has published more than 80 academic papers in related areas and has received three best paper awards. Dr. Wang plays an active role in the speech research community: he serves as the secretary of the National Conference on Man-Machine Speech Communication (NCMMSC) and as the country representative of mainland China in Oriental COCOSDA. He was the local chair of ChinaSIP 2013, special session co-chair of ISCSLP 14, and plenary talk co-chair of ISCSLP 16. Dr. Wang is now serving as the vice chair of the SLA track of APSIPA.

'''Guanyu Li: Northwest University for Nationalities (guanyu-li@163.com)'''

Dr. Guanyu Li received his PhD from the Northwest University for Nationalities, Gansu Province, China. He worked at several ERP software development companies as a development engineer and is now an associate professor at the Northwest University for Nationalities and the Key Laboratory of National Language Intelligent Processing, Gansu Province. His research interests include speech processing for minor languages in China, especially speech recognition and speech synthesis. In recent years, he has published more than ten papers in related areas.

'''Mijit Ablimit: Xinjiang University (mijit@xju.edu.cn)'''

Dr. Mijit Ablimit received his PhD from Kyoto University, Japan. He is now an associate professor at the Information Technology and Engineering college of Xinjiang University. His research interests cover speech, language, and multilingual information processing for less popular languages of China.

==Target track==

Speech and Language processing

==Introduction==

Minor- and multilingual phenomena are important in modern international societies.
This special session focuses on minor- and multilingual speech and language processing,
including but not limited to the following topics:

* Minor- and Multilingual phonetic and phonological analysis
* Minor- and Multilingual speech recognition
* Minor- and Multilingual speaker recognition
* Minor- and Multilingual speech synthesis
* Minor- and Multilingual language understanding
* Resource construction for minor- and multilingual languages

==Potential Papers==

===Title: Prior-constrained multilingual speech recognition ===
*Author: Ying Shi, Zhiyuan Tang, Dong Wang

*Abstract: Conventional multilingual speech recognition follows either a tandem approach (language identification)
or a parallel architecture (parallel decoding). This paper presents a novel prior-constrained approach that
conducts the decoding in a multilingual linguistic space, where a prior on the language is used to constrain
the decoding frame by frame. Our experiments found that this approach can realize true simultaneous multilingual
speech recognition.

===Title: Memory-based Uyghur-Chinese Translation===
*Author: Shiyue Zhang, Guli, Mijit Ablimit, Askar Hamdulla

*Abstract: Neural machine translation (NMT) has achieved significant performance. However, the NMT approach
has not yet been effectively applied to minor languages, such as Uyghur-to-Chinese translation. The main problem
is that the limited training data does not support end-to-end neural learning. In this paper, we propose to
use a memory structure to assist NMT inference for limited-resource languages. Our experiments
demonstrate that this approach is highly efficient compared to vanilla NMT and outperforms the conventional
statistical machine translation (SMT) approach.

===Title: Resource construction for Mongolian ===
*Author: Shipeng Xu, Guanyu Li, Hongzhi Yu

*Abstract: Mongolian is a typical low-resource language. The resource limitation spans various aspects, from acoustic
analysis and phonetic rules to lexicon, speech, and text data. This paper describes our recent progress on Mongolian
resource construction supported by the NSFC project.

===Title: Tibetan speech database construction===
*Author: Guanyu Li, Hongzhi Yu

*Abstract: Tibetan is an important low-resource language in China. The syllable structure of Tibetan is similar
to that of Chinese, but the composition rules of its orthographic forms are highly complex. Additionally, the lexicon
resources are far from standard and rich. This paper describes our recent progress on Tibetan
resource construction supported by the NSFC M2ASR project.

===Title: A large Kazak speech database and a speech recognition baseline===
*Author: Askar Hamdulla, Ying Shi

*Abstract: We describe the construction process of a large scale Kazak speech database. The database involves
150 hours of speech signals, recorded by more than 200 speakers. A speech recognition baseline

===Title: Multilingual resource construction for Uyghur, Kazak, Kirghiz languages ===
*Author: Mijit Ablimit, Askar Hamdulla, Ying Shi, Dong Wang
*Abstract: Minority languages, especially spoken languages, are strongly influenced by major languages or by mixing with each other. A platform of uniform phonetic and morphological processing methods can therefore provide a methodology and extra resources for less popular languages. This paper describes multi-language phonetic and morphological tools and corpus compilation processing for some resource-scarce languages.

AP17:OLR-special session

2017-05-02T10:43:25Z

Mijiti：/* Organizers */

Asr-nsfc-weekly-2017-02-13

2017-02-14T05:22:23Z

Mijiti：

{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2017.02.13
|-
|Tsinghua University
||
* The Kazak speech recognition system is essentially complete [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=tangzy&step=view_request&cvssid=576].
||
* Task paused; preparing Interspeech papers.
|-
|-
|Xinjiang University
||
* Multilingual character and acoustics processing tools are ready to use.
* A new alpha server computer has been installed in the Tsinghua lab and is ready to use for experiments.
||
* Prepare Kirghiz text for recording.
|-
|-
|Northwest University for Nationalities
||
* No progress yet
||
*
|-
|}

Asr-nsfc-weekly-2017-01-16

2017-01-23T11:53:01Z

Mijiti：

{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2017.01.23
|-
|Tsinghua University
||
*
||
*
|-
|-
|Xinjiang University
||
* The multilingual character and acoustics processing toolkit has been upgraded with a manual and test recipes.
||
* Work on the lexicon segmentation tool.
|-
|-
|Northwest University for Nationalities
||
*
||
*
|-
|}

----------------------------------------------------------------

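{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2017.01.16
|-
|Tsinghua University
||
* Finished affix segmentation
||
* Finish the remaining work on Kazak recognition
|-
|-
|Xinjiang University
||
* A Python toolkit for multilingual characters and acoustics has been established as version 0.1.
||
* Keep revising the charToolkit and design the structure of a multilingual lexicon processing toolkit.
|-
|-
|Northwest University for Nationalities
||
* Translated about 500 Chinese-Tibetan spoken-language sentences.
||
* Mongolian pronunciation dictionary proposal; Chinese-Tibetan spoken-language sentence translation.
|-
|}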

----------------------------------------------------------------

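{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2017.01.03
|-
|Tsinghua University
||
* Kazak TRP is complete and a new test set has been generated; its perplexity (ppl) on the language model is 192.5, which is reasonable
||
* Test once Xinjiang University has recorded the new test set
|-
|-
|Xinjiang University
||
* Syllable-based spelling corrections are finished for the 4k and 200k corpora.
* The Kazak acoustic dictionary program is finished.
||
* Keep working on Kazak ASR corpora problems.
|-
|-
|Northwest University for Nationalities
||
* Translated 300 spoken Lhasa Tibetan sentences.
* Proofread 90,000 sentences of Tibetan text corpus and 60,000 sentences of Mongolian text corpus.
||
* Spoken Lhasa Tibetan translation.
* Text normalization.
|-
|}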
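The ppl figure of 192.5 quoted in the table above is a perplexity. As a minimal sketch of how such a number is computed, the code below assumes the language model exposes a natural-log probability per token; the toy unigram model and all names are hypothetical and stand in for the project's actual LM.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def unigram_logprobs(tokens, model, unk_prob=1e-4):
    """Look up each token in a toy unigram model (hypothetical stand-in for an LM)."""
    return [math.log(model.get(tok, unk_prob)) for tok in tokens]


toy_model = {"a": 0.5, "b": 0.25, "c": 0.25}
test_tokens = ["a", "b", "a", "c"]
ppl = perplexity(unigram_logprobs(test_tokens, toy_model))
```

Read this way, a perplexity of 192.5 means the model is, on average, as uncertain as if it chose uniformly among about 192.5 equally likely tokens at each position.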

Asr-nsfc-weekly-2017-01-16

2017-01-16T08:24:42Z

Mijiti：


Asr-nsfc-weekly-2016-12-19

2016-12-19T13:55:20Z

Mijiti：

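{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.12.19
|-
|Tsinghua University
||
* Crawled some data from http://www.elarna.com/tote.php and http://kazakh.people.com.cn/ and finished processing the crawled corpus
||
* Retrain the AM on the re-recorded dataset, and build the LM from the web-crawled corpus and the processed corpus provided by the Translation Bureau.
|-
|-
|Xinjiang University
||
* We collected a Chinese-Russian-Kazak parallel dialog corpus, which we believe is free from spelling mistakes.
||
* Keep working on the Kazak acoustic dictionary tool, which is not yet finished.
|-
|-
|Northwest University for Nationalities
||
* Finished proofreading the Lhasa Tibetan pronunciation dictionary
* Translated 500 sentences of Tibetan spoken-language recording text
||
* Select written-language recording text for Tibetan and Mongolian
* Translate spoken-language recording text
|-
|}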
---------------------------------------------------------------
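{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.12.12
|-
|Tsinghua University
||
* Worked mainly on language modeling; found that too much of the current corpus is unsuitable for language modeling, so collected a new batch of text and processed it.
||
* Finish processing the current corpus and count the total; if it is insufficient, supplement it by web crawling.
|-
|-
|Xinjiang University
||
* Recording of 1504 Kazak utterances is finished.
* The 4000 Kazak AM text corpora were revised again.
||
* Revise the Kazak acoustic dictionary tool.
* Keep checking the Kazak LM corpora.
|-
|-
|Northwest University for Nationalities
||
* Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary
* Finalized 6,000 sentences of Tibetan written-language recording text
||
* Proofread the Lhasa Tibetan pronunciation dictionary
* Translate spoken-language recording text
|-
|}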
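The corpus processing and counting step planned in the table above could look roughly like the sketch below. The filtering rules (whitespace normalization, a minimum token count, exact-duplicate removal) and the size threshold are assumptions made for illustration; the report does not describe the actual criteria.

```python
def clean_corpus(lines, min_tokens=3):
    """Drop empty, too-short, and duplicate lines; keep the rest in order."""
    seen, kept = set(), []
    for line in lines:
        sent = " ".join(line.split())  # normalize whitespace
        if len(sent.split()) < min_tokens or sent in seen:
            continue
        seen.add(sent)
        kept.append(sent)
    return kept


def corpus_stats(sentences):
    """Count sentences, running tokens, and distinct token types."""
    tokens = [tok for sent in sentences for tok in sent.split()]
    return {"sentences": len(sentences), "tokens": len(tokens), "types": len(set(tokens))}


def needs_more_data(stats, min_total_tokens):
    """True when the corpus is below the (hypothetical) size target."""
    return stats["tokens"] < min_total_tokens


raw = ["a b c", "a  b c", "x", "", "d e f g"]
cleaned = clean_corpus(raw)
stats = corpus_stats(cleaned)
crawl_more = needs_more_data(stats, min_total_tokens=10)
```

If `crawl_more` is true, the corpus would be topped up from additional sources, which matches the plan of crawling web pages only when the total is insufficient.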

Asr-nsfc-weekly-2016-12-12

2016-12-12T14:57:29Z

Mijiti：

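{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.12.05
|-
|Tsinghua University
||
* Found some problems in CodeMap that cause errors when converting the corpus.
* Tried to reduce the perplexity (ppl) of the test-set text under the language model
||
* Continue language model work.
|-
|-
|Xinjiang University
||
* Recording of Kazak utterances is finished.
||
* Check and prepare the Kazak AM text corpora.
* Revise the Kazak LM corpora and some supporting programs.
|-
|-
|Northwest University for Nationalities
||
* Selected about 6,000 sentences as the official recording text
* Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary
||
* Proofread the Lhasa Tibetan pronunciation dictionary
* Mongolian dictionary data entry
|-
|}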

Asr-nsfc-weekly-2016-11-28

2016-12-05T10:32:30Z

Mijiti：

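{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.11.28
|-
|Tsinghua University
||
* Found some problems in CodeMap that cause errors when converting the corpus.
* Tried to reduce the perplexity (ppl) of the test-set text under the language model
||
* Continue language model work.
|-
|-
|Xinjiang University
||
* Recording of Kazak utterances is finished.
||
* Check and prepare the Kazak AM text corpora.
* Revise the Kazak LM corpora and some supporting programs.
|-
|-
|Northwest University for Nationalities
||
* Selected about 6,000 sentences as the official recording text
* Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary
||
* Proofread the Lhasa Tibetan pronunciation dictionary
* Mongolian dictionary data entry
|-
|}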

----
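{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.11.28
|-
|Tsinghua University
||
* Identified problems including training-set mismatch and corpus domain mismatch
||
* Wait for further updates to the data and corpus, then retrain the acoustic and language models
|-
|-
|Xinjiang University
||
* Additional Kazak acoustic rules were discussed with experts, and new rules were determined.
* Lost Kazak utterances were extracted, and preparations for re-recording them are done.
||
* The Kazak irregular acoustic dictionary will be prepared manually again.
* Recording of Kazak utterances will continue.
|-
|-
|Northwest University for Nationalities
||
* Selected about 50,000 written-language recording sentences each for Tibetan and Mongolian
* Proofread about 1,000 entries of the Lhasa Tibetan pronunciation dictionary
||
* Compute triphone statistics over the Tibetan written-language recording text and select about 6,000 sentences as the official recording text
* Proofread the Lhasa Tibetan pronunciation dictionary
* Mongolian dictionary data entry
|-
|}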
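The triphone-based selection of recording sentences planned in the table above can be sketched as a greedy coverage problem. The report does not describe the actual selection procedure, so the greedy strategy, the corpus format, and all names below are assumptions.

```python
def triphones(phones):
    """Return the set of consecutive phone triples in one utterance."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}


def select_sentences(corpus, budget):
    """Greedily pick up to `budget` sentences, each adding the most unseen triphones.

    `corpus` maps a sentence id to its phone sequence (hypothetical format).
    """
    remaining = {sid: triphones(phones) for sid, phones in corpus.items()}
    covered, chosen = set(), []
    while remaining and len(chosen) < budget:
        # Sorting the ids makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda sid: len(remaining[sid] - covered))
        if not remaining[best] - covered:
            break  # no remaining sentence adds a new triphone
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered


corpus = {
    "s1": ["k", "a", "t", "a", "p"],
    "s2": ["k", "a", "t"],
    "s3": ["t", "a", "p", "a", "k"],
}
chosen, covered = select_sentences(corpus, budget=2)
```

With a budget of about 6,000 and a candidate pool of about 50,000 sentences, the same loop would run on phone sequences produced by the pronunciation dictionary.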

Asr-nsfc-weekly-2016-11-28

2016-11-28T14:26:25Z

Mijiti：

{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.11.28
|-
|Tsinghua University
||
* Identified problems including training-set mismatch and corpus domain mismatch
||
* Wait for further updates to the data and corpus, then retrain the acoustic and language models
|-
|-
|Xinjiang University
||
* Additional Kazak acoustic rules were discussed with experts, and new rules were determined.
* Lost Kazak utterances were extracted, and preparations for re-recording them are done.
||
* The Kazak irregular acoustic dictionary will be prepared manually again.
* Recording of Kazak utterances will continue.
|-
|-
|Northwest University for Nationalities
||
*
||
*
|-
|}

----

{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.11.21
|-
|Tsinghua University
||
* Kazak acoustic model (TDNN) training is complete
* Trained a language model on the Kazak corpus and used it for decoding [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=tangzy&step=view_request&cvssid=576]: the domain match between the corpus and the training set still needs to be checked
||
* Obtain a suitable corpus to train the language model and finish the baseline
|-
|-
|Xinjiang University
||
* The Kazak vocabulary extraction and acoustic dictionary building tools are finished and can be shared along with the Uyghur tools.
* There are still never-ending problems with Kazak acoustic rules and spelling; we spent a lot of time correcting them.
* We built tools for the character and acoustic layers of corpus compilation for Uyghur and Kazak. We hope to have a common structure for every language.
||
* We plan to work out a relatively complete syllable structure and use it for spell checking and correcting various corpora.
* We assigned students to build a bidirectional text-numeric conversion tool.
* A third layer, a morphological analyzer tool, is being designed for multilingual use, first for Uyghur and Kazak, mainly for sub-word analysis.
|-
|-
|Northwest University for Nationalities
||
* Finalized the spoken-language recording text
* Proofread 300 entries of the Lhasa Tibetan pronunciation dictionary
||
* Select written-language recording text
* Proofread the Lhasa Tibetan pronunciation dictionary
* Mongolian dictionary data entry
|-
|}
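The idea of using a syllable structure for spell checking, mentioned in the table above, can be sketched as follows: a word is flagged when it cannot be split into syllables of allowed shapes. The vowel inventory, the allowed consonant/vowel patterns, and the Latin transliteration are hypothetical; the real tools would use the actual Uyghur and Kazak inventories.

```python
from functools import lru_cache

VOWELS = set("aeiou")                        # hypothetical vowel inventory
PATTERNS = {"V", "CV", "VC", "CVC", "CVCC"}  # hypothetical allowed syllable shapes
MAX_LEN = max(len(p) for p in PATTERNS)


def shape(chunk):
    """Map letters to a C/V skeleton, e.g. 'kat' -> 'CVC' (non-vowels count as C)."""
    return "".join("V" if ch in VOWELS else "C" for ch in chunk)


def syllabify(word):
    """Return one valid syllable split of `word` as a tuple, or None if none exists."""

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(word):
            return ()
        # Try longer syllables first, backtracking when the rest cannot be parsed.
        for size in range(MAX_LEN, 0, -1):
            chunk = word[start:start + size]
            if len(chunk) == size and shape(chunk) in PATTERNS:
                rest = split_from(start + size)
                if rest is not None:
                    return (chunk,) + rest
        return None

    return split_from(0)


def is_suspicious(word):
    """Flag words that cannot be parsed into allowed syllables."""
    return syllabify(word.lower()) is None
```

A word such as `ktp`, which contains no vowel, is flagged, while `kitap` parses cleanly. The split returned is only one valid parse and is not necessarily the linguistically preferred syllabification.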

Asr-nsfc-weekly-2016-11-21

2016-11-21T14:10:11Z

Mijiti：


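{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.11.14
|-
|Tsinghua University
||
* Initial Kazak model training is complete
* Xinjiang University needs to check the Kazak speech data
* Xinjiang University needs to upload the language model corpus
||
* Finish training the Kazak acoustic model
|-
|-
|Xinjiang University
||
* Finished Kazak character code conversion, code normalization, etc.
* Finished a manual pronunciation dictionary of 6,480 Kazak entries.
* Also corrected some spelling errors in the Kazak speech-text corpus.
||
* Complete the Kazak pronunciation dictionary, including the manual and automatic parts.
* Check and fix speech-text misalignment, missing items, etc. in the Kazak speech corpus.
* Check for problems in the Kazak language model corpus and prepare for building the LM.
|-
|Northwest University for Nationalities
||
* Selection of spoken-language recording text
||
* Finalize the spoken-language recording text
* Proofread the Lhasa Tibetan pronunciation dictionary
* Mongolian dictionary data entry
|-
|}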
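The character code conversion and normalization step reported in the table above can be illustrated with a small sketch. It assumes Arabic-script input, folds Unicode presentation forms with NFKC, and then applies a custom code map; the map entries are hypothetical examples, since the project's real mapping table is not shown.

```python
import unicodedata

# Hypothetical code map: the project's actual conversion table is not given.
CODE_MAP = {
    "\u0640": "",        # ARABIC TATWEEL (elongation mark) is dropped
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH (illustrative)
}


def normalize_text(text, code_map=CODE_MAP):
    """Fold presentation forms with NFKC, apply the code map, tidy whitespace."""
    folded = unicodedata.normalize("NFKC", text)
    mapped = "".join(code_map.get(ch, ch) for ch in folded)
    return " ".join(mapped.split())


# U+FE8E is the final presentation form of ALEF; NFKC folds it to U+0627.
sample = normalize_text("\uFE8E")
```

Running every corpus line and dictionary entry through one such function keeps the speech transcripts, the LM text, and the pronunciation dictionary in the same code space.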

Asr-nsfc-weekly-2016-11-21

2016-11-21T14:07:54Z

Mijiti：

{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.11.21
|-
|Tsinghua University
||
* Kazak acoustic model (TDNN) training is complete
* Trained a language model on the Kazak corpus and used it for decoding [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=tangzy&step=view_request&cvssid=576]: the domain match between the corpus and the training set still needs to be checked
||
* Obtain a suitable corpus to train the language model and finish the baseline
|-
|-
|Xinjiang University
||
* The Kazak vocabulary extraction and acoustic dictionary building tools are finished and can be shared along with the Uyghur tools.
* There are still never-ending problems with Kazak acoustic rules and spelling; we spent a lot of time correcting them.
* We built the character and acoustic layers of corpus compilation for Uyghur and Kazak. We hope to have a common structure for every language.
||
* We plan to work out a relatively complete syllable structure and use it for spell checking and correcting various corpora.
* We assigned students to build a bidirectional text-numeric conversion tool.
* A third layer, a morphological analyzer tool, is being designed for multilingual use, first for Uyghur and Kazak, mainly for sub-word analysis.
|-
|-
|Northwest University for Nationalities
||
* Finalized the spoken-language recording text
* Proofread 300 entries of the Lhasa Tibetan pronunciation dictionary
||
* Select written-language recording text
* Proofread the Lhasa Tibetan pronunciation dictionary
* Mongolian dictionary data entry
|-
|}


Asr-nsfc-weekly-2016-11-21

2016-11-21T14:06:31Z

Mijiti：

{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.11.21
|-
|Tsinghua University
||
* Kazak acoustic model (TDNN) training is complete
* Trained a language model on the Kazak corpus and used it for decoding [http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=tangzy&step=view_request&cvssid=576]: the domain match between the corpus and the training set still needs to be checked
||
* Obtain a suitable corpus to train the language model and finish the baseline
|-
|-
|Xinjiang University
||
* The Kazak vocabulary extraction and acoustic dictionary building tools are finished and can be shared along with the Uyghur tools.
* There are still never-ending problems with Kazak acoustic rules and spelling; we spent a lot of time correcting them.
* We built the character and acoustic layers of corpus compilation for Uyghur and Kazak. We hope to have a common structure for every language.
||
* We plan to work out a relatively complete syllable structure and use it for spell checking and correcting various corpora.
* We assigned students to build a bidirectional text-numeric conversion tool.
* A third layer, a morphological analyzer tool, is being designed for multilingual use, first for Uyghur and Kazak, mainly for sub-word analysis.
|-
|-
|Northwest University for Nationalities
||
* Finalized the spoken-language recording text
* Proofread 300 entries of the Lhasa Tibetan pronunciation dictionary
||
* Select written-language recording text
* Proofread the Lhasa Tibetan pronunciation dictionary
* Mongolian dictionary data entry
|-
|}


Asr-nsfc-weekly-2016-11-14

2016-11-14T11:18:23Z

Mijiti：


Asr-nsfc-weekly-2016-11-14

2016-11-14T11:16:50Z

Mijiti：

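{| class="wikitable"
!Date!!People !! Last Week !! This Week
|-
| rowspan="5"|2016.11.14
|-
|Tsinghua University
||
* Initial Kazak model training is complete
* Xinjiang University needs to check the Kazak speech data
* Xinjiang University needs to upload the language model corpus
||
* Finish training the Kazak acoustic model
|-
|-
|Xinjiang University
||
* Finished Kazak character code conversion, code normalization, etc.
* Finished a manual pronunciation dictionary of 6,480 Kazak entries.
* Also corrected some spelling errors in the Kazak speech-text corpus.
||
* Complete the Kazak pronunciation dictionary, including the manual and automatic parts.
* Check and fix speech-text misalignment, missing items, etc. in the Kazak speech corpus.
* Prepare the Kazak language model corpus.
|-
|Northwest University for Nationalities
||
* Selection of spoken-language recording text
||
* Finalize the spoken-language recording text
* Proofread the Lhasa Tibetan pronunciation dictionary
* Mongolian dictionary data entry
|-
|}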