Revision as of 15:04, 2 June 2020 (Tuesday)

Data resources

To promote the development of minority-language speech processing technology, we will release the entire M2ASR dataset to scientific research institutions free of charge. You must obtain a license before you can download the dataset.

To request a license, send an email to shiying@cslt.org or lilt@cslt.org.

Uyghur

In the second phase, the Uyghur dataset consists of:

136 hours of speech audio from 353 speakers (166 male and 187 female)
transcriptions of the speech audio
a word-level lexicon

Kazakh

In the second phase, the Kazakh dataset consists of:

78 hours of speech audio from 86 speakers (40 male and 46 female)
transcriptions of the speech audio
a word-level lexicon

Tibetan

In the second phase, the Tibetan dataset consists of:

72 hours of speech audio from 147 speakers (66 male and 81 female)
transcriptions of the speech audio
a word-level lexicon

Difference between revisions of "ASR-nsfc-data"
