Difference between revisions of "ASR-nsfc-data"
From cslt Wiki
Revision as of 15:06, 2 June 2020 (Tuesday)
Data resources
To promote the development of minority-language speech processing technology, we will release the full M2ASR dataset to scientific research institutions free of charge. You must request a license before you can download the dataset.
Please send an email to shiying@cslt.org or lilt@cslt.org to obtain the license.
Uyghur
In the second phase, the Uyghur dataset consists of:
- 136 hours of speech audio from 353 speakers (166 male, 187 female)
- transcriptions of the speech audio
- a word-level lexicon
Kazakh
In the second phase, the Kazakh dataset consists of:
- 78 hours of speech audio from 86 speakers (40 male, 46 female)
- transcriptions of the speech audio
- a word-level lexicon
Tibetan
In the second phase, the Tibetan dataset consists of:
- 72 hours of speech audio from 147 speakers (66 male, 81 female)
- transcriptions of the speech audio
- a word-level lexicon
Mongolian
Coming soon
Kirgiz
Coming soon