Difference between revisions of “ASR-nsfc-data”

From cslt Wiki
Line 13: Line 13:
 
* transcription of the speech audio.
 
* transcription of the speech audio.
 
* lexicon in the word level.
 
* lexicon in the word level.
* [[https://share.weiyun.com/gVBrkPON|Download link]]
+
* [[Download link https://share.weiyun.com/gVBrkPON]]
  
 
==Kazakh==
 
==Kazakh==

Revision as of 07:50, 3 June 2020 (Wednesday)

Data resources

To promote the development of speech signal processing technology for minority languages, we are releasing the entire M2ASR dataset to scientific research institutions free of charge.

You must request a license before you can download the datasets.

To obtain the license, please send an email to shiying@cslt.org or lilt@cslt.org.

Uyghur

In the second phase, the Uyghur dataset consists of:

Kazakh

In the second phase, the Kazakh dataset consists of:

  • 78 hours of speech audio from 86 speakers (40 male and 46 female).
  • Transcriptions of the speech audio.
  • A word-level lexicon.
  • Download link

Tibetan

In the second phase, the Tibetan dataset consists of:

  • 72 hours of speech audio from 147 speakers (66 male and 81 female).
  • Transcriptions of the speech audio.
  • A word-level lexicon.
  • Download link

Mongolian

Coming soon...

Kirgiz

Coming soon...