ASR-nsfc-data

From cslt Wiki

Latest revision as of 14:03, 11 January 2023 (Wednesday)

Data resources

In order to promote the development of speech signal processing technology for minority languages, we will release all M2ASR datasets free of charge to scientific research institutions.

You need to request a license before you can download the datasets.

To obtain a license, please send an email to shiying@cslt.org or lilt@cslt.org.

Uyghur

In the second phase, the Uyghur dataset consists of:

  • 136 hours of speech audio from 353 speakers (166 male and 187 female).
  • transcriptions of the speech audio.
  • a word-level lexicon.
  • Download link: https://share.weiyun.com/gVBrkPON

Kazakh

In the second phase, the Kazakh dataset consists of:

  • 78 hours of speech audio from 86 speakers (40 male and 46 female).
  • transcriptions of the speech audio.
  • a word-level lexicon.
  • Download link: https://share.weiyun.com/iYLWq8Jl

Tibetan

In the second phase, the Tibetan dataset consists of:

  • 72 hours of speech audio from 147 speakers (66 male and 81 female).
  • transcriptions of the speech audio.
  • a word-level lexicon.
  • Download link: https://share.weiyun.com/gaEJaFA5

Mongolian

Coming soon...

Kirghiz

In the second phase, the Kirghiz dataset consists of:

  • 128 hours of speech audio from 163 speakers (100 male and 63 female).
  • transcriptions of the speech audio.
  • a word-level lexicon.
  • Download link: https://pan.baidu.com/s/18jxLKo4YRWH5K3GWPWmGXQ