Difference between revisions of “Public data”

From cslt Wiki
Line 9: Line 9:
 
CSLT collaborated with the [http://www.xju.edu.cn/ XinJiang University] on a wide range of research including speech recognition, information retrieval and text processing. We published a multitude of resources to boost the research on Uyghur. The text data published here is used for Uyghur text classification tasks, which involves 500 health and non-health documents respectively. It was collected by Mahpirat from XJU when she visited CSLT from 2012-2013.  
 
CSLT collaborated with the [http://www.xju.edu.cn/ XinJiang University] on a wide range of research including speech recognition, information retrieval and text processing. We published a multitude of resources to boost the research on Uyghur. The text data published here is used for Uyghur text classification tasks, which involves 500 health and non-health documents respectively. It was collected by Mahpirat from XJU when she visited CSLT from 2012-2013.  
  
[http://data.cslt.org/uygh/zip/data.tar.gz download]
+
[http://data.cslt.org/uygh/zip/data.tar.gz download] [http://pan.baidu.com/s/1hqKwE00 download from Baidu]
  
  
Line 16: Line 16:
 
A free Cantonese lexicon collected from Adam Sheik's Cantonese Dict project.  
 
A free Cantonese lexicon collected from Adam Sheik's Cantonese Dict project.  
  
[http://data.cslt.org/cantonese/sheik/index.html check details]
+
[http://data.cslt.org/cantonese/sheik/index.html check details] [http://pan.baidu.com/s/1hqKwE00 download from Baidu]  
  
 
== THUYG-20 database ==
 
== THUYG-20 database ==
Line 22: Line 22:
 
A free speech database for constructing a full-fledged Uyghur ASR system.  
 
A free speech database for constructing a full-fledged Uyghur ASR system.  
  
[http://data.cslt.org/thuyg20/README.html check details]
+
[http://data.cslt.org/thuyg20/README.html check details] [http://pan.baidu.com/s/1hqKwE00 download from Baidu]
  
  
Line 29: Line 29:
 
A set of free databases used for Uyghur research, especially lexical research. Published by Prof. Mijit Ablimit.
 
A set of free databases used for Uyghur research, especially lexical research. Published by Prof. Mijit Ablimit.
  
[http://data.cslt.org/thuyg-tk/index.html check details]
+
[http://data.cslt.org/thuyg-tk/index.html check details] [http://pan.baidu.com/s/1hqKwE00 download from Baidu]
  
 
== SUD-12 database ==
 
== SUD-12 database ==
Line 35: Line 35:
 
A speech database used for short utterance speaker recognition
 
A speech database used for short utterance speaker recognition
  
[http://data.cslt.org/susr/SUB12/index.html check details]
+
[http://data.cslt.org/susr/SUB12/index.html check details] [http://pan.baidu.com/s/1hqKwE00 download from Baidu]
  
 
== THUCH30 database ==
 
== THUCH30 database ==
Line 41: Line 41:
 
A speech database used for Chinese LVCSR. Recorded by Dong Wang many many years ago.
 
A speech database used for Chinese LVCSR. Recorded by Dong Wang many many years ago.
  
[http://data.cslt.org/thchs30/README.html check details]
+
[http://data.cslt.org/thchs30/README.html check details] [http://pan.baidu.com/s/1hqKwE00 download from Baidu]

Revision as of 02:28, 15 October 2015 (Thursday)

CCC data resource

CSLT collaborates closely with the Chinese Corpus Consortium (CCC) to collect and publish databases in China. The aim of the CCC is to provide corpora for Chinese ASR, TTS, NLP, perception analysis, phonetics analysis, linguistic analysis, and other related tasks. The corpora can be speech- or text-based; read or spontaneous; wideband or narrowband; standard or dialectal Chinese; clean or noisy; or of any other kind deemed helpful for the aforesaid purposes.

visit CCC

Uyghur text database

CSLT collaborated with XinJiang University on a wide range of research, including speech recognition, information retrieval, and text processing. We have published a number of resources to support research on Uyghur. The text data published here is intended for Uyghur text classification tasks and comprises 500 health and 500 non-health documents. It was collected by Mahpirat from XJU during her visit to CSLT from 2012 to 2013.

download download from Baidu


Sheik Cantonese lexicon

A free Cantonese lexicon collected from Adam Sheik's Cantonese Dict project.

check details download from Baidu

THUYG-20 database

A free speech database for constructing a full-fledged Uyghur ASR system.

check details download from Baidu


THUYG-TK database

A set of free databases used for Uyghur research, especially lexical research. Published by Prof. Mijit Ablimit.

check details download from Baidu

SUD-12 database

A speech database used for short utterance speaker recognition.

check details download from Baidu

THUCH30 database

A speech database used for Chinese LVCSR. Recorded by Dong Wang many years ago.

check details download from Baidu