“OC17-data”: difference between revisions

Revision as of 07:28, 20 April 2017 (Thursday)

Data allowed for use

The MixASR-CHEN 17 challenge permits the following data resources:

Training data: OC16-CE80 training/dev set + THCHS30
Development data: OC16-CE80 test set
Test data: OC17-CE10 test set
Lexicon: THCHS30 Chinese lexicon + CMU English lexicon
Additional word list: An additional English word list, OC17-EnWord, covers most of the English OOV words in the test set. No phone transcriptions are provided for it.
LM: The THCHS30 LM can be used as the basic LM. All transcriptions of the OC16-CE80 training/dev/test sets and of THCHS30 can be used to improve it.
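The Chinese and English lexicons are used together, and OC17-EnWord carries no pronunciations. A system builder therefore typically needs to find which listed words are missing from the combined lexicon, for example to pass them to a grapheme-to-phoneme tool. The Python sketch below shows one way to do this.

It rests on several assumptions:
- Both lexicons have already been loaded into an in-memory mapping from a word to a list of phone sequences.
- The names `merge_lexicons` and `uncovered_words` are hypothetical.
- The example phones are illustrative only.
- Words are upper-cased before lookup because CMU dictionary entries are upper case.

```python
from typing import Dict, Iterable, List

# Hypothetical in-memory lexicon format: word -> list of pronunciations,
# where each pronunciation is a list of phone symbols.
Lexicon = Dict[str, List[List[str]]]


def merge_lexicons(chinese: Lexicon, english: Lexicon) -> Lexicon:
    """Combine two lexicons; if a word appears in both, keep every distinct pronunciation."""
    merged: Lexicon = {word: [list(p) for p in prons] for word, prons in chinese.items()}
    for word, prons in english.items():
        bucket = merged.setdefault(word, [])
        for pron in prons:
            if pron not in bucket:  # avoid duplicate pronunciations
                bucket.append(list(pron))
    return merged


def uncovered_words(word_list: Iterable[str], lexicon: Lexicon) -> List[str]:
    """Return words of word_list that have no lexicon entry, de-duplicated, in input order.

    Words are upper-cased before lookup (assumption: English entries follow the
    upper-case CMU convention; upper() leaves Chinese characters unchanged).
    """
    missing: List[str] = []
    seen = set()
    for word in word_list:
        key = word.strip().upper()
        if key and key not in lexicon and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


if __name__ == "__main__":
    zh = {"你好": [["n", "i3", "h", "ao3"]]}          # illustrative entry
    en = {"HELLO": [["HH", "AH0", "L", "OW1"]]}      # illustrative entry
    lex = merge_lexicons(zh, en)
    print(uncovered_words(["hello", "wechat", "你好", "WeChat"], lex))  # ['WECHAT']
```

In this sketch, the words returned by `uncovered_words` are the ones that would still need pronunciations before they could be added to the decoding lexicon.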

OC16-CE80

OC16-CE80 is a speech database provided by SpeechOcean (http://www.speechocean.com) for this challenge. Its main features are:

1400+ speakers
Mobile channel
80 hours of speech signals
Transcriptions are provided
The licence file is here
Data profile is here

OC17-CE10

OC17-CE10 is a speech database provided by SpeechOcean (http://www.speechocean.com) for this challenge. Its main features are:

100+ speakers
Mobile channel
10 hours of speech signals
Transcriptions are provided
The licence file is here

THCHS30

THCHS30 is a Chinese speech database provided by CSLT@Tsinghua University. All the resources of THCHS30 can be used to improve the system, especially the lexicon and LM. The data is available at:

http://www.openslr.org/18/

CMU English dictionary

To recognize English words, the CMU English dictionary (version 0.7b) may be used.

http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b
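The cmudict-0.7b file is plain text with a simple layout:
- Lines starting with `;;;` are comments.
- Each entry is an upper-case word followed by its phones.
- Alternate pronunciations are marked with a numeric suffix such as `WORD(1)`.

The following Python sketch loads such lines into a word-to-pronunciations mapping. Its helper `parse_cmudict` is hypothetical, not an official tool.

```python
import re
from typing import Dict, Iterable, List

# Matches the "(1)", "(2)", ... suffix that marks an alternate pronunciation.
_VARIANT = re.compile(r"\(\d+\)$")


def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Parse cmudict-0.7b-style lines into word -> list of phone sequences."""
    lexicon: Dict[str, List[List[str]]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        word, *phones = line.split()
        word = _VARIANT.sub("", word)  # fold WORD(1), WORD(2), ... into WORD
        lexicon.setdefault(word, []).append(phones)
    return lexicon


if __name__ == "__main__":
    sample = [
        ";;; comment line",
        "HELLO  HH AH0 L OW1",
        "HELLO(1)  HH EH0 L OW1",
    ]
    print(parse_cmudict(sample))
    # {'HELLO': [['HH', 'AH0', 'L', 'OW1'], ['HH', 'EH0', 'L', 'OW1']]}
```

To read the downloaded file, one could pass an open file handle, e.g. `with open("cmudict-0.7b", encoding="latin-1") as f: lex = parse_cmudict(f)`. Latin-1 is chosen here only because it decodes any byte sequence, in case the file is not valid UTF-8.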
