OC17-plan

OC17-CHEN aims to advance technologies for mixlingual speech recognition, with a particular focus on Chinese ASR in which English words are involved. In both the training and test data, the bulk of the speech is Chinese, with some English words scattered throughout the sentences. Many of the English words are commonly used, but some are rare and do not appear in the vocabulary of THCHS30 or the CMU dict. Participants are required to build an ASR system with the provided resources that can decode both Chinese and English, with English words as a particular focus.

Test rules

Constraint on resources

The only constraint is that system development may use only the resources listed in the data profile.

Constraint on technology

We do not want to impose many limitations on which techniques participants can use. Any techniques for the front-end, acoustic model, language model, or decoding are allowed. For example, one participant may use speech enhancement methods to improve signal quality, while another may train a high-order LM.

Constraint on tools

Any tools can be used; however, we highly recommend publicly available tools. The tools (internal or external) should not rely on any heavy models trained on extra data. The only exception is G2P conversion for the English words in OC17-EnWord: you may use online services or third-party G2P tools that have been trained on their own databases, as long as they are publicly available. We think this simulates the real scenario in which new English words are encountered and must be handled.

Constraint on system

Although any open-source tools can be used to construct the system, participants are encouraged to use the Kaldi baseline provided by the organizers and build their own novel techniques on top of it, so that the community gains more insight into which techniques really help.

Evaluation metric

The performance of a submission is evaluated in terms of WER (word error rate), where each Chinese character is treated as a word.
The primary metric is the overall WER, which evaluates the entire transcription, including both the Chinese and English parts.
The secondary metric is the English WER, which evaluates only the English part.
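As a rough illustration of the two metrics, the sketch below computes a corpus-level WER as total edit errors (substitutions, deletions, insertions) divided by the total number of reference tokens. Several details are assumptions rather than the official scoring procedure: Chinese characters are taken to be the CJK Unified Ideographs block (U+4E00 to U+9FFF), English words are runs of ASCII letters compared case-insensitively, and the English WER is computed by dropping all Chinese tokens from both reference and hypothesis before alignment. The organizers' scoring script may define the English part differently.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: each CJK ideograph is one token; English words are ASCII letter runs.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+(?:'[A-Za-z]+)*")


def tokenize(text: str) -> List[str]:
    """Split mixlingual text so that every Chinese character counts as a word."""
    return [t.upper() for t in TOKEN.findall(text)]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def wer(pairs: Iterable[Tuple[str, str]], english_only: bool = False) -> float:
    """Corpus-level WER over (reference, hypothesis) pairs."""
    errors, ref_len = 0, 0
    for ref_text, hyp_text in pairs:
        ref, hyp = tokenize(ref_text), tokenize(hyp_text)
        if english_only:
            # Assumed definition of "English WER": keep only ASCII (English) tokens.
            ref = [t for t in ref if t.isascii()]
            hyp = [t for t in hyp if t.isascii()]
        errors += edit_distance(ref, hyp)
        ref_len += len(ref)
    if ref_len == 0:
        raise ValueError("reference contains no tokens")
    return errors / ref_len


pairs = [("我们 用 hello world 测试", "我们 用 hello word 测试")]
print(wer(pairs), wer(pairs, english_only=True))
```

In this example the reference has 7 tokens (5 Chinese characters and 2 English words) and the hypothesis has one substitution, so the overall WER is 1/7, while the English WER is 1/2 because the error falls on one of only two English words. This shows why a separate English metric is useful: errors on the rare English words are diluted in the overall figure.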