CN-CVS

From cslt Wiki
Revision as of 05:09, 25 October 2022

Introduction

  • CN-CVS (Mandarin Visual Speech) is a large-scale Chinese Mandarin audio-visual dataset published by the Center for Speech and Language Technology (CSLT) at Tsinghua University.

Members

  • Current: Dong Wang, Chen Chen

Description

  • Collect audio and video data from more than 2,500 Mandarin speakers.
  • Automatically clip videos through a pipeline of shot detection, VAD, face detection, face tracking, and audio-visual synchronization detection.
  • Manually annotate speaker identity and verify data quality.
  • Create a benchmark database for the video-to-speech synthesis task.
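The clipping pipeline above can be illustrated with its first stage, shot detection. A minimal sketch using FFmpeg's scene-change filter; the 0.4 threshold, the helper names, and the log-parsing details are assumptions for illustration, not the project's actual code:

```python
import re
import subprocess

def parse_showinfo_times(stderr_text):
    """Extract frame timestamps (seconds) from FFmpeg showinfo log lines."""
    return [float(t) for t in re.findall(r"pts_time:([\d.]+)", stderr_text)]

def detect_shot_changes(video_path, threshold=0.4):
    """Run FFmpeg's scene-change detector and return cut timestamps.

    Frames whose scene score exceeds `threshold` pass the select filter;
    showinfo logs them (with pts_time) to stderr, which we then parse.
    """
    cmd = [
        "ffmpeg", "-hide_banner", "-i", video_path,
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        "-f", "null", "-",
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return parse_showinfo_times(proc.stderr)
```

Consecutive timestamps bound candidate clips, which would then pass through the VAD, face, and synchronization stages.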

Basic Methods

  • Environments: PyTorch, OpenCV, FFmpeg
  • Shot detection: FFmpeg
  • VAD: pydub
  • Face detection and tracking: dlib
  • Audio-visual synchronization detection: SyncNet model
  • Input: JSON files of video information
  • Output: video clips and WAV files, plus metadata JSON files
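The page names pydub for the VAD stage; the underlying idea can be shown with a dependency-free short-time-energy sketch (the frame length and energy threshold are assumptions, and real pipelines would use pydub's silence detection or a trained VAD instead):

```python
def energy_vad(samples, frame_len=160, energy_thresh=1e5):
    """Label fixed-length frames of 16-bit PCM samples as speech or silence
    by mean short-time energy, merging adjacent speech frames into segments.

    Returns a list of (start_sample, end_sample) speech spans.
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > energy_thresh and start is None:
            start = i * frame_len          # speech onset
        elif energy <= energy_thresh and start is not None:
            segments.append((start, i * frame_len))  # speech offset
            start = None
    if start is not None:                  # speech runs to end of signal
        segments.append((start, n_frames * frame_len))
    return segments
```

On a signal of silence, then loud samples, then silence, this returns the single span covering the loud region.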

Reports

Publications


Source Code

  • Collection Pipeline: TODO

Download

  • Public (recommended)

TODO

  • Local (not recommended)

TODO

Future Plans

  • Extract text transcriptions via OCR, ASR, and human checking

License

  • All resources in the database are free for research institutes and individuals.
  • No commercial usage is permitted.

References