Revision as of 00:53, 22 August 2019 (Thursday)
C-STAR: Chinese Celebrity Audio Data Collection
Members: 王东, 蔡云麒, 周子雅, 李开诚, 陈浩林, 程思潼, 张鹏远, 范悦
Goals
- Collect audio data for 1,000 Chinese celebrities.
- Automatically clip videos with a pipeline of face detection, face recognition, speaker validation, and speaker diarization.
- Build a database from the collected clips.
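The clipping pipeline above can be sketched as a per-frame loop: detect faces, match them against the target's reference face, confirm the matched face is actually speaking, and merge consecutive speaking frames into time segments. This is a minimal illustration only; the stage functions below (`detect_faces`, `recognize_face`, `validate_speaker`) are hypothetical placeholders standing in for RetinaFace, ArcFace, and SyncNet, not real library calls.

```python
# Hypothetical stage stubs: a real system would run RetinaFace,
# ArcFace embedding comparison, and SyncNet lip-sync scoring here.
def detect_faces(frame):
    return [{"box": (0, 0, 100, 100)}]

def recognize_face(face, reference_embedding):
    return True

def validate_speaker(frame, face):
    return True

def clip_video(frames, reference_embedding, fps=25.0):
    """Return (start, end) ranges in seconds where the target is speaking."""
    segments, start = [], None
    for i, frame in enumerate(frames):
        speaking = any(
            recognize_face(f, reference_embedding) and validate_speaker(frame, f)
            for f in detect_faces(frame)
        )
        if speaking and start is None:
            start = i / fps          # a speaking segment begins
        elif not speaking and start is not None:
            segments.append((start, i / fps))  # segment ends
            start = None
    if start is not None:
        segments.append((start, len(frames) / fps))
    return segments
```

With the always-true stubs, a 3-frame clip at 1 fps yields a single segment `[(0.0, 3.0)]`; swapping in real models is what makes the segment boundaries meaningful.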
Future Plans
- Expand the database to 10,000 people.
- Build an LSTM-based model that connects SyncNet and speaker diarization and learns the relationship between them.
Methods
- Implemented with TensorFlow, PyTorch, Keras, and MXNet
- RetinaFace and ArcFace models for face detection and recognition, SyncNet for speaker validation, and the UIS-RNN model for speaker diarization
- Input: a video of the target celebrity and face images of the target celebrity
- Output: time labels of the segments in the video containing the celebrity's voice
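The output time labels can be serialized in the RTTM layout commonly used by diarization tools, one `SPEAKER` line per segment with onset and duration in seconds. A minimal sketch, assuming the segments are held as (start, end) pairs; the file and speaker IDs here are illustrative.

```python
def segments_to_rttm(segments, file_id, speaker):
    """Render (start, end) second pairs as RTTM SPEAKER lines."""
    lines = []
    for start, end in segments:
        # RTTM fields: type, file, channel, onset, duration,
        # then placeholder <NA> columns and the speaker name.
        lines.append(
            f"SPEAKER {file_id} 1 {start:.2f} {end - start:.2f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return "\n".join(lines)

print(segments_to_rttm([(3.5, 7.0)], "celeb001", "target"))
# SPEAKER celeb001 1 3.50 3.50 <NA> <NA> target <NA> <NA>
```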
Project GitHub
Project Report
v1.0 interim report: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:C-STAR.pdf
References
- Zhang et al., "Fully Supervised Speaker Diarization", 2018. https://arxiv.org/pdf/1810.04719v1.pdf