C-STAR-database approach

From cslt Wiki

Revision as of 00:53, Thursday 22 August 2019, by Cslt




C-STAR Chinese Celebrity Audio Data Collection

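Members: 王东 (Wang Dong), 蔡云麒 (Cai Yunqi), 周子雅 (Zhou Ziya), 李开诚 (Li Kaicheng), 陈浩林 (Chen Haolin), 程思潼 (Cheng Sitong), 张鹏远 (Zhang Pengyuan), 范悦 (Fan Yue)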

Goals

Collect audio data of 1,000 Chinese celebrities.
Automatically clip videos through a pipeline of face detection, face recognition, speaker validation and speaker diarization.
Create a database.
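The clipping pipeline in the goals above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `collect_target_turns` and the three model wrappers it takes (`detect_faces`, `same_person`, `in_sync`) are hypothetical names standing in for RetinaFace, ArcFace and SyncNet. The rule for matching a diarization label to the target (a vote over frames where the target's face is visible and in sync with the audio) is also an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# A diarization turn: (start_s, end_s, speaker_label), e.g. from UIS-RNN.
Turn = Tuple[float, float, str]


def collect_target_turns(
    frames: Sequence[object],
    fps: float,
    target_face: object,
    turns: Sequence[Turn],
    detect_faces: Callable[[object], List[object]],  # hypothetical wrapper, e.g. RetinaFace
    same_person: Callable[[object, object], bool],   # hypothetical wrapper, e.g. ArcFace match
    in_sync: Callable[[int, object], bool],          # hypothetical wrapper, e.g. SyncNet on frame i
) -> List[Turn]:
    """Return the diarization turns attributed to the target person.

    Assumption (not from the project): the target's diarization label is the
    one whose turns contain the most frames where the target's face is
    visible and in sync with the audio.
    """
    # Stages 1-3: timestamps where the target is on screen and speaking.
    validated: List[float] = []
    for i, frame in enumerate(frames):
        if any(same_person(face, target_face) and in_sync(i, face)
               for face in detect_faces(frame)):
            validated.append(i / fps)

    # Stage 4: each label gets one vote per validated frame inside its turns.
    votes: Dict[str, int] = defaultdict(int)
    for start, end, label in turns:
        votes[label] += sum(1 for t in validated if start <= t < end)
    if not votes or max(votes.values()) == 0:
        return []  # no visual evidence for any speaker
    best = max(votes, key=votes.get)
    return [turn for turn in turns if turn[2] == best]
```

Passing the models in as callables keeps the sketch framework-neutral and lets each stage be replaced by a stub for testing.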

Future plans

Expand the database to 10,000 people.
Build an LSTM-based model that sits between SyncNet and Speaker_Diarization and learns the relationship between their outputs.

Basic approach

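Implemented with TensorFlow, PyTorch, Keras and MXNet.
Models: RetinaFace and ArcFace for face detection and recognition, SyncNet for speaker identification, and UIS-RNN (Zhang et al., 2018) for speaker diarization.
Input: a video of the target person and a face image of the target person.
Output: time labels marking the segments of the video in which the target person is speaking.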
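The output described above, time labels for the target's speech, can be derived from per-frame decisions. A minimal sketch, assuming the earlier stages yield one boolean per video frame and that a time label is a `(start_s, end_s)` pair in seconds; `flags_to_time_labels` and its `min_duration` filter are hypothetical, not part of the project.

```python
from typing import List, Optional, Sequence, Tuple


def flags_to_time_labels(
    flags: Sequence[bool], fps: float, min_duration: float = 0.0
) -> List[Tuple[float, float]]:
    """Merge runs of consecutive True frames into (start_s, end_s) segments.

    flags[i] is True if the target person is judged to be speaking in
    frame i. Segments shorter than min_duration seconds are dropped
    (a hypothetical filter against spurious one-frame hits).
    """
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for i, speaking in enumerate(flags):
        if speaking and start is None:
            start = i  # a segment opens at this frame
        elif not speaking and start is not None:
            segments.append((start / fps, i / fps))  # segment ends before frame i
            start = None
    if start is not None:  # segment still open at the end of the video
        segments.append((start / fps, len(flags) / fps))
    return [(s, e) for s, e in segments if e - s >= min_duration]
```

For example, at 2 frames per second the flags `[False, True, True, False, True]` yield `[(0.5, 1.5), (2.0, 2.5)]`.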

Project GitHub repository

celebrity-audio-collection

Project reports

v1.0 interim report

References

Zhang et al., "Fully Supervised Speaker Diarization", 2018.

Retrieved from "http://index.cslt.org/mediawiki/index.php?title=C-STAR-database_approach&oldid=33822"