C-STAR Chinese Celebrity Audio Data Collection
- Members: 王东, 蔡云麒, 李蓝天, 范悦, 康嘉文
- Former members: 周子雅, 李开诚, 陈浩林, 程思潼, 张鹏远
Goals
- Collect audio data of 1,000 Chinese celebrities.
- Automatically clip videos through a pipeline of face detection, face recognition, speaker verification and speaker diarization (a minimal clipping sketch follows this list).
- Create a database.
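A minimal sketch of the segment-clipping step, assuming ffmpeg is installed and the pipeline has already produced (start, end) time labels for a video; file names and parameters here are illustrative, not the project's actual code:

```python
# Hypothetical clipping helper: cut labelled speech segments out of a source
# video and save them as 16 kHz mono WAV files for the database.
import subprocess

def clip_segment(video_path, start, end, out_path):
    """Extract audio between start and end (seconds) from video_path."""
    subprocess.run([
        "ffmpeg", "-y",
        "-i", video_path,
        "-ss", str(start), "-to", str(end),
        "-vn",                                    # drop the video stream
        "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1",
        out_path,
    ], check=True)

# Example: time labels produced by the pipeline for one celebrity video
segments = [(12.4, 18.9), (35.0, 41.2)]
for i, (start, end) in enumerate(segments):
    clip_segment("celebrity_interview.mp4", start, end, f"celebrity_{i:03d}.wav")
```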
Future Plans
- Expand the database to 10,000 people.
- Build an LSTM-based model that links SyncNet and speaker diarization and learns the relationship between their outputs (a rough sketch follows this list).
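The fusion model mentioned above has not been designed yet; the PyTorch sketch below only illustrates one possible shape of the idea, and the class name, feature choices, and dimensions are all assumptions:

```python
# Hypothetical LSTM that fuses per-frame SyncNet confidences with per-frame
# diarization posteriors and predicts whether the target person is speaking.
import torch
import torch.nn as nn

class SyncDiarFusion(nn.Module):
    def __init__(self, sync_dim=1, diar_dim=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(sync_dim + diar_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # per-frame speaking / not speaking

    def forward(self, sync_scores, diar_post):
        # sync_scores: (B, T, sync_dim), diar_post: (B, T, diar_dim)
        x = torch.cat([sync_scores, diar_post], dim=-1)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)   # (B, T)

# Toy forward pass with random features
model = SyncDiarFusion()
probs = model(torch.randn(2, 100, 1), torch.rand(2, 100, 4))
print(probs.shape)   # torch.Size([2, 100])
```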
Basic Approach
- Implemented with TensorFlow, PyTorch, Keras, and MXNet.
- Models: RetinaFace and ArcFace for face detection and recognition, SyncNet for speaker verification, and UIS-RNN for speaker diarization.
- Input: a video of the target person and face images of the target person.
- Output: time labels of the target person's speech segments in the video (a sketch of how such labels could be derived follows this list).
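As a rough illustration of how such time labels could be derived once per-frame ArcFace similarities to the target's face photo and SyncNet-style speaking scores are available; thresholds, frame rate, and the helper function here are assumptions for illustration only:

```python
# Hypothetical post-processing: merge consecutive frames in which the target
# face is present and judged to be speaking into (start_sec, end_sec) labels.
import numpy as np

def frames_to_segments(is_target, fps=25.0, min_len=1.0):
    segments, start = [], None
    for i, flag in enumerate(is_target):
        if flag and start is None:
            start = i                              # a segment begins
        elif not flag and start is not None:
            if (i - start) / fps >= min_len:       # keep segments >= min_len s
                segments.append((start / fps, i / fps))
            start = None
    if start is not None and (len(is_target) - start) / fps >= min_len:
        segments.append((start / fps, len(is_target) / fps))
    return segments

# face_sim: ArcFace cosine similarity of the detected face to the target photo
# speak_score: SyncNet-style audio-visual sync confidence for that face track
face_sim = np.random.rand(500)
speak_score = np.random.rand(500)
is_target_speaking = (face_sim > 0.6) & (speak_score > 0.5)
print(frames_to_segments(is_target_speaking))
```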
Project GitHub Repository
Project Report
References
- Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019.
- Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018.
- Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018.
- Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017.
- Zhong et al., "GhostVLAD for Set-based Face Recognition", 2018.
- Chung et al., "Out of Time: Automated Lip Sync in the Wild", 2016.
- Xie et al., "Utterance-level Aggregation for Speaker Recognition in the Wild", 2019.
- Zhang et al., "Fully Supervised Speaker Diarization", 2018.