Revision as of 00:53, 22 August 2019 (Thursday)
C-STAR: Chinese Celebrity Audio Data Collection
Members: 王东, 蔡云麒, 周子雅, 李开诚, 陈浩林, 程思潼, 张鹏远, 范悦
Goals
- Collect audio data for 1,000 Chinese celebrities.
- Automatically clip videos with a pipeline of face detection, face recognition, speaker validation, and speaker diarization.
- Build a database from the collected clips.
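The clipping pipeline above can be sketched as a per-frame loop: detect faces, match them against the target's reference face, confirm the matched face is actually speaking, and merge consecutive speaking frames into time segments. This is a minimal illustration only; the stage functions below (`detect_faces`, `recognize_face`, `validate_speaker`) are hypothetical placeholders standing in for RetinaFace, ArcFace, and SyncNet, not real library calls.

```python
# Hypothetical stage stubs: a real system would run RetinaFace,
# ArcFace embedding comparison, and SyncNet lip-sync scoring here.
def detect_faces(frame):
    return [{"box": (0, 0, 100, 100)}]

def recognize_face(face, reference_embedding):
    return True

def validate_speaker(frame, face):
    return True

def clip_video(frames, reference_embedding, fps=25.0):
    """Return (start, end) ranges in seconds where the target is speaking."""
    segments, start = [], None
    for i, frame in enumerate(frames):
        speaking = any(
            recognize_face(f, reference_embedding) and validate_speaker(frame, f)
            for f in detect_faces(frame)
        )
        if speaking and start is None:
            start = i / fps          # a speaking segment begins
        elif not speaking and start is not None:
            segments.append((start, i / fps))  # segment ends
            start = None
    if start is not None:
        segments.append((start, len(frames) / fps))
    return segments
```

With the always-true stubs, a 3-frame clip at 1 fps yields a single segment `[(0.0, 3.0)]`; swapping in real models is what makes the segment boundaries meaningful.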
Future Plans
- Expand the database to 10,000 people.
- Build an LSTM-based model that connects SyncNet and speaker diarization and learns the relationship between them.
Methods
- Implemented with TensorFlow, PyTorch, Keras, and MXNet
- RetinaFace and ArcFace models for face detection and recognition, SyncNet for speaker validation, and the UIS-RNN model for speaker diarization
- Input: a video of the target celebrity and face images of the target celebrity
- Output: time labels of the segments in the video containing the celebrity's voice
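The output time labels can be serialized in the RTTM layout commonly used by diarization tools, one `SPEAKER` line per segment with onset and duration in seconds. A minimal sketch, assuming the segments are held as (start, end) pairs; the file and speaker IDs here are illustrative.

```python
def segments_to_rttm(segments, file_id, speaker):
    """Render (start, end) second pairs as RTTM SPEAKER lines."""
    lines = []
    for start, end in segments:
        # RTTM fields: type, file, channel, onset, duration,
        # then placeholder <NA> columns and the speaker name.
        lines.append(
            f"SPEAKER {file_id} 1 {start:.2f} {end - start:.2f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return "\n".join(lines)

print(segments_to_rttm([(3.5, 7.0)], "celeb001", "target"))
# SPEAKER celeb001 1 3.50 3.50 <NA> <NA> target <NA> <NA>
```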
Project GitHub
Project Report
v1.0 interim report: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:C-STAR.pdf
References
- Zhang et al., "Fully Supervised Speaker Diarization", 2018. https://arxiv.org/pdf/1810.04719v1.pdf