
Latest revision as of 11:53, 29 October 2019

CN-Celeb

  • A large-scale Chinese celebrity audio dataset collected "in the wild".
  • Members: Dong Wang, Yunqi Cai, Lantian Li, Yue Fan, Jiawen Kang
  • Historical members: Ziya Zhou, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang

Target

  • Collect audio data of 1,000 Chinese celebrities.
  • Automatically clip videos with a pipeline of face detection, face recognition, speaker validation, and speaker diarization.
  • Create a database.

Future Plans

  • Augment the database to 10,000 people.
  • Build an LSTM-based model that links SyncNet and speaker diarization and learns the relationship between them.


Approach

  • Implemented with TensorFlow, PyTorch, Keras, and MXNet
  • RetinaFace and ArcFace models for face detection and recognition, the SyncNet model for speaker validation, and the UIS-RNN model for speaker diarization
  • Input: videos of the target celebrity and face images of the target celebrity
  • Output: time stamps of the target celebrity's speech segments in the video
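The approach above can be sketched as a simple per-frame pipeline. The code below is a minimal, runnable illustration, not the project's actual implementation: the `Frame` dataclass and its boolean fields are hypothetical stand-ins for a RetinaFace detection, an ArcFace identity match, and a SyncNet lip-sync check, and the segment-merging step only approximates what speaker diarization (UIS-RNN) would refine.

```python
# Sketch of the clipping pipeline, assuming per-frame model outputs
# have already been computed by the face/speaker models named above.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Frame:
    time: float          # timestamp in seconds
    has_face: bool       # stand-in for a RetinaFace detection
    is_target: bool      # stand-in for an ArcFace identity match
    lips_synced: bool    # stand-in for a SyncNet audio-visual check

def clip_target_segments(frames: List[Frame],
                         frame_step: float) -> List[Tuple[float, float]]:
    """Return (start, end) times where the target celebrity is speaking.

    A frame counts as "target speaking" when all three per-frame checks
    pass; consecutive such frames are merged into one segment. In the
    real pipeline, speaker diarization (UIS-RNN) would further split
    and verify these segments.
    """
    segments = []
    start = None
    for f in frames:
        speaking = f.has_face and f.is_target and f.lips_synced
        if speaking and start is None:
            start = f.time
        elif not speaking and start is not None:
            segments.append((start, f.time))
            start = None
    if start is not None:  # segment runs to the end of the video
        segments.append((start, frames[-1].time + frame_step))
    return segments

# Toy run: in a 5-frame, 1 fps video, the target's face appears from
# 1.0 s onward but the lips are synced with the audio only in [1.0, 3.0).
frames = [Frame(t, has_face=t >= 1, is_target=t >= 1, lips_synced=1 <= t < 3)
          for t in [0.0, 1.0, 2.0, 3.0, 4.0]]
print(clip_target_segments(frames, frame_step=1.0))  # [(1.0, 3.0)]
```

The output of this stage is exactly the project's stated deliverable: time stamps of the target's speech segments in the video.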


Project GitHub

celebrity-audio-collection

Project Reports

v1.0 milestone report


References

  • Deng et al., "RetinaFace: Single-Stage Dense Face Localisation in the Wild", 2019. [https://arxiv.org/pdf/1905.00641.pdf]
  • Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [https://arxiv.org/abs/1801.07698]
  • Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. [https://arxiv.org/pdf/1801.09414.pdf]
  • Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. [https://arxiv.org/pdf/1704.08063.pdf]
  • Zhong et al., "GhostVLAD for Set-Based Face Recognition", 2018. [http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf]
  • Chung et al., "Out of Time: Automated Lip Sync in the Wild", 2016. [http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf]
  • Xie et al., "Utterance-Level Aggregation for Speaker Recognition in the Wild", 2019. [https://arxiv.org/pdf/1902.10107.pdf]
  • Zhang et al., "Fully Supervised Speaker Diarization", 2018. [https://arxiv.org/pdf/1810.04719v1.pdf]