Revision as of 00:58, 22 August 2019 (Thursday)

C-STAR: Chinese Celebrity Audio Data Collection

Members: 王东, 蔡云麒, 周子雅, 李开诚, 陈浩林, 程思潼, 张鹏远, 范悦

Collect audio data of 1,000 Chinese celebrities.
Automatically clip videos through a pipeline of face detection, face recognition, speaker validation, and speaker diarization.
Create a database.
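
The clipping pipeline above can be sketched as a chain of filter stages. This is only a minimal illustration, not the project's implementation: the Segment fields, the two thresholds, and the rule "keep the speaker with the longest total duration" are all assumptions, and in a real system each field would come from an actual face or speaker model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Segment:
    """A candidate clip. All fields are hypothetical stand-ins for model outputs."""
    start: float      # start time in seconds
    end: float        # end time in seconds
    has_face: bool    # face detection: is a face visible?
    face_sim: float   # face recognition: similarity to the target celebrity
    voice_sim: float  # speaker validation: similarity to the target voice
    speaker: int      # speaker diarization: speaker label for this span


Stage = Callable[[List[Segment]], List[Segment]]

# Assumed thresholds, chosen only for illustration.
FACE_THRESHOLD = 0.6
VOICE_THRESHOLD = 0.5


def face_detection(segs: List[Segment]) -> List[Segment]:
    """Drop segments in which no face was detected."""
    return [s for s in segs if s.has_face]


def face_recognition(segs: List[Segment]) -> List[Segment]:
    """Keep segments whose face matches the target person."""
    return [s for s in segs if s.face_sim >= FACE_THRESHOLD]


def speaker_validation(segs: List[Segment]) -> List[Segment]:
    """Keep segments whose voice matches the target person."""
    return [s for s in segs if s.voice_sim >= VOICE_THRESHOLD]


def speaker_diarization(segs: List[Segment]) -> List[Segment]:
    """Keep only the speaker label with the longest total duration."""
    if not segs:
        return []
    totals: Dict[int, float] = {}
    for s in segs:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    main_speaker = max(totals, key=lambda label: totals[label])
    return [s for s in segs if s.speaker == main_speaker]


# Stage order follows the order listed in the text.
PIPELINE: List[Stage] = [
    face_detection,
    face_recognition,
    speaker_validation,
    speaker_diarization,
]


def clip(segs: List[Segment]) -> List[Segment]:
    """Run every stage in order and return the surviving segments."""
    for stage in PIPELINE:
        segs = stage(segs)
    return segs


candidates = [
    Segment(0.0, 4.0, True, 0.90, 0.8, 0),
    Segment(4.0, 6.0, False, 0.00, 0.7, 0),   # no face visible
    Segment(6.0, 9.0, True, 0.30, 0.6, 1),    # face does not match
    Segment(9.0, 12.0, True, 0.80, 0.2, 1),   # voice does not match
    Segment(12.0, 15.0, True, 0.85, 0.9, 0),
    Segment(15.0, 16.0, True, 0.70, 0.6, 2),  # minority speaker
]
kept = clip(candidates)
print([(s.start, s.end) for s in kept])  # [(0.0, 4.0), (12.0, 15.0)]
```

Writing each stage as a plain list-to-list function keeps the stages independent, so one model can be swapped out or re-thresholded without touching the others.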

Augment the database to 10,000 people.
Build an LSTM-based model that links SyncNet and Speaker_Diarization and learns the relationship between them.

Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. https://arxiv.org/abs/1801.07698
Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. https://arxiv.org/pdf/1801.09414.pdf
Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. https://arxiv.org/pdf/1704.08063.pdf

Xie et al., "Utterance-level Aggregation for Speaker Recognition in the Wild", 2019. https://arxiv.org/pdf/1902.10107.pdf
Zhang et al., "Fully Supervised Speaker Diarization", 2018. https://arxiv.org/pdf/1810.04719v1.pdf

@@ Line 31: / Line 31: @@
-===Participating Literature===
+===References===
+* Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018, [https://arxiv.org/abs/1801.07698]
+* Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018, [https://arxiv.org/pdf/1801.09414.pdf]
+* Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017, [https://arxiv.org/pdf/1704.08063.pdf]
+* Zhong et al., "GhostVLAD for set-based face recognition", 2018. [http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf link]
+* Chung et al., "Out of time: automated lip sync in the wild", 2016. [http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf link]
+* Xie et al., "Utterance-level Aggregation for Speaker Recognition in the Wild", 2019. [https://arxiv.org/pdf/1902.10107.pdf link]
 * Zhang et al., "Fully Supervised Speaker Diarization", 2018. [https://arxiv.org/pdf/1810.04719v1.pdf link]