
Latest revision as of 11:53, 29 October 2019

CN-Celeb

  • A large-scale Chinese celebrity audio dataset collected "in the wild".
  • Members: Dong Wang, Yunqi Cai, Lantian Li, Yue Fan, Jiawen Kang
  • Historical members: Ziya Zhou, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang

Target

  • Collect audio data of 1,000 Chinese celebrities.
  • Automatically clip videos with a pipeline of face detection, face recognition, speaker validation, and speaker diarization.
  • Create a database.

Future Plans

  • Augment the database to 10,000 people.
  • Build an LSTM-based model that links SyncNet and speaker diarization and learns the relationship between them.


Approach

  • Implemented with TensorFlow, PyTorch, Keras, and MXNet
  • RetinaFace and ArcFace models for face detection and recognition, the SyncNet model for speaker validation, and the UIS-RNN model for speaker diarization
  • Input: videos of the target celebrity and face images of the target celebrity
  • Output: time stamps of the target celebrity's speech segments in the video
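The approach above can be sketched as a simple per-frame pipeline. The code below is a minimal, runnable illustration, not the project's actual implementation: the `Frame` dataclass and its boolean fields are hypothetical stand-ins for a RetinaFace detection, an ArcFace identity match, and a SyncNet lip-sync check, and the segment-merging step only approximates what speaker diarization (UIS-RNN) would refine.

```python
# Sketch of the clipping pipeline, assuming per-frame model outputs
# have already been computed by the face/speaker models named above.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Frame:
    time: float          # timestamp in seconds
    has_face: bool       # stand-in for a RetinaFace detection
    is_target: bool      # stand-in for an ArcFace identity match
    lips_synced: bool    # stand-in for a SyncNet audio-visual check

def clip_target_segments(frames: List[Frame],
                         frame_step: float) -> List[Tuple[float, float]]:
    """Return (start, end) times where the target celebrity is speaking.

    A frame counts as "target speaking" when all three per-frame checks
    pass; consecutive such frames are merged into one segment. In the
    real pipeline, speaker diarization (UIS-RNN) would further split
    and verify these segments.
    """
    segments = []
    start = None
    for f in frames:
        speaking = f.has_face and f.is_target and f.lips_synced
        if speaking and start is None:
            start = f.time
        elif not speaking and start is not None:
            segments.append((start, f.time))
            start = None
    if start is not None:  # segment runs to the end of the video
        segments.append((start, frames[-1].time + frame_step))
    return segments

# Toy run: in a 5-frame, 1 fps video, the target's face appears from
# 1.0 s onward but the lips are synced with the audio only in [1.0, 3.0).
frames = [Frame(t, has_face=t >= 1, is_target=t >= 1, lips_synced=1 <= t < 3)
          for t in [0.0, 1.0, 2.0, 3.0, 4.0]]
print(clip_target_segments(frames, frame_step=1.0))  # [(1.0, 3.0)]
```

The output of this stage is exactly the project's stated deliverable: time stamps of the target's speech segments in the video.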


Project GitHub

celebrity-audio-collection

Project Reports

v1.0 milestone report


References

  • Deng et al., "RetinaFace: Single-Stage Dense Face Localisation in the Wild", 2019. [https://arxiv.org/pdf/1905.00641.pdf]
  • Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. [https://arxiv.org/abs/1801.07698]
  • Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. [https://arxiv.org/pdf/1801.09414.pdf]
  • Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. [https://arxiv.org/pdf/1704.08063.pdf]
  • Zhong et al., "GhostVLAD for Set-Based Face Recognition", 2018. [http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf]
  • Chung et al., "Out of Time: Automated Lip Sync in the Wild", 2016. [http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf]
  • Xie et al., "Utterance-Level Aggregation for Speaker Recognition in the Wild", 2019. [https://arxiv.org/pdf/1902.10107.pdf]
  • Zhang et al., "Fully Supervised Speaker Diarization", 2018. [https://arxiv.org/pdf/1810.04719v1.pdf]