CN-Celeb

Latest revision as of 10:06, 6 January 2021

Introduction

  • CN-Celeb is a large-scale Chinese celebrity dataset published by the Center for Speech and Language Technology (CSLT) at Tsinghua University.

Members

  • Current: Dong Wang, Yunqi Cai, Lantian Li, Yue Fan, Jiawen Kang
  • History: Ziya Zhou, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang

Description

  • Collect audio data of 1,000 Chinese celebrities.
  • Automatically clip videos through a pipeline including face detection, face recognition, speaker validation, and speaker diarization.
  • Create a benchmark database for the speaker recognition community.

Basic Methods

  • Environments: TensorFlow, PyTorch, Keras, MXNet
  • Face detection and tracking: RetinaFace and ArcFace models.
  • Active speaker verification: SyncNet model.
  • Speaker diarization: UIS-RNN model.
  • Double check by speaker recognition: VGG model.
  • Input: pictures and videos of POIs (Persons of Interest).
  • Output: well-labelled videos of POIs (Persons of Interest).
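The stages listed above can be sketched as a minimal pipeline. This is purely illustrative: the function names, signatures, and placeholder return values are assumptions, not the project's actual API; the real system wires up RetinaFace/ArcFace, SyncNet, UIS-RNN, and a VGG speaker model.

```python
# Illustrative sketch of the collection pipeline described above.
# All bodies are placeholders standing in for the real models.

def detect_and_track_faces(video):
    """Face detection and tracking (RetinaFace in the real pipeline)."""
    return [{"track_id": 0, "frames": list(range(len(video)))}]

def match_poi(face_track, poi_gallery):
    """Face recognition against POI pictures (ArcFace embeddings)."""
    return face_track["track_id"] in poi_gallery

def active_speaker_segments(face_track, audio):
    """Audio-visual sync check (SyncNet): is the tracked face speaking?"""
    return [(0.0, 3.2)]  # (start, end) in seconds, placeholder

def diarize(audio):
    """Speaker diarization (UIS-RNN), double-checked by a VGG speaker model."""
    return [(0.0, 3.2, "spk0")]

def label_video(video, audio, poi_gallery):
    """Input: pictures/videos of POIs; output: time-labelled speech segments."""
    segments = []
    for track in detect_and_track_faces(video):
        if not match_poi(track, poi_gallery):
            continue
        speaking = active_speaker_segments(track, audio)
        # Keep diarized segments that overlap an active-speaking span.
        for start, end, spk in diarize(audio):
            if any(s < end and start < e for s, e in speaking):
                segments.append((start, end, spk))
    return segments
```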

Reports

  • Stage report v1.0: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:C-STAR.pdf

Publications

@misc{fan2019cnceleb,
  title={CN-CELEB: a challenging Chinese speaker recognition dataset},
  author={Yue Fan and Jiawen Kang and Lantian Li and Kaicheng Li and Haolin Chen and Sitong Cheng and Pengyuan Zhang and Ziya Zhou and Yunqi Cai and Dong Wang},
  year={2019},
  eprint={1911.01799},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

@misc{li2020cn,
  title={CN-Celeb: multi-genre speaker recognition},
  author={Lantian Li and Ruiqi Liu and Jiawen Kang and Yue Fan and Hao Cui and Yunqi Cai and Ravichander Vipperla and Thomas Fang Zheng and Dong Wang},
  year={2020},
  eprint={2012.12468},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

Source Code

  • Collection Pipeline: https://github.com/celebrity-audio-collection/videoprocess
  • Baseline Systems: https://github.com/csltstu/kaldi/tree/cnceleb/egs/cnceleb

Download

  • Public (recommended)

OpenSLR: http://www.openslr.org/82/

  • Local (not recommended)

CSLT@Tsinghua: http://cslt.riit.tsinghua.edu.cn/~data/CN-Celeb/
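A minimal fetch-and-extract sketch for the public OpenSLR copy (resource 82). The archive filename below is an assumption; check http://www.openslr.org/82/ for the actual file list before downloading.

```python
# Sketch: download the CN-Celeb archive from OpenSLR and extract it.
import tarfile
import urllib.request
from pathlib import Path

BASE_URL = "http://www.openslr.org/resources/82"
ARCHIVE = "cn-celeb_v2.tar.gz"  # assumed name -- verify on the OpenSLR page

def fetch_cn_celeb(dest: str = "data") -> Path:
    """Download the archive (if not already present) and extract under dest."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / ARCHIVE
    if not archive_path.exists():
        urllib.request.urlretrieve(f"{BASE_URL}/{ARCHIVE}", archive_path)
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)
    return dest_dir
```

Usage: `fetch_cn_celeb("data")` downloads once and re-extracts on later calls without re-downloading.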

Future Plans

  • Augment the database to 10,000 people.
  • Build an LSTM-based model that connects SyncNet and speaker diarization and learns the relationship between them.
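One way such a bridging model could look, purely as a sketch: an LSTM that maps per-frame SyncNet-style features to per-frame speaker-activity scores for the diarization stage. All dimensions and the architecture itself are assumptions for illustration, not the planned design.

```python
import torch
import torch.nn as nn

class SyncDiarBridge(nn.Module):
    """Illustrative LSTM mapping per-frame audio-visual sync features
    to per-frame speaking/not-speaking logits (dimensions assumed)."""

    def __init__(self, sync_dim=128, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(sync_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # per-frame activity logit

    def forward(self, sync_feats):            # (batch, time, sync_dim)
        out, _ = self.lstm(sync_feats)        # (batch, time, 2 * hidden)
        return self.head(out).squeeze(-1)     # (batch, time)

feats = torch.randn(2, 50, 128)   # 2 clips, 50 frames each
logits = SyncDiarBridge()(feats)
print(logits.shape)               # torch.Size([2, 50])
```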

License

  • All the resources contained in the database are free for research institutes and individuals.
  • No commercial usage is permitted.

References

  • Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild", 2019. https://arxiv.org/pdf/1905.00641.pdf
  • Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018. https://arxiv.org/abs/1801.07698
  • Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018. https://arxiv.org/pdf/1801.09414.pdf
  • Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017. https://arxiv.org/pdf/1704.08063.pdf
  • Zhong et al., "GhostVLAD for Set-based Face Recognition", 2018. http://www.robots.ox.ac.uk/~vgg/publications/2018/Zhong18b/zhong18b.pdf
  • Chung et al., "Out of Time: Automated Lip Sync in the Wild", 2016. http://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf
  • Xie et al., "Utterance-level Aggregation for Speaker Recognition in the Wild", 2019. https://arxiv.org/pdf/1902.10107.pdf
  • Zhang et al., "Fully Supervised Speaker Diarization", 2018. https://arxiv.org/pdf/1810.04719v1.pdf