Difference between revisions of "2024-11-25"

Latest revision as of 11:04, 25 November 2024 (Mon)

People	This Week	Next Week	Task Tracking (DeadLine)
Dong Wang	2nd round of check for AI handbook middle school. Deal with pictures in AI handbook (primary & middle). Start to check AI handbook high school. Check AI book for Tianjin medical school.
Lantian Li	Complete my CSTR Report; Go on AI-Graph EN Chapter 4; Polish 2025 Daily Sign
Ying Shi	Design cohort-conditional chain multi-talker ASR with round-RNN. WER results: round-1 32.15%, round-2 69.69%, round-3 92.33%. On a 500-utterance sub-test set, only 28% of the sentences have a recognition order that matches the cosine distance. Prepare for Huawei's interview.
Zhenghai You	Huawei TSE(Train models that better fit the scene)[1]
Junming Yuan	Comparison of results among Clean-HuBERT, Cocktail-HuBERT, and MT-HuBERT [2]. Bad news: Cocktail-HuBERT > Clean-HuBERT > MT-HuBERT
Chen Chen
Xiaolou Li	Data processing: 1/4 of CVS3 already cut from the original video, waiting for pre-processing; copying pre-processed GongAn video data from gonganbu. VSR Contrastive Loss Exp: inspired by paper [3]. Main idea: to better align visual features with the LLM input, calculate the cosine similarity between the target and the video features and set the most similar one as the positive pair. Result: under training. Paper reading.
Zehua Liu	Rebuttal writing. Iterative training and inference: Iter-1 (45.53%) < Iter-2 (45.00%) < Iter-3 (44.85%)
Pengqi Li	Begin writing paper about importance of phonemes analysis work. Reading a doctoral thesis about speaker explainability[4].
Wan Lin	NS: all transformer. 6k spk: EER 2.6%; 20k spk: EER 2.3%; 20k spk + multi-enroll: EER 1.9%
Tianhao Wang	Experiments about query embedding conditional approach: SDR: FiLM (7.492) > self-attention (6.573)
Xiaoxue Luo	Training of the USS (CED+AudioSep) model: adjusting the audio format to meet the needs of the model (in training). Production of 2025 Daily Sign (March).
Zhenyu Zhou	Speaker-identity-based conditional chain proposal [5]. Prepare Interim Report.
Junhui Chen	Read papers (ICCIP keynote speech paper and some others). NS: some tests on the transformer feature extractor.
Jiaying Wang
Yu Zhang	Huawei AED: data augmentation & human-annotated dataset [6]. Finance: paper reading; reproduce a local Llama version of StockAgent [7] (an LLM-based market simulation framework).
Wenqiang Du	Training of new language models (HeNan) [8]; training of new language models (ChongQing) [9]
Yang Wei	Fix some bugs about keyword sampling in text enroll kws training code. Add spec augmentation for text enroll kws training.
Lily
Turi	Paper reading; ICASSP 2025 rebuttal
Yue Gu	Synthesize about 1 h of data for each target speaker, then use these data to train the adapter module [10]. Writing TASLP paper.
Qi Qu	Finding ideal thresholds and deploying cloud services for KWS models: `zh48_guangdong` and `zh48_haining20`. Located and fixed a bug in FunASR which may lead to segmentation fault. Built service with extended gRPC protocol. Analysis of some AED (cries and slaps) FAs.
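
The main idea in the VSR Contrastive Loss entry above (compute the cosine similarity between a target and the video features, then treat the most similar one as the positive pair) can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the toy vectors, the temperature value, and the InfoNCE-style loss are all hypothetical and are not taken from the report or from the cited paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def pick_positive(target, video_feats):
    """Score every video feature against the target; the most similar is the positive."""
    sims = [cosine(target, f) for f in video_feats]
    pos = max(range(len(sims)), key=sims.__getitem__)
    return pos, sims


def info_nce(sims, pos, temperature=0.1):
    """InfoNCE-style loss for one target (assumed loss form, not from the report)."""
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[pos]


# Toy data (hypothetical): one target embedding and three video features.
target = [1.0, 0.0]
video_feats = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]

pos, sims = pick_positive(target, video_feats)
loss = info_nce(sims, pos)
print(pos, round(loss, 4))
```

In this sketch the remaining video features act as negatives for the selected positive; whether the actual experiment uses this loss form or a different one is not stated in the report.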

@@ Line 33: / Line 33: @@
 |Ying Shi
 ||
-*
+* Design cohort- conditional chain multi-talker ASR with round-RNN
+** WER result :  round-1 32.15% , round-2: 69.69% round-3: 92.33%
+** For 500 utterances sub-test set: Only 28% of the sentences have a recognition order that matches the cosine distance.
+* Prepare for Huawei's interview.
 ||
 *
@@ Line 44: / Line 47: @@
 |Zhenghai You
 ||
-*
+* Huawei TSE(Train models that better fit the scene)[https://z1et6d3xtb.feishu.cn/docx/AArOdQEQPoFcshxD5OfcB9SLnFg]
 ||
 *
@@ Line 77: / Line 80: @@
 |Xiaolou Li
 ||
-*
+* Data process
+** CVS3 1/4 already cut from original video, waiting for pre-process
+** Copying pre-processed GongAn video data from gonganbu
+* VSR Contrastive Loss Exp
+** Inspired by paper [https://arxiv.org/abs/2408.11813]
+** Main idea: For better align visual feature to LLM input, calculate cos similarity of target and video feature, set the biggest as the positive pair.
+** Result: Under training
+* Paper Reading
 ||
 *
@@ Line 89: / Line 99: @@
 ||
 *Rebuttal writing
-*Iterative training
+*Iterative training and inference
-**Baseline(45.99%) < Iter-1(45.53%) < Iter-2(45.00%) < Iter-3(44.85%)
+**Iter-1(45.53%) < Iter-2(45.00%) < Iter-3(44.85%)
 ||
 *
@@ Line 102: / Line 112: @@
 ||
 * Begin writing paper about importance of phonemes analysis work.
-* Reading a doctoral thesis on speaker explainability[https://theses.hal.science/tel-04634215v1/file/These_BEN_AMOR.pdf].
+* Reading a doctoral thesis about speaker explainability[https://theses.hal.science/tel-04634215v1/file/These_BEN_AMOR.pdf].
 ||
@@ Line 114: / Line 124: @@
 |Wan Lin
 ||
-*
+* NS: all transformer
+** 6k spk: EER 2.6%
+** 20k spk: EER 2.3%
+** 20k spk+multi-enroll: EER 1.9%
 ||
 *
@@ Line 124: / Line 137: @@
 |-
 |Tianhao Wang
+||
+* Experiments about query embedding conditional approach:
+** SDR: FiLM (7.492) > self-attention (6.573)
 ||
 *
+||
+*
+|-
+|-
+|Xiaoxue Luo
+||
+* training of the USS(CED+AudioSep) model
+** adjust the audio format to meet the needs of the model(in training)
+* production of 2025 Daily Sign( March )
 ||
 *
@@ Line 136: / Line 163: @@
 |Zhenyu Zhou
 ||
-*
+*Speaker identity based conditional chain proposal[https://z1et6d3xtb.feishu.cn/docx/MzZ8d3cDWokCzCx0MmDcRJDFnke]
+*prepare Interim Report
 ||
 *
@@ Line 147: / Line 175: @@
 |Junhui Chen
 ||
-*
+* Read paper (ICCIP keynote speak paper and some other)
+* NS
+** Some tests about transformer feature extractor
 ||
 *
@@ Line 196: / Line 226: @@
 |Yang Wei
 ||
-*
+* Fix some bugs about keyword sampling in text enroll kws training code.
+* Add spec augmentation for text enroll kws training.
 ||
 *
@@ Line 225: / Line 256: @@
 |Yue Gu
 ||
-*
+* Synthesis about 1h data for each target speaker, then using these data to train the adapter module.[https://z1et6d3xtb.feishu.cn/wiki/VPZfwx53ei2zkgkSvPtcCiDSnVh?from=from_copylink]
+* writing taslp paper
 ||
 *
