Difference between revisions of "2024-09-30"

Latest revision as of 08:53, 7 October 2024 (Monday)

People	This Week	Next Week	Task Tracking (DeadLine)
Dong Wang	AI graph (high education version)
Lantian Li	AI-Graph handbook v0.1; AI-Graph EN (12/50); Huawei TiDing 3.0 - Model Quantization; BUPT/AI-Radiance trivial things
Ying Shi	Optimized Text-enroll KWS code: added 4 kinds of negative sampling strategies (deletion, substitution, insertion, and shuffle) and verified them to ensure no bugs. Found that the new negative sampling increases the difficulty of training, which indicates that depending only on positional embedding is not enough. Reproduce conditional chain overlap ASR (Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals): according to Jiaying's work, the code released with the published paper does not work; writing a dominance-based conditional chain overlap ASR myself (in progress)
Zhenghai You	Exploring the role of the speaker encoder in TSE[1]: joint training of the Spk Enc gives a better separation effect, but the EER is poor; pretraining and freezing the Spk Enc gives a good EER, but the SI-SDR is poor; further explore the different impacts of using spk aug on different tasks. The generality of SPK-AUG: refactored DPRNN-TSE results are reliable, and it has been accelerated from 87 hours to 32 hours
Junming Yuan
Chen Chen
Xiaolou Li	Use MFA on LRS3 to cut it into small segments; Use discrete embedding of avhubert in vsp-llm training (still training); Some ideas for aligning video features with the LLM (Dense Connector, CL methods); Handover of the data collection and getting familiar with the process; Data Collection: 3138 h (need to re-check, DDL: 10.15)
Zehua Liu	Baseline System VSP-LLM; Try Qwen2.5-14B[2]
Pengqi Li
Wan Lin	Voxblink1 model training and testing [3]
Tianhao Wang	AudioSep reproduction; problem: LAION CLAP needs 48 kHz audio, so the data needs to be upsampled
Xiaoxue Luo	AI-Graph High school handbook(v0.1)
Zhenyu Zhou	Submit the Model Quantization document; Review conditional chain code
Junhui Chen	Voxblink1 model training and testing: writing test code for NS in ossi test.
Jiaying Wang
Yu Zhang	Fri Report; Change SocioDojo Agent from ChatGPT-3.5-Turbo to Llama-3.1-8B (still working)
Wenqiang Du	Check primary school handbook (43/45); Release Chinese and Haining KWS models
Yang Wei
Lily	APSIPA workshop in Tianjin and prepare Friday's report; Prepare for online course; AI Radiance's daily work
Turi	Segmented the audio in the dataset into individual words; Paper reading
Yue Gu	Almost completed the revisions of my journal paper
Qi Qu	KWS: testing zh48 models on a dataset of Mandarin Chinese with Guangdong accent; recall drops significantly. AED: evaluating a third-party solution for baby crying detection. Misc.: preparing for live talk.
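
Illustration (not part of the report): Ying Shi's entry names four negative sampling strategies (deletion, substitution, insertion, and shuffle) without showing what they do. The sketch below is one plausible reading, assuming the enrollment keyword is a list of tokens and each strategy perturbs it into a near-miss negative. The function names, the `vocab` list, and the token-level granularity are hypothetical and are not taken from the actual Text-enroll KWS code.

```python
import random


def delete_token(tokens, rng):
    """Negative by deletion: drop one randomly chosen token."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens  # nothing sensible to delete
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def substitute_token(tokens, vocab, rng):
    """Negative by substitution: replace one token with a different one."""
    tokens = list(tokens)
    if not tokens:
        return tokens
    i = rng.randrange(len(tokens))
    candidates = [t for t in vocab if t != tokens[i]]
    if not candidates:
        return tokens  # vocab offers no alternative token
    return tokens[:i] + [rng.choice(candidates)] + tokens[i + 1:]


def insert_token(tokens, vocab, rng):
    """Negative by insertion: add one vocab token at a random position."""
    tokens = list(tokens)
    i = rng.randrange(len(tokens) + 1)
    return tokens[:i] + [rng.choice(vocab)] + tokens[i:]


def shuffle_tokens(tokens, rng):
    """Negative by shuffle: same tokens, different order."""
    original = list(tokens)
    out = list(tokens)
    if len(set(out)) < 2:
        return out  # every ordering is identical
    while out == original:
        rng.shuffle(out)
    return out
```

The shuffle variant keeps the token set intact and changes only the order, so it is the case that can only be rejected by using position information; this is consistent with the entry's remark about positional embedding, though the report does not say which strategy caused the added difficulty.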

@@ Line 6: / Line 6: @@
 |Dong Wang
 ||
-*
+* AI graph (high education version)
 ||
 *
@@ Line 47: / Line 47: @@
 |Zhenghai You
 ||
-*
+* Exploring the role of speaker encoder in TSE[https://z1et6d3xtb.feishu.cn/docx/GHF8doRjDo50ihxGUPpcsZgLncb]
+** Joint training of the Spk Enc gives a better separation effect, but the EER is poor
+** Pretraining and freezing the Spk Enc gives a good EER, but the SI-SDR is poor
+** Further explore the different impacts of using spk aug on different tasks
+* The generality of SPK-AUG
+** Refactored DPRNN-TSE results are reliable and have been accelerated from 87 hours to 32 hours
 ||
 *
@@ Line 79: / Line 84: @@
 |Xiaolou Li
 ||
-*
+* Use MFA on LRS3 to cut it into small segments
+* Use discrete embedding of avhubert in vsp-llm training (Still training)
+* Some ideas for aligning video features with the LLM (Dense Connector, CL methods)
+* Handover the data collection and get familiar with the process
+* Data Collection: 3138 h (need to re-check, DDL: 10.15)
 ||
 *
@@ Line 90: / Line 99: @@
 |Zehua Liu
 ||
-*
+*Baseline System VSP-LLM
+*Try Qwen2.5-14B[https://z1et6d3xtb.feishu.cn/docx/JBsidACDVojhCaxFQLbcCVbsnAc?from=from_copylink]
 ||
 *
@@ Line 122: / Line 132: @@
 |-
 |Tianhao Wang
+||
+* AudioSep reproduction
+** problem: LAION CLAP needs 48 kHz audio, so the data needs to be upsampled
 ||
 *
+||
+*
+|-
+|-
+|Xiaoxue Luo
+||
+*AI-Graph High school handbook(v0.1)
 ||
 *
@@ Line 134: / Line 156: @@
 |Zhenyu Zhou
 ||
-*
+* Submit the Model Quantization document
+* Review conditional chain code
 ||
 *
@@ Line 145: / Line 168: @@
 |Junhui Chen
 ||
-*
+* Voxblink1 model training and testing
+** Writing test code for NS in ossi test.
 ||
 *
@@ Line 167: / Line 191: @@
 |Yu Zhang
 ||
-*
+* Fri Report
+* Change SocioDojo Agent from ChatGPT-3.5-Turbo to Llama-3.1-8B (still working)
 ||
 *
@@ Line 178: / Line 203: @@
 |Wenqiang Du
 ||
-*
+*Check primary school handbook (43/45)
+*Release Chinese and Haining KWS models
 ||
 *
@@ Line 199: / Line 225: @@
 |Lily
 ||
-*
+* APSIPA workshop in Tianjin and prepare Friday's report
+* Prepare for online course
+* AI Radiance's daily work
 ||
 *
@@ Line 209: / Line 237: @@
 |Turi
 ||
-*
+* Segmented the audio in the dataset into individual words.
+* Paper reading
 ||
 *
@@ Line 226: / Line 255: @@
 |Qi Qu
 ||
-*
+* KWS
+** Testing zh48 models on a dataset of Mandarin Chinese with Guangdong accent: recall drops significantly.
+* AED
+** Evaluating a third-party solution for baby crying detection.
+* Misc.
+** Preparing for live talk.
 ||
 *
