Difference between revisions of "2024-10-28"

Latest revision as of 10:59, 28 October 2024 (Monday)

People	This Week	Next Week
Dong Wang	AI primary book done
Lantian Li	AI-Graph EN (1-20 finalized) Design 2025 Daily Posts
Ying Shi	revise the code about cohort-overlap asr [the training is in progress] Support arbitrary source mixing training Use the real hypothesis as condition by Token error rate Design stop criterion
Zhenghai You	Introduce more hard samples to improve model performance[1] SPK-AUG with same length: There is an improvement, but the SI-SDR decreases when hard sample rate increases Design more hard samples
Junming Yuan	The result of time-mask MT-HuBERT [2] A sad news
Chen Chen
Xiaolou Li	VTS with LLM structure design and baseline code writing [3]
Zehua Liu	Reading papers about In-Context-Learning in ASR Training model with Adaptive Time Mask Try In-Context-Learning with only previous sentence[4] VTS Project Report starts
Pengqi Li	Consistency of TAO and LayerCAM Change TAO from input to final conv layer and obtain more consistency. (Aishell: 0.93 in any model)
Wan Lin	NS: downsampling is not useful [5] share speaker meeting on Friday
Tianhao Wang	AudioSep (CLAP) 5-mix exps[6]: text-query: SDR=4.978, SI-SDR=1.972 audio-query: SDR=6.907, SI-SDR=5.058 These results are with the loudness limitation	AudioSep (CLAP) without loudness limitation Project things
Xiaoxue Luo	Comparative experiment between AudioSep and baseline system(CLIPSep) Prepare the report
Zhenyu Zhou	Reproduce 5-mix speech separation results: PIT: 2-mix: 16.04; 5-mix: 6.87 conditional: 5-mix: 5.38 (40 epochs)
Junhui Chen	NS: speaker detection (method survey & debug) Got sick
Jiaying Wang
Yu Zhang	SocioDojo (still worse than Nasdaq100 baseline) Change information sources, from the perspective of the report generated by LLM, more new information sources will be referenced. Prompt Actuator to consider current cash ratio before investing (without this, the asset ratio goes up to 100%, which leads to high risks, still running) Read some papers about integrating time series into LLM
Wenqiang Du	Prepare data, code, and environment for Pro.Mijiti
Yang Wei	Train text-enroll KWS model with Aibabel training data. Did not work.
Lily
Turi	Whisper-largev3 finetuning Freezing 20 layers of encoder achieved 9.75 WER. Vanilla finetuning 8.02 WER
Yue Gu	Seek suggestions from other authors. Many suggestions are conflicting, so I'm trying to figure out the reasons and fix these issues.
Qi Qu	KWS: Text-enroll models exported to ONNX. C/JNI libs built based on ONNX models and ready for on-device test.

@@ Line 6: / Line 6: @@
 |Dong Wang
 ||
-*
+* AI primary book done
 ||
 *
@@ Line 17: / Line 17: @@
 |Lantian Li
 ||
-*
+* AI-Graph EN (1-20 finalized)
+* Design 2025 Daily Posts
 ||
 *
@@ Line 28: / Line 29: @@
 |Ying Shi
 ||
-*
+* revise the code about cohort-overlap asr [the training is in progress]
+** Support arbitrary source mixing training
+** Use the real hypothesis as condition by Token error rate
+** Design stop criterion
 ||
 *
@@ Line 39: / Line 43: @@
 |Zhenghai You
 ||
-*
+* Introduce more hard samples to improve model performance[https://z1et6d3xtb.feishu.cn/docx/CURxdy3tEorxkrxtjjqcdMaYnJg]
+** SPK-AUG with same length: There is an improvement, but the SI-SDR decreases when hard sample rate increases
+** Design more hard samples
 ||
 *
@@ Line 72: / Line 78: @@
 |Xiaolou Li
 ||
-*
+* VTS with LLM structure design and baseline code writing [https://z1et6d3xtb.feishu.cn/docx/ZBnOdEMxgo8bs5xrkb1cPZnCnQg?from=from_copylink]
 ||
 *
@@ Line 109: / Line 115: @@
 |Wan Lin
 ||
-*
+* NS: downsampling is not useful [https://z1et6d3xtb.feishu.cn/docx/MxBNdPbLao0tsoxkBVCcUgUoneh?from=from_copylink]
+* share speaker meeting on Friday
 ||
 *
@@ Line 120: / Line 127: @@
 |Tianhao Wang
 ||
-* AudioSep (CLAP) 5-mix exps:
+* AudioSep (CLAP) 5-mix exps[https://z1et6d3xtb.feishu.cn/docx/DlR8dZRdEoZIwIxTOFvcQdbGnqg]:
 ** text-query: SDR=4.978, SI-SDR=1.972
 ** audio-query: SDR=6.907, SI-SDR=5.058
@@ Line 147: / Line 154: @@
 |Zhenyu Zhou
 ||
-*
+* Reproduce 5-mix speech separation results:
+** PIT: 2-mix: 16.04; 5-mix: 6.87
+** conditional: 5-mix: 5.38 (40 epochs)
 ||
 *
@@ Line 158: / Line 167: @@
 |Junhui Chen
 ||
-*
+* NS: speaker detection (method survey & debug)
+* Got sick
 ||
 *
@@ Line 205: / Line 215: @@
 |Yang Wei
 ||
-*
+* Train text-enroll KWS model with Aibabel training data. Did not work.
 ||
 *
@@ Line 225: / Line 235: @@
 |Turi
 ||
-*
+* Whisper-largev3 finetuning
+** Freezing 20 layers of encoder achieved 9.75 WER. Vanilla finetuning 8.02 WER
 ||
 *
@@ Line 233: / Line 244: @@
 |Yue Gu
 ||
-* seek sugestions from other authors. Many suggestions are conflicting, so I'm try to figure out the reasons and fix these issues.
+* Seek suggestions from other authors. Many suggestions are conflicting, so I'm trying to figure out the reasons and fix these issues.
 ||
 *
@@ Line 242: / Line 253: @@
 |Qi Qu
 ||
-*
+* KWS:
+** Text-enroll models exported to ONNX.
+** C/JNI libs built based on ONNX models and ready for on-device test.
 ||
 *
