Difference between revisions of "2024-09-30"

From cslt Wiki

Latest revision as of 08:53, 7 October 2024 (Monday)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • AI graph (higher education version)
Lantian Li
  • AI-Graph handbook v0.1
  • AI-Graph EN (12/50)
  • Huawei TiDing 3.0 - Model Quantization
  • BUPT/AI-Radiance trivial things
Ying Shi
  • Optimized Text-enroll KWS code
    • Added 4 negative sampling strategies (deletion, substitution, insertion, and shuffle) and verified them to ensure there are no bugs.
    • Found that the new negative sampling makes training harder, which indicates that relying on positional embedding alone is not enough.
  • Reproduce conditional chain overlap ASR (Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals)
    • According to Jiaying's work, the code released with the published paper does not work
    • Writing a dominance-based conditional chain overlap ASR implementation myself (in progress)
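The four negative sampling strategies named above (deletion, substitution, insertion, shuffle) can be illustrated with a minimal sketch. This is not the Text-enroll KWS code itself: the function name `make_negative`, the token-list representation, and the `vocab` argument are all assumptions made for illustration.

```python
import random


def make_negative(tokens, strategy, vocab, rng):
    """Return a corrupted copy of `tokens` to use as a negative text sample.

    Hypothetical helper: `tokens` is a non-empty list of text units
    (e.g. phones or characters), `vocab` is the pool of replacement units,
    and `rng` is a random.Random instance for reproducibility.
    """
    tokens = list(tokens)  # work on a copy; the caller's list is untouched
    if strategy == "deletion":
        # Drop one unit (skipped for single-unit inputs to avoid an empty sample).
        if len(tokens) > 1:
            del tokens[rng.randrange(len(tokens))]
    elif strategy == "substitution":
        # Replace one unit with a different unit from the vocabulary.
        i = rng.randrange(len(tokens))
        tokens[i] = rng.choice([v for v in vocab if v != tokens[i]])
    elif strategy == "insertion":
        # Insert one random unit at a random position.
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(vocab))
    elif strategy == "shuffle":
        # Reorder the units; retry a few times so the order actually changes.
        original = list(tokens)
        for _ in range(10):
            rng.shuffle(tokens)
            if tokens != original:
                break
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return tokens
```

Note that the shuffle negative contains exactly the same units as the positive and differs only in their order, so it can only be rejected by a model that uses position information.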
Zhenghai You
  • Exploring the role of the speaker encoder in TSE [https://z1et6d3xtb.feishu.cn/docx/GHF8doRjDo50ihxGUPpcsZgLncb]
    • Joint training of the Spk Enc gives better separation, but the EER is poor
    • Pretraining and freezing the Spk Enc gives a good EER, but the SI-SDR is poor
    • Further explore the different impacts of using spk aug on different tasks
  • The generality of SPK-AUG
    • Refactored DPRNN-TSE results are reliable, and the run has been accelerated from 87 hours to 32 hours
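For reference, the SI-SDR metric mentioned above can be computed as in the sketch below. It follows one common definition (zero-mean signals, with the estimate projected onto the target) and is not the evaluation code used in these experiments; the function name `si_sdr` and the plain-list inputs are assumptions.

```python
import math


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    # Remove the mean from both signals.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    t_energy = sum(x * x for x in t)
    if t_energy == 0:
        raise ValueError("target has zero energy")
    # Optimal scaling of the target that best explains the estimate.
    alpha = sum(a * b for a, b in zip(e, t)) / t_energy
    projection = [alpha * x for x in t]
    noise = [a - p for a, p in zip(e, projection)]
    num = sum(p * p for p in projection)
    den = sum(n * n for n in noise)
    if den == 0:
        return math.inf  # estimate is an exact scaled copy of the target
    if num == 0:
        return -math.inf  # estimate is orthogonal to the target
    return 10 * math.log10(num / den)
```

Because the target is rescaled before the error is measured, multiplying the estimate by a constant gain leaves the score unchanged.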
Junming Yuan
Chen Chen
Xiaolou Li
  • Use MFA on LRS3 to cut it into small segments
  • Use discrete AV-HuBERT embeddings in VSP-LLM training (still training)
  • Some ideas for aligning video features with the LLM (Dense Connector, CL methods)
  • Handover of the data collection; getting familiar with the process
  • Data Collection: 3138 h (need to re-check, DDL: 10.15)
Zehua Liu
  • Baseline system: VSP-LLM
  • Try Qwen2.5-14B [https://z1et6d3xtb.feishu.cn/docx/JBsidACDVojhCaxFQLbcCVbsnAc?from=from_copylink]
Pengqi Li
Wan Lin
  • Voxblink1 model training and testing [3]
Tianhao Wang
  • AudioSep reproduction
    • Problem: LAION CLAP needs 48 kHz audio, so the data needs to be upsampled
Xiaoxue Luo
  • AI-Graph high school handbook (v0.1)
Zhenyu Zhou
  • Model Quantization document submission
  • Review conditional chain code
Junhui Chen
  • Voxblink1 model training and testing
    • Writing test code for NS in ossi test.
Jiaying Wang
Yu Zhang
  • Friday report
  • Change SocioDojo Agent from ChatGPT-3.5-Turbo to Llama-3.1-8B (still in progress)
Wenqiang Du
  • Check primary school handbook (43/45)
  • Release Chinese and Haining KWS models
Yang Wei
Lily
  • APSIPA workshop in Tianjin; prepare Friday's report
  • Prepare for the online course
  • AI-Radiance daily work
Turi
  • Segmented the audio in the dataset into individual words.
  • Paper reading
Yue Gu
  • Almost completed the revisions of my journal paper
Qi Qu
  • KWS
    • Testing zh48 models on a dataset of Mandarin Chinese with a Guangdong accent: recall drops significantly.
  • AED
    • Evaluating a third-party solution for baby crying detection.
  • Misc.
    • Preparing for live talk.