People |
This Week |
Next Week |
Task Tracking (Deadline)
|
Dong Wang
|
|
|
|
Lantian Li
|
- AI-Graph handbook v0.1
- AI-Graph EN (12/50)
- Huawei TiDing 3.0 - Model Quantization
- BUPT/AI-Radiance miscellaneous tasks
|
|
|
Ying Shi
|
- Optimize the Text-enroll KWS code: add 4 kinds of negative sampling strategies (deletion, substitution, insertion, and shuffle) and verify them to ensure there are no bugs.
- Found that the new negative sampling makes training harder, which indicates that relying only on positional embedding is not enough.
- Reproduce conditional-chain overlapped ASR (Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals)
- According to Jiaying's work, the code released with the paper does not work.
- Writing a dominance-based conditional-chain overlapped ASR implementation from scratch (in progress)
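The four negative sampling strategies listed above can be illustrated with a minimal token-level sketch. It assumes a keyword is a list of tokens (e.g. phonemes or characters) and that substituted or inserted tokens are drawn from a fixed vocabulary; `make_negative` and its helpers are hypothetical names, and this is not the actual Text-enroll KWS code.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical token-level negative sampler for text-enrolled KWS.
# Assumption: a keyword is a list of tokens (phonemes or characters) and
# the vocabulary is fixed; all names here are illustrative only.

STRATEGIES = ("deletion", "substitution", "insertion", "shuffle")


def delete_token(tokens: List[str], rng: random.Random) -> List[str]:
    """Drop one randomly chosen token."""
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def substitute_token(tokens: List[str], vocab: Sequence[str], rng: random.Random) -> List[str]:
    """Replace one token with a different token from the vocabulary."""
    i = rng.randrange(len(tokens))
    choices = [v for v in vocab if v != tokens[i]]
    return tokens[:i] + [rng.choice(choices)] + tokens[i + 1:]


def insert_token(tokens: List[str], vocab: Sequence[str], rng: random.Random) -> List[str]:
    """Insert a random vocabulary token at a random position (ends included)."""
    i = rng.randrange(len(tokens) + 1)
    return tokens[:i] + [rng.choice(vocab)] + tokens[i:]


def shuffle_tokens(tokens: List[str], rng: random.Random, max_tries: int = 10) -> Optional[List[str]]:
    """Permute the tokens, retrying until the order differs from the original."""
    for _ in range(max_tries):
        out = list(tokens)
        rng.shuffle(out)
        if out != tokens:
            return out
    return None  # e.g. a single token, or all tokens identical


def make_negative(tokens: List[str], vocab: Sequence[str], strategy: str,
                  rng: random.Random) -> Optional[List[str]]:
    """Build one negative keyword; returns None if the strategy cannot apply."""
    if strategy == "deletion":
        # Deleting from a one-token keyword would leave an empty sequence.
        return delete_token(tokens, rng) if len(tokens) > 1 else None
    if strategy == "substitution":
        return substitute_token(tokens, vocab, rng)
    if strategy == "insertion":
        return insert_token(tokens, vocab, rng)
    if strategy == "shuffle":
        return shuffle_tokens(tokens, rng)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    rng = random.Random(0)
    keyword = ["h", "e", "l", "o"]
    vocab = list("abcdefghijklmnopqrstuvwxyz")
    for s in STRATEGIES:
        print(s, make_negative(keyword, vocab, s, rng))
```

In this sketch, shuffle keeps exactly the same tokens and only changes their order, so a model can reject such a negative only if it is sensitive to token order.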
|
|
|
Zhenghai You
|
- Exploring the role of the speaker encoder in TSE [1] (https://z1et6d3xtb.feishu.cn/docx/GHF8doRjDo50ihxGUPpcsZgLncb)
- Jointly training the Spk Enc gives better separation, but the EER is poor
- Pretraining & freezing the Spk Enc gives good EER, but SI-SDR is poor
- Further explore how spk aug affects different tasks
- The generality of SPK-AUG
- Refactored DPRNN-TSE: results are reliable, and the runtime has been reduced from 87 hours to 32 hours
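For reference, the SI-SDR metric used above to compare the jointly trained and the pretrained & frozen speaker encoders can be computed as in this minimal sketch. It follows the common zero-mean, scale-invariant definition (project the estimate onto the reference, then compare target and residual energy); the function name and the `eps` guard are our own assumptions, and this is not taken from the DPRNN-TSE code.

```python
import math
from typing import Sequence


def si_sdr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB (zero-mean variant); eps avoids division by zero."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    # Remove the DC offset from both signals.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Optimal scaling: alpha = <est, ref> / ||ref||^2.
    dot = sum(e * r for e, r in zip(est, ref))
    alpha = dot / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the reference is rescaled by `alpha`, multiplying the estimate by a constant leaves the score essentially unchanged, which is what makes the metric scale-invariant.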
|
|
|
Junming Yuan
|
|
|
|
Chen Chen
|
|
|
|
Xiaolou Li
|
- Use MFA (Montreal Forced Aligner) on LRS3 to cut it into small segments
- Use AV-HuBERT discrete embeddings in VSP-LLM training (still training)
- Some ideas for aligning video features with the LLM (Dense Connector, CL methods)
- Take over the data collection and get familiar with the process
- Data collection: 3138 h (needs re-checking; deadline: 10.15)
|
|
|
Zehua Liu
|
|
|
|
Pengqi Li
|
|
|
|
Wan Lin
|
- Voxblink1 model training and testing [2]
|
|
|
Tianhao Wang
|
- AudioSep reproduction
- Problem: LAION-CLAP requires 48 kHz audio, so the data needs to be upsampled
|
|
|
Xiaoxue Luo
|
- AI-Graph high school handbook (v0.1)
|
|
|
Zhenyu Zhou
|
- Submit the model quantization document
- Review conditional chain code
|
|
|
Junhui Chen
|
|
|
|
Jiaying Wang
|
|
|
|
Yu Zhang
|
- Friday report
- Switch the SocioDojo agent from GPT-3.5-Turbo to Llama-3.1-8B (in progress)
|
|
|
Wenqiang Du
|
- Check primary school handbook (43/45)
- Release Chinese and Haining KWS models
|
|
|
Yang Wei
|
|
|
|
Lily
|
- APSIPA workshop in Tianjin; prepare Friday's report
- Prepare for the online course
- AI-Radiance daily work
|
|
|
Turi
|
- Segmented the audio in the dataset into individual words.
- Paper reading
|
|
|
Yue Gu
|
- Almost completed the revisions of my journal paper
|
|
|
Qi Qu
|
- KWS
- Testing zh48 models on a dataset of Guangdong-accented Mandarin: recall drops significantly.
- AED
- Evaluating a third-party baby-crying detection solution.
- Misc.
|
|
|