People |
This Week |
Next Week |
Task Tracking (Deadline)
|
Dong Wang
|
- AI handbook, higher-education version: experiment booklet
- Check AI primary-school handbook (1-20)
|
|
|
Lantian Li
|
- AI-Graph EN (20/50)
- Prepare CSTR intro report
|
|
|
Ying Shi
|
- Finish text-enroll keyword spotting code & documentation and deliver to Wei & Du
- Cohort overlap ASR code v0.0
- Code is finished and training is done
- Cohort speech separation code v0.0
- Code is finished; training is in progress
- here
|
|
|
Zhenghai You
|
- Exploring the role of the speaker encoder in TSE and the generality of SPK-AUG [1]
|
|
|
Junming Yuan
|
- MT-HuBERT experiments [2]:
- Replaced the codebook set + InfoNCE objective with FC+softmax+CE / FC+sigmoid+BCE (see the sketch after this list)
- Reducing the learning rate can work
- Verified feature-mask MT-HuBERT with different learning rates
- Time-mask MT-HuBERT verification (in progress)
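A minimal sketch of the head swap above (assuming PyTorch and HuBERT-style masked-frame targets; the class names, shapes, and num_units are illustrative, not the actual MT-HuBERT code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FCSoftmaxCEHead(nn.Module):
        """FC + softmax + CE: predict the discrete target id of each masked frame."""
        def __init__(self, dim, num_units):
            super().__init__()
            self.fc = nn.Linear(dim, num_units)

        def loss(self, feats, targets):
            # feats: (B, T, D) masked-frame features; targets: (B, T) unit ids
            logits = self.fc(feats)
            return F.cross_entropy(logits.transpose(1, 2), targets)

    class FCSigmoidBCEHead(nn.Module):
        """FC + sigmoid + BCE: the same prediction cast as per-unit binary labels."""
        def __init__(self, dim, num_units):
            super().__init__()
            self.fc = nn.Linear(dim, num_units)
            self.num_units = num_units

        def loss(self, feats, targets):
            logits = self.fc(feats)
            onehot = F.one_hot(targets, self.num_units).float()
            return F.binary_cross_entropy_with_logits(logits, onehot)

Both variants replace the codebook lookup + InfoNCE contrastive loss with a plain linear classifier over the unit vocabulary, which is the change being compared here.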
|
|
|
Chen Chen
|
|
|
|
Xiaolou Li
|
- AV-HuBERT discrete unit training (WER ↓1.5-3%; see the sketch after this list)
- Rethink how to prove the advantages or disadvantages of discrete units
- Dense connector experiments (in training)
- Double-check the existing 3000 h of data in CVS2
- Paper reading (discrete units, VTS)
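For context, discrete units here are typically obtained HuBERT-style: k-means over encoder frame features, with cluster ids used as pseudo-labels. A minimal sketch (the feature layer choice and n_units=500 are assumptions, not the actual recipe):

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    def learn_units(feature_list, n_units=500, seed=0):
        """Fit k-means over pooled (T, D) frame features from the encoder;
        the cluster ids then serve as the discrete unit vocabulary."""
        feats = np.concatenate(feature_list, axis=0)
        km = MiniBatchKMeans(n_clusters=n_units, random_state=seed)
        return km.fit(feats)

    def quantize(km, feats):
        """Map each frame to its nearest centroid id (the unit sequence)."""
        return km.predict(feats)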
|
- Design an experiment to explain the performance of discrete units
- Finish the data double-check
- Try to build a simple VTS system based on our VSR system
|
|
Zehua Liu
|
- Frozen AV-HuBERT as encoder performs very poorly (CER: 80%) [3]
- It may improve after fine-tuning, but is still poor
- Qwen-14B performs better (47%) than Qwen-7B (50%)
- Finished the in-context learning code; training is running (a sketch of the frozen-encoder setup follows this list)
- Results expected very soon
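A rough sketch of the frozen-encoder-plus-LLM wiring being evaluated (generic PyTorch; encoder/llm stand in for AV-HuBERT and Qwen, and the single linear projector is an assumption):

    import torch
    import torch.nn as nn

    class FrozenEncoderLLM(nn.Module):
        """Frozen visual encoder -> trainable projector -> LLM decoder."""
        def __init__(self, encoder, llm, enc_dim, llm_dim):
            super().__init__()
            self.encoder = encoder.eval()
            for p in self.encoder.parameters():
                p.requires_grad = False              # encoder stays frozen
            self.proj = nn.Linear(enc_dim, llm_dim)  # the trainable bridge
            self.llm = llm

        def forward(self, video, text_embeds):
            with torch.no_grad():
                feats = self.encoder(video)          # (B, T, enc_dim)
            prefix = self.proj(feats)                # (B, T, llm_dim)
            inputs = torch.cat([prefix, text_embeds], dim=1)
            return self.llm(inputs_embeds=inputs)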
|
- Verify collected data with Xiaolou
- Finish the VTS data acceptance report
|
|
Pengqi Li
|
- Evaluate the reliability of TAO and LayerCAM (verification)
- Explore the consistency of TAO and LayerCAM results across different models and datasets
|
|
|
Wan Lin
|
- NS
- poster
- data preparation and processing
- adjust the training code
|
|
|
Tianhao Wang
|
- CLIPSep experiments for 2-mix and 5-mix [https://z1et6d3xtb.feishu.cn/docx/DnJgdwtNhotEpIxH7zfcksETnte] (SDR definition sketched after this list)
- 2-mix (whole VGGSound, 300 classes): SDR-mix = -1.1748, SDR-separate = 5.0145
- 5-mix (50 VGGSound classes): SDR-mix = -11.4529, SDR-separate = -0.4764
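For reference, the numbers above read as the plain signal-to-distortion ratio (an assumption; the actual CLIPSep evaluation may use an SI-SDR variant):

    import numpy as np

    def sdr(reference, estimate, eps=1e-8):
        """10 * log10(||s||^2 / ||s - s_hat||^2), in dB."""
        num = np.sum(reference ** 2)
        den = np.sum((reference - estimate) ** 2) + eps
        return 10.0 * np.log10(num / den + eps)

    # SDR-mix scores the raw mixture against the target source;
    # SDR-separate scores the separated output; the gap is the gain.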
|
|
|
Xiaoxue Luo
|
- Paper reading about sound separation
- AudioSep reproduction
- Training time is too long -> replaced with a smaller dataset (in training)
|
|
|
Zhenyu Zhou
|
- Model quantization, version 2
- Multi-talker mix data preparation
|
|
|
Junhui Chen
|
- Prepare vb2 data
- Too many utterances for training (out of memory); thinking about a smart way to divide them (one option sketched below)
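One possible division scheme (an assumption, not the chosen approach): greedily pack utterances, longest first, into duration-balanced shards that can be trained on one at a time.

    import heapq

    def split_utterances(utts, n_shards):
        """utts = [(utt_id, seconds), ...] -> n_shards lists with roughly
        equal total duration (longest-first onto the lightest shard)."""
        heap = [(0.0, i) for i in range(n_shards)]
        heapq.heapify(heap)
        shards = [[] for _ in range(n_shards)]
        for utt_id, dur in sorted(utts, key=lambda x: -x[1]):
            total, idx = heapq.heappop(heap)
            shards[idx].append(utt_id)
            heapq.heappush(heap, (total + dur, idx))
        return shards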
|
|
|
Jiaying Wang
|
|
|
|
Yu Zhang
|
- SocioDojo Llama version
- News integration is adjusted to run once every 12 hours
- Wikipedia & Google search are disabled (settings sketched below)
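As a purely hypothetical settings sketch (these names are illustrative, not SocioDojo's actual configuration interface):

    # Illustrative config for the Llama-backed agent run
    AGENT_CONFIG = {
        "llm_backend": "llama",            # swapped-in open model
        "news_refresh_hours": 12,          # integrate news every 12 h
        "disabled_tools": ["wikipedia", "google_search"],
    }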
|
|
|
Wenqiang Du
|
- Check the data from previously trained models and update the KWS model again (model testing)
- Chinese, Cantonese, Minnan, Haining and Uyghur
|
|
|
Yang Wei
|
- Train the text-enroll KWS model with updated code (in progress)
|
|
|
Lily
|
|
|
|
Turi
|
- Whisper model fine-tuning [5] (a generic sketch follows)
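A minimal fine-tuning step in Hugging Face Transformers (the "openai/whisper-small" checkpoint and the hyperparameters are assumptions, not necessarily the setup in [5]):

    import torch
    from transformers import WhisperProcessor, WhisperForConditionalGeneration

    processor = WhisperProcessor.from_pretrained("openai/whisper-small")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    def train_step(audio, sampling_rate, transcript):
        """One supervised step: log-mel features in, token labels out."""
        inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
        labels = processor.tokenizer(transcript, return_tensors="pt").input_ids
        out = model(input_features=inputs.input_features, labels=labels)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return out.loss.item()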
|
|
|
Yue Gu
|
- revise the TASLP paper
- read several papers about accent and prosody
|
|
|
Qi Qu
|
- AED: classifiers retrained with the new method (suppression of negative stimuli); improvement confirmed (idea sketched below)
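The suppression idea, as a loss-level sketch (an interpretation only, since the method details are not given here): add a penalty that pushes event scores down on negative clips, on top of the usual multi-label BCE.

    import torch
    import torch.nn.functional as F

    def bce_with_negative_suppression(logits, targets, neg_weight=1.0):
        """Standard multi-label BCE plus an extra term that suppresses
        sigmoid scores wherever the target is negative."""
        bce = F.binary_cross_entropy_with_logits(logits, targets)
        neg_mask = (targets == 0).float()
        neg_penalty = (torch.sigmoid(logits) * neg_mask).mean()
        return bce + neg_weight * neg_penalty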
|
|
|