People |
This Week |
Next Week |
Task Tracking (Deadline)
|
Dong Wang
|
|
|
|
Lantian Li
|
- Complete all the scripts for the 2025 AI calendar
- AI-Graph EN (32/50)
|
|
|
Ying Shi
|
|
|
|
Zhenghai You
|
- Huawei project with IRA-TSE[1]
|
|
|
Junming Yuan
|
- Re-checked some details of the Cocktail HuBERT paper and prepared the code.
- Finished pseudo-label preparation.
- Paper reading
|
|
|
Xiaolou Li
|
- Finish VTS documents with Zehua
- Process the CVS3 data
- Inherit the AV-HuBERT training code and debug
|
|
|
Zehua Liu
|
- Finish 2 VTS documents with Xiaolou
  - Financial document
  - Technical document
- Paper reading last Friday
|
|
|
Pengqi Li
|
- Analyze the distribution of phoneme importance (PID) in the TIMIT dataset using more SOTA models (TDNN: 4.4%, ECAPA: 2.8%).
  - Conclusions still need further analysis on other databases. [2] (https://z1et6d3xtb.feishu.cn/docx/VtlIdFxdRodp8Nx8oQjcVLC4nCd)
|
|
|
Wan Lin
|
- NS: detection
  - clean: 1.479% EER vs. 1.239% EER
  - multi: in training
|
|
|
Tianhao Wang
|
- Ablation study of a new approach for sound separation [3]
|
|
|
Xiaoxue Luo
|
- Paper reading to investigate new approaches for sound separation
- Retrain AudioSep with a DPRNN block (AudioSep-DP)
|
|
|
Zhenyu Zhou
|
- Attempted to add a silence loss during training (it does not seem to help)
- Conditional Chain 2-5 mix results (still some bugs; speaker-count accuracy is poor) [4]
|
|
|
Junhui Chen
|
- VAD frame-level detection loss
  - Loss decreases faster in the early stages of training
- Change test encoder: from ResNet34 to Transformer encoder (coding...)
|
|
|
Jiaying Wang
|
|
|
|
Yu Zhang
|
- SocioDojo
  - Single-stock (TSLA) investment (still running)
- Investigate text-guided, LLM-centric time-series forecasters and reproduce some of them (Time-LLM, LLM-Process, AutoTimes); run toy experiments on how the prompt prefix influences the forecast result
|
|
|
Wenqiang Du
|
- Train new language models (Cantonese)
- Prepare the PPT for the competition
|
|
|
Yang Wei
|
- Train text-enroll KWS model with 7000 h of data
|
|
|
Lily
|
|
|
|
Turi
|
- KWS data preparation and checking some implementations
- Paper reading about KWS
|
|
|
Yue Gu
|
- Use the CosyVoice model to synthesize target-speaker utterances as supplementary data for target-speaker adaptation. The adaptation experiment is running.
- ICASSP 2025 paper review
- Paper writing
|
|
|
Qi Qu
|
- KWS:
  - Yi (Liangshan, Sichuan) test dataset annotated and finalized. Optimal thresholds for predefined scenes. Cloud model service deployed.
  - Quantization for NPU with more calibration data (6k): mean_loss=1.3e-4, max_loss=6.2e-2.
  - NPU demo: feature extraction + model inference.
  - Text-enroll method: Android demo benchmark.
|
|
|