Difference between revisions of "2024-11-11"

 
(4 intermediate revisions by 3 users not shown)
Line 90: Line 90:
 
|Pengqi Li
 
|Pengqi Li
 
||
 
||
*
+
* Analyze the distribution of phoneme importance (PID) in the TIMIT dataset based on more SOTA models (TDNN: 4.4%, ECAPA: 2.8%).
 +
** The conclusions still need further analysis in conjunction with other databases. [https://z1et6d3xtb.feishu.cn/docx/VtlIdFxdRodp8Nx8oQjcVLC4nCd]
 
||
 
||
 
*
 
*
Line 101: Line 102:
 
|Wan Lin
 
|Wan Lin
 
||
 
||
*
+
* NS: detection
 +
** clean: 1.479% EER vs. 1.239% EER
 +
** multi: in training
 
||
 
||
 
*
 
*
Line 147: Line 150:
 
|Junhui Chen
 
|Junhui Chen
 
||
 
||
*
+
* VAD frame level detection loss
 +
** Loss decreases faster in the early stages of training
 +
* Change test encoder: from ResNet34 to a Transformer encoder (coding...)
 
||
 
||
 
*
 
*
Line 234: Line 239:
 
|Qi Qu
 
|Qi Qu
 
||
 
||
*  
+
* KWS:
 +
** Yi (Liangshan, Sichuan) test dataset annotated and finalized. Optimal thresholds for predefined scenes. Cloud model service deployed.
 +
** Quantization for NPU with more calibration data (6k): mean_loss=1.3e-4, max_loss=6.2e-2.
 +
** NPU demo: feature extraction + model inference.
 +
** Text-enroll method: Android demo benchmark.
 
||
 
||
 
*
 
*

Latest revision as of 11:05, 11 November 2024 (Monday)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • Tianjian AI book (done)
Lantian Li
  • Complete all the scripts for the 2025 AI calendar
  • AI-Graph EN (32/50)
Ying Shi
Zhenghai You
  • Huawei project with IRA-TSE[1]
Junming Yuan
  • Re-checked some details from the Cocktail HuBERT paper and prepared the code.
    • pseudo-label preparation finished.
  • paper reading
Xiaolou Li
  • Finish VTS documents with Zehua
  • Process the CVS3 data
  • Inherit the AV-HuBERT training code and debug
Zehua Liu
  • Finish 2 VTS documents with Xiaolou
    • Financial Document
    • Technical Document
  • Paper reading last Friday
Pengqi Li
  • Analyze the distribution of phoneme importance (PID) in the TIMIT dataset based on more SOTA models (TDNN: 4.4%, ECAPA: 2.8%).
    • The conclusions still need further analysis in conjunction with other databases. [2]
Wan Lin
  • NS: detection
    • clean: 1.479% EER vs. 1.239% EER
    • multi: in training
Tianhao Wang
  • Ablation study of some new approaches for sound separation [3]
Xiaoxue Luo
  • Paper reading to investigate some new approaches for sound separation
  • Retrain AudioSep with a DPRNN block (AudioSep-DP)
Zhenyu Zhou
  • Attempted to add a silence loss during training (seems to be useless)
  • Conditional Chain 2-5 mix results (still some bugs; the speaker-number accuracy is poor) [4]
Junhui Chen
  • VAD frame level detection loss
    • Loss decreases faster in the early stages of training
  • Change test encoder: from ResNet34 to a Transformer encoder (coding...)
Jiaying Wang
Yu Zhang
  • SocioDojo
    • Single stock (TSLA) investment (still running)
  • Investigate some text-guided, LLM-centric time-series forecasters and reproduce some of them (Time-LLM, LLM-Process, AutoTimes); run some toy experiments on how the prompt prefix influences the forecast result
Wenqiang Du
  • Training of new language models (Cantonese)
  • Prepare the PPT for the competition
Yang Wei
  • Train the text-enroll KWS model with 7000 h of data
Lily
Turi
  • KWS data preparation and checking some implementations
  • Paper reading about KWS
Yue Gu
  • Use the CosyVoice model to synthesize target-speaker utterances, which are used as supplementary data for target speaker adaptation. The adaptation experiment is running.
  • ICASSP 2025 paper review
  • paper writing
Qi Qu
  • KWS:
    • Yi (Liangshan, Sichuan) test dataset annotated and finalized. Optimal thresholds for predefined scenes. Cloud model service deployed.
    • Quantization for NPU with more calibration data (6k): mean_loss=1.3e-4, max_loss=6.2e-2.
    • NPU demo: feature extraction + model inference.
    • Text-enroll method: Android demo benchmark.