Differences between revisions of "2024-10-28"
From cslt Wiki
| (14 intermediate revisions by 11 users not shown) | |||
| Line 6: | Line 6: | ||
|Dong Wang | |Dong Wang | ||
|| | || | ||
| − | * | + | * AI primary book done |
|| | || | ||
* | * | ||
| Line 17: | Line 17: | ||
|Lantian Li | |Lantian Li | ||
|| | || | ||
| − | * | + | * AI-Graph EN (1-20 finalized) |
| + | * Design 2025 Daily Posts | ||
|| | || | ||
* | * | ||
| Line 28: | Line 29: | ||
|Ying Shi | |Ying Shi | ||
|| | || | ||
| − | * | + | * Revise the cohort-overlap ASR code [training is in progress] |
| + | ** Support arbitrary source mixing training | ||
| + | ** Use the real hypothesis as condition by Token error rate | ||
| + | ** Design stop criterion | ||
|| | || | ||
* | * | ||
| Line 39: | Line 43: | ||
|Zhenghai You | |Zhenghai You | ||
|| | || | ||
| − | * | + | * Introduce more hard samples to improve model performance [https://z1et6d3xtb.feishu.cn/docx/CURxdy3tEorxkrxtjjqcdMaYnJg] |
| + | ** SPK-AUG with same length: there is an improvement, but SI-SDR decreases as the hard-sample rate increases | ||
| + | ** Design more hard samples | ||
|| | || | ||
* | * | ||
| Line 72: | Line 78: | ||
|Xiaolou Li | |Xiaolou Li | ||
|| | || | ||
| − | * | + | * VTS with LLM structure design and baseline code writing [https://z1et6d3xtb.feishu.cn/docx/ZBnOdEMxgo8bs5xrkb1cPZnCnQg?from=from_copylink] |
|| | || | ||
* | * | ||
| Line 109: | Line 115: | ||
|Wan Lin | |Wan Lin | ||
|| | || | ||
| − | * | + | * NS: downsampling is not useful [https://z1et6d3xtb.feishu.cn/docx/MxBNdPbLao0tsoxkBVCcUgUoneh?from=from_copylink] |
| + | * Share at the speaker meeting on Friday | ||
|| | || | ||
* | * | ||
| Line 120: | Line 127: | ||
|Tianhao Wang | |Tianhao Wang | ||
|| | || | ||
| − | * AudioSep (CLAP) 5-mix exps: | + | * AudioSep (CLAP) 5-mix exps[https://z1et6d3xtb.feishu.cn/docx/DlR8dZRdEoZIwIxTOFvcQdbGnqg]: |
** text-query: SDR=4.978, SI-SDR=1.972 | ** text-query: SDR=4.978, SI-SDR=1.972 | ||
** audio-query: SDR=6.907, SI-SDR=5.058 | ** audio-query: SDR=6.907, SI-SDR=5.058 | ||
| Line 147: | Line 154: | ||
|Zhenyu Zhou | |Zhenyu Zhou | ||
|| | || | ||
| − | * | + | * Reproduce 5-mix speech separation results: |
| + | ** PIT: 2-mix: 16.04; 5-mix: 6.87 | ||
| + | ** Conditional: 5-mix: 5.38 (40 epochs) | ||
|| | || | ||
* | * | ||
| Line 158: | Line 167: | ||
|Junhui Chen | |Junhui Chen | ||
|| | || | ||
| − | * | + | * NS: speaker detection (method survey & debugging) |
| + | * Got sick | ||
|| | || | ||
* | * | ||
| Line 205: | Line 215: | ||
|Yang Wei | |Yang Wei | ||
|| | || | ||
| − | * | + | * Trained a text-enroll KWS model with Aibabel training data. It did not work. |
|| | || | ||
* | * | ||
| Line 225: | Line 235: | ||
|Turi | |Turi | ||
|| | || | ||
| − | * | + | * Whisper-large-v3 fine-tuning |
| + | ** Freezing 20 encoder layers gave 9.75 WER; vanilla fine-tuning gave 8.02 WER | ||
|| | || | ||
* | * | ||
| Line 233: | Line 244: | ||
|Yue Gu | |Yue Gu | ||
|| | || | ||
| − | * seek | + | * Seek suggestions from other authors. Many suggestions conflict, so I'm trying to figure out the reasons and fix these issues. |
|| | || | ||
* | * | ||
| Line 242: | Line 253: | ||
|Qi Qu | |Qi Qu | ||
|| | || | ||
| − | * | + | * KWS: |
| + | ** Text-enroll models exported to ONNX. | ||
| + | ** C/JNI libs built based on ONNX models and ready for on-device test. | ||
|| | || | ||
* | * | ||
Latest revision as of 10:59, 28 October 2024 (Monday)
| People | This Week | Next Week | Task Tracking (DeadLine) |
|---|---|---|---|
| Dong Wang | | | |
| Lantian Li | | | |
| Ying Shi | | | |
| Zhenghai You | | | |
| Junming Yuan | | | |
| Chen Chen | | | |
| Xiaolou Li | | | |
| Zehua Liu | | | |
| Pengqi Li | | | |
| Wan Lin | | | |
| Tianhao Wang | | | |
| Xiaoxue Luo | | | |
| Zhenyu Zhou | | | |
| Junhui Chen | | | |
| Jiaying Wang | | | |
| Yu Zhang | | | |
| Wenqiang Du | | | |
| Yang Wei | | | |
| Lily | | | |
| Turi | | | |
| Yue Gu | | | |
| Qi Qu | | | |