2024-02-05

From cslt Wiki

Latest revision as of 05:23, 7 February 2024 (Wednesday)

People | This Week | Next Week | Task Tracking (Deadline)
Dong Wang
  • Continue work on the NeuralMag paper; refine the complexity theory
  • Design an AI course for primary school.
Lantian Li
Ying Shi
  • INTERSPEECH paper: Keyword attributed Overlapping ASR
      SOTA model training (done)
      SOT model training (done)
      Testing (in progress)
  • Cohort Overlapping ASR
      One fixed cohort, 2-speaker mix, recognizing ONE speaker: WER 8.90%
      One fixed cohort, 2-speaker mix, recognizing TWO speakers: WER 9.30%
      One fixed cohort, 3-speaker mix, recognizing THREE speakers: WER 37.83%; with a speaker-number prior applied: WER 30.58%
Zhenghai You
Junming Yuan
Chen Chen
  • DeepFake
      By Xiaolou and Zehua
      SyncNet- and WER-based experiments on noisy audio/video input
      Noise does not seem to be the reason these methods failed
  • VTS
      Fine-tune HuBERT with a HiFiGAN for an "audio feature to speech" system (works for both single-speaker and multi-speaker)
      Train a VTS model (ResNet-Conformer encoder) for a "video to audio feature" system (works reasonably well for a single speaker)
      Try training a multi-speaker video-to-audio-feature system
      Try jointly training the video encoder and HiFiGAN
Xiaolou Li
Zehua Liu
Pengqi Li
Wan Lin
  • Summarize possible architectures
  • Coding & practice
Tianhao Wang
Zhenyu Zhou
Junhui Chen
Jiaying Wang
Yu Zhang
Wenqiang Du
  • Aibabel
      Update the CN KWS model
  • Diting
      Supplementary testing
      Write test report
Yang Wei
  • Prepare a data backup for the corpus disk.
Lily