Difference between revisions of "2024-02-05"

From cslt Wiki

Revision as of 11:19, 5 February 2024 (Monday)

People This Week Next Week Task Tracking (Deadline)
Dong Wang
  • Continue work on the NeuralMag paper; refine the complexity theory
  • Design an AI course for primary schools.
Lantian Li
Ying Shi
  • INTERSPEECH Paper: Keyword attributed Overlapping ASR
SOTA model training (done)
SOT model training (done)
test (in progress)
  • Cohort Overlapping ASR
one fixed cohort: 2-mix recognizes ONE, WER 8.90%
one fixed cohort: 2-mix recognizes TWO, WER 9.30%
one fixed cohort: 3-mix recognizes THREE, WER 37.83%; applying a speaker-number prior, WER 30.58%
Zhenghai You
Junming Yuan
Chen Chen


  • DeepFake
by Xiaolou, Zehua
SyncNet- and WER-based experiments on noisy audio/video input
Noise does not appear to be the reason these methods failed
  • VTS
Fine-tune HuBERT with a HiFiGAN for an "audio feature to speech" system (both single-speaker and multi-speaker work)
Train a VTS (ResNet Conformer encoder) for a "video to audio feature" system (works reasonably well for a single speaker)
Try training a multi-speaker video-to-audio-feature system
Try jointly training the video encoder and HiFiGAN
Xiaolou Li
Zehua Liu
Pengqi Li
Wan Lin
Tianhao Wang
Zhenyu Zhou
Junhui Chen
Jiaying Wang
Yu Zhang
Wenqiang Du
Yang Wei
  • Prepare data backup for corpus disk.
Lily