Introduction

This paper presents a speech information factorization method based on a novel deep generative model, which we call the factorial discriminative normalization flow (factorial DNF).

Qualitative and quantitative experimental results show that, compared to all the other models tested, the proposed factorial DNF retains the class structure corresponding to multiple information factors, and that changing one factor causes little distortion in the other factors. This demonstrates that factorial DNF can effectively factorize a speech signal into distinct information factors.

Members

Haoran Sun, Lantian Li, Yunqi Cai, Yang Zhang, Thomas Fang Zheng, Dong Wang

Publications

Haoran Sun, Lantian Li, Yunqi Cai, Yang Zhang, Thomas Fang Zheng, Dong Wang, "Deep Generative Factorization For Speech Signal", 2020. pdf

Source Code

xxx

Factorial DNF

xxx

Experiments

Data

xx

Encoding

xx

Factor manipulation

Sampling: figs, wavs

Phone Manipulation

Model | p(q₂|x)   | bap (dim=5) | mgc (dim=60)
VAE   | 100000    | 130000      | 160000
NF    | 130000    | 500000      | 6200000
DNF   | 60000     | 300000      | 3580000
f-DNF | 1:1.3:0.6 | 1:4:2+      | 1:40:20+
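
As a rough illustration of what manipulating a single factor means in a flow-based model, the toy sketch below maps a 4-dimensional vector to a latent code with an invertible additive-coupling flow, shifts only the latent dimensions assigned to one factor, and maps the result back. It is not the model from the paper. The coupling function, the split into "phone" and "speaker" dimensions, and the class means in PHONE_MEANS are all assumptions made for this example, since this page does not specify the architecture.

```python
import math

# Toy additive-coupling flow on 4-D vectors (illustration only, NOT the paper's model).
# Assumption: latent dims 0-1 carry the "phone" factor, dims 2-3 the "speaker" factor.

def _shift(v):
    # Fixed, arbitrary nonlinear function standing in for a learned network.
    return [math.tanh(v[0] + 2.0 * v[1]), math.sin(v[0] - v[1])]

def forward(x):
    """Map an observation x (4 floats) to its latent code z."""
    a, b = list(x[:2]), list(x[2:])
    b = [bi + si for bi, si in zip(b, _shift(a))]  # layer 1: shift second half
    a = [ai + si for ai, si in zip(a, _shift(b))]  # layer 2: shift first half
    return a + b

def inverse(z):
    """Exact inverse of forward(): undo the layers in reverse order."""
    a, b = list(z[:2]), list(z[2:])
    a = [ai - si for ai, si in zip(a, _shift(b))]  # undo layer 2
    b = [bi - si for bi, si in zip(b, _shift(a))]  # undo layer 1
    return a + b

# Hypothetical class means in the phone subspace (made-up values).
PHONE_MEANS = {"a": [1.0, 0.0], "i": [-1.0, 0.5]}

def swap_phone(x, src, dst):
    """Move x from phone class `src` to `dst`, leaving the speaker dims untouched."""
    z = forward(x)
    for k in range(2):
        z[k] += PHONE_MEANS[dst][k] - PHONE_MEANS[src][k]
    return inverse(z)

if __name__ == "__main__":
    x = [0.3, -1.2, 0.8, 0.1]
    x_new = swap_phone(x, "a", "i")
    print("latent before:", forward(x))
    print("latent after: ", forward(x_new))
```

Because the flow is exactly invertible, re-encoding the manipulated vector gives back the edited latent code: the speaker dimensions match the original up to floating-point error, while the phone dimensions move by the difference of the two class means. A trained model would only approximate this separation, which is what the distortion measurements in this section are meant to quantify.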

Future Work

Test factorial DNF on larger datasets.
Establish general theories for deep generative factorization.

Deep Generative Factorization For Speech Signal (ICASSP21)
