Flow-based Speech Analysis

From cslt Wiki

Revision as of 12:57, 29 October 2019 (Tuesday) by Sunhaoran

Contents

1 Introduction
2 Members
3 Publications
4 Source Code
5 Experiments
6 Future Work

Introduction

We present a preliminary investigation of unsupervised speech factorization based on the normalization flow model.

This model constructs a complex invertible transform that projects speech segments into a latent code space whose distribution is a simple diagonal Gaussian.

Our experiments on the TIMIT database show that this code space exhibits favorable properties such as denseness and pseudo-linearity, and that perceptually important factors such as phonetic content and speaker trait can be represented as particular directions within the code space.
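
For reference, the standard change-of-variables identity behind flow models of this kind can be written as below. The symbols are introduced here for illustration and are not taken from the paper: x is a speech segment, f is the invertible transform, z = f(x) is the latent code, and μ and σ² are the mean and per-dimension variance of the diagonal Gaussian.

```latex
\log p_X(x) \;=\; \log \mathcal{N}\!\bigl(f(x);\, \mu,\, \operatorname{diag}(\sigma^2)\bigr)
\;+\; \log\left|\det \frac{\partial f(x)}{\partial x}\right|
```

Because f is invertible, any code z drawn from the Gaussian can be mapped back to a speech segment through the inverse transform, which is what makes sampling and interpolation in the code space possible.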

Index Terms: speech factorization, normalization flow, deep learning

Members

Dong Wang, Haoran Sun, Yunqi Cai, Lantian Li

Publications

Haoran Sun, Yunqi Cai, Lantian Li, Dong Wang, "On Investigation of Unsupervised Speech Factorization Based on Normalization Flow", 2019. pdf

Source Code

The Glow model, implemented in PyTorch by Yuki-Chai: glow-pytorch
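
As a rough illustration of the kind of invertible building block that Glow-style models stack, the following is a minimal sketch of an affine coupling layer in plain Python. It is not the code from glow-pytorch: the function names and the toy `toy_scale_shift` function (a stand-in for the neural network that would normally predict scale and shift) are hypothetical, and the input is assumed to have even length.

```python
import math

def toy_scale_shift(xa):
    # Hypothetical stand-in for a learned network: maps the untouched half
    # to a log-scale and a shift for the other half.
    log_s = [math.tanh(a) for a in xa]
    t = [0.5 * a for a in xa]
    return log_s, t

def affine_coupling_forward(x, scale_shift_fn=toy_scale_shift):
    """Map x to z; return z and the log-determinant of the Jacobian."""
    assert len(x) % 2 == 0, "this sketch assumes an even-length input"
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = scale_shift_fn(xa)
    # First half passes through unchanged; second half is scaled and shifted.
    zb = [b * math.exp(ls) + ti for b, ls, ti in zip(xb, log_s, t)]
    log_det = sum(log_s)  # the Jacobian is triangular, so this is exact
    return xa + zb, log_det

def affine_coupling_inverse(z, scale_shift_fn=toy_scale_shift):
    """Recover x from z by undoing the scale and shift."""
    assert len(z) % 2 == 0, "this sketch assumes an even-length input"
    half = len(z) // 2
    za, zb = z[:half], z[half:]
    log_s, t = scale_shift_fn(za)  # za equals xa, so the same values result
    xb = [(b - ti) * math.exp(-ls) for b, ls, ti in zip(zb, log_s, t)]
    return za + xb
```

Applying `affine_coupling_inverse` to the output of `affine_coupling_forward` returns the original input up to floating-point error, which is the property that lets a flow model both encode speech into codes and decode codes back into speech.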

Experiments

Right-click "figs" or "wavs" and select "save as" to save the spectrogram figures or the related audio files.

Sampling: figs wavs (right click)

Interpolation: figs wavs (right click)

Denoising: figs wavs (right click)
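
The sampling and interpolation experiments operate on latent codes. The following is a minimal sketch of those two operations, assuming codes are drawn from a zero-mean diagonal Gaussian and that interpolation is linear in the code space; both assumptions, and the function names `sample_code` and `interpolate_codes`, are illustrative rather than taken from the paper. Decoding a code back to a spectrogram would require the trained inverse flow, which is not shown here.

```python
import random

def sample_code(dim, sigma=1.0, rng=random):
    """Draw one latent code from a zero-mean diagonal Gaussian (assumed prior)."""
    return [rng.gauss(0.0, sigma) for _ in range(dim)]

def interpolate_codes(z1, z2, steps):
    """Return `steps` codes evenly spaced on the line from z1 to z2, inclusive."""
    if len(z1) != len(z2):
        raise ValueError("codes must have the same dimension")
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    path = []
    for k in range(steps):
        alpha = k / (steps - 1)  # 0.0 at z1, 1.0 at z2
        path.append([(1 - alpha) * a + alpha * b for a, b in zip(z1, z2)])
    return path
```

Each code on the returned path would then be passed through the inverse transform to obtain the corresponding speech segment.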

Future Work

To conduct more thorough studies on large databases and continuous speech.
To investigate discriminative flow models which take class information into consideration.

Retrieved from "http://index.cslt.org/mediawiki/index.php?title=Flow-based_Speech_Analysis&oldid=34229"