From cslt Wiki
Revision as of 05:36, 29 October 2019 (Tue)

Flow-based Speech Analysis

  • Members: Dong Wang, Haoran Sun, Yunqi Cai, Lantian Li
  • Paper: Haoran Sun, Yunqi Cai, Lantian Li, Dong Wang, "On Investigation of Unsupervised Speech Factorization Based on Normalization Flow", 2019. link
  • Original code used: the PyTorch implementation of the Glow model by Yuki-Chai (glow-pytorch)

Introduction

  • We present a preliminary investigation of unsupervised speech factorization based on the normalization flow model. The model constructs a complex invertible transform that projects speech segments into a latent code space in which the distribution is a simple diagonal Gaussian.
  • Preliminary experiments on the TIMIT database show that this code space exhibits favorable properties such as denseness and pseudo-linearity, and that perceptually important factors such as phonetic content and speaker traits can be represented as particular directions within it.
  • Index Terms: speech factorization, normalization flow, deep learning
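The factorization idea above rests on two mechanics: an invertible flow gives an exact log-likelihood through the change-of-variables formula, and factors such as speaker or phone become directions in the code space. The pure-Python sketch below illustrates both on toy vectors. It is not the Glow model or the authors' code: it assumes a single affine coupling layer with hand-written, hypothetical `scale_net`/`shift_net` functions in place of learned networks (Glow additionally stacks actnorm and invertible 1x1 convolutions), and `factor_direction` (the difference of two groups' mean codes) is an illustrative assumption about how a factor direction could be estimated.

```python
import math
import random

# Hypothetical stand-ins for the learned networks of a coupling layer. They are
# never inverted, so any function of the conditioning half keeps the layer invertible.
def scale_net(h):
    return [0.5 * math.tanh(v) for v in h]

def shift_net(h):
    return [0.3 * v for v in h]

def coupling_forward(x):
    """Affine coupling x -> z: copy the first half, scale and shift the second.
    Returns (z, log|det dz/dx|); the Jacobian is triangular, so the
    log-determinant is just the sum of the log-scales."""
    assert len(x) % 2 == 0, "toy layer expects an even dimension"
    d = len(x) // 2
    xa, xb = x[:d], x[d:]
    s, t = scale_net(xa), shift_net(xa)
    zb = [b * math.exp(si) + ti for b, si, ti in zip(xb, s, t)]
    return xa + zb, sum(s)

def coupling_inverse(z):
    """Exact inverse z -> x, recomputing s and t from the untouched first half."""
    d = len(z) // 2
    za, zb = z[:d], z[d:]
    s, t = scale_net(za), shift_net(za)
    xb = [(b - ti) * math.exp(-si) for b, si, ti in zip(zb, s, t)]
    return za + xb

def log_likelihood(x):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    z, logdet = coupling_forward(x)
    log_pz = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return log_pz + logdet

def mean_code(codes):
    n = len(codes)
    return [sum(col) / n for col in zip(*codes)]

def factor_direction(codes_a, codes_b):
    """Illustrative assumption: the direction for a factor (e.g. speaker A -> B)
    is the difference between the mean codes of the two groups."""
    return [b - a for a, b in zip(mean_code(codes_a), mean_code(codes_b))]

def shift_along(x, direction, alpha=1.0):
    """Encode x, move its code by alpha * direction, and decode back."""
    z, _ = coupling_forward(x)
    return coupling_inverse([zi + alpha * di for zi, di in zip(z, direction)])

if __name__ == "__main__":
    random.seed(0)
    # Toy 4-dimensional vectors standing in for speech segments.
    frames = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
    codes = [coupling_forward(f)[0] for f in frames]
    for f in frames:
        print(round(log_likelihood(f), 3))
    d = factor_direction(codes[:1], codes[1:])
    print(shift_along(frames[0], d, alpha=0.5))
```

A real flow stacks many such layers and alternates which half is transformed, so that every dimension is eventually modified; with this single layer only the second half of each vector changes. Because decoding is exact, moving a code along a direction and decoding it yields a new input whose code is precisely the shifted one.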

Experimental Results

  • Samples: figs, wavs
<a href="http://127.0.0.1/001.mp3" download="001.mp3">Click to download</a>
[Figure: Fig3_flow.jpg]