Deep Speech Factorization-2
Revision as of 04:58, 29 March 2020
Introduction
Speech signals involve complex factors, each contributing in an unknown and secret way. Recently developed deep learning methods provide interesting tools for discovering these latent factors, including unsupervised models such as VAEs and GANs, and supervised methods such as multi-task learning and knowledge distillation. These tools allow us to decipher the secrets of speech signals from big data, rather than from hand-crafted hypotheses.
These tools will lead to an unprecedented breakthrough in speech information processing. Signs of this breakthrough include:
- In speaker recognition, speaker factors can be learned from very short speech segments.
- In speech synthesis, speaking styles can be learned as latent variables in an unsupervised way, and speaker factors can be used to change speaker traits.
- In speech recognition, learning multiple tasks collaboratively has been shown to be successful.
In previous studies (Phase 1), we found that with cascade learning, speech signals can be factorized into content, speaker and emotion factors at the frame level. In Phase 2, we will try to answer the following questions:
- Can we factorize speech signals in an unsupervised way?
- How can supervised and unsupervised factorization be integrated?
- How should we deal with language discrepancy in factorization?
- How can we discover optimal factorization architectures?
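The Phase-1 cascade can be pictured as a chain of task branches, each conditioning on the features produced by the previous level (content, then speaker, then emotion). Below is a minimal numerical sketch of that idea; the layer sizes, random weights, and simple ReLU layers are illustrative assumptions, not the actual Phase-1 model.

```python
import numpy as np

rng = np.random.default_rng(1)

def dense(x, w):
    # Toy fully connected layer with ReLU activation.
    return np.maximum(0.0, x @ w)

frame = rng.standard_normal(40)  # e.g. one 40-dim filterbank frame (assumed size)

# Illustrative random weights; each level sees the frame plus all earlier factors.
w_content = rng.standard_normal((40, 16)) * 0.1
w_speaker = rng.standard_normal((40 + 16, 8)) * 0.1
w_emotion = rng.standard_normal((40 + 16 + 8, 4)) * 0.1

content = dense(frame, w_content)                                      # level 1
speaker = dense(np.concatenate([frame, content]), w_speaker)           # level 2
emotion = dense(np.concatenate([frame, content, speaker]), w_emotion)  # level 3
```

In the real model each branch is trained with its own supervision signal; the point of the sketch is only the cascaded conditioning structure.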
People
Dong Wang, Yunqi Cai, Haoran Sun, Zhiyuan Tang, Lantian Li
Research direction
Basic research
- Collaborative learning with AutoML
- VAE/dVAE factorization
- Supervised VAE for factorization
- ASR + TTS cycle training
Applied research
- Pretraining for ASR, SID, EMD (BERT in speech)
- Low-resource ASR, TTS
- Signal compression, cleaning up, etc.
Related publications
- Yang Zhang, Lantian Li and Dong Wang, "VAE-based regularization for deep speaker embedding", Interspeech 2019.
- Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, and Dong Wang, "Deep speaker feature learning for text-independent speaker verification", Interspeech 2017.
- Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, http://wangd.cslt.org/public/pdf/spkfact.pdf
- Lantian Li, Zhiyuan Tang, Dong Wang, "Full-info Training for Deep Speaker Feature Learning", http://wangd.cslt.org/public/pdf/mlspk.pdf
- Zhiyuan Tang, Lantian Li, Dong Wang, Ravi Vipperla, "Collaborative Joint Training with Multi-task Recurrent Model for Speech and Speaker Recognition", IEEE Trans. on Audio, Speech and Language Processing, vol. 25, no. 3, March 2017.
- Dong Wang, Lantian Li, Ying Shi, Yixiang Chen, Zhiyuan Tang, "Deep Factorization for Speech Signal", https://arxiv.org/abs/1706.01777
Talks
- Back to Matrix (Dong Wang, 2019/11/02)
- Nonlinear_Flow (Yunqi Cai, 2019/11/02)
- Flow_introduction (Yunqi Cai, 2019/11/04)
- Previous_experiments (Haoran Sun, 2019/11/02)
Further reading
Old
- Barlow, Unsupervised learning, Neural computation, 1989. [1]
- Geoffrey E. Hinton, Terrence Joseph Sejnowski, "Unsupervised Learning: Foundations of Neural Computation", MIT Press, 1999 [2]. Associated review from DL Wang [3]
- Anthony J. Bell and Terrence I. Sejnowski, "An Information-Maximization Approach to Blind Separation and Blind Deconvolution", Neural computation 1995.[4]
- Gustavo Deco and Wilfried Brauer , "Higher Order Statistical Decorrelation without Information Loss", NIPS 1994. [5]
- Aapo Hyvarinen and Petteri Pajunen, "Nonlinear independent component analysis: Existence and uniqueness results", Neural networks, 1999. [6]
Linguistics
- Hiroya Fujisaki, “Prosody, models, and spontaneous speech,” in Computing prosody, pp. 27–42. Springer,1997.
ML
- Laurent Dinh, David Krueger, Yoshua Bengio, "NICE: Non-linear Independent Components Estimation", ICLR 2015 [7]
- Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio, "Density Estimation Using Real NVP", ICLR 2017 [8]
- Kingma, et al., "Auto-encoding variational Bayes". 2014.
- Danilo Jimenez Rezende et al., "Variational Inference with Normalizing Flows", 2016.
- Kingma et al., "Improving Variational Inference with Inverse Autoregressive Flow", 2016.
- Oord, "Neural Discrete Representation Learning", 2017.
- Kingma, "Glow: Generative Flow with Invertible 1×1 Convolutions", 2018.
- Goodfellow et al., "Generative adversarial nets", 2014.
- Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks", 2017.
- Chen et al., "Infogan: Interpretable representation learning by information maximizing generative adversarial nets", 2016.
- Hu et al., "On unifying deep generative models", 2017.
- Makhzani, "Adversarial Autoencoders", 2015.
- Locatello et al., "Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations", 2019.
ASR
- Chan et al., "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition", 2016.
- Prabhavalkar et al., "A Comparison of Sequence-to-Sequence Models for Speech Recognition", 2017.
- Chiu et al., "State-of-the-art Speech Recognition With Sequence-to-Sequence Models", 2018.
- Pratap, "wav2letter++: The Fastest Open-source Speech Recognition System", 2018
- Ren et al., "Almost Unsupervised Text to Speech and Automatic Speech Recognition", 2019
- Tsai et al., "Learning Factorized Multimodal Representations", 2019
SID
- E. Variani, X. Lei, E. McDermott, I. Lopez Moreno, and J. Gonzalez-Dominguez, “Deep neural networks for small footprint text-dependent speaker verification,”2014.
- G. Heigold, I. Moreno, S. Bengio, and N. Shazeer, “End-to-end text-dependent speaker verification,” 2016.
TTS
- Wang, et al., "Tacotron: A fully end-to-end text-to-speech synthesis model." CoRR, abs/1703.10135, 2017.
- van den Oord, et al., "Parallel WaveNet: Fast high-fidelity speech synthesis.", CoRR, abs/1711.10433, 2017.
- van den Oord, et al., "WaveNet: A generative model for raw audio". CoRR, abs/1609.03499, 2016.
- Nal Kalchbrenner et al., "Efficient Neural Audio Synthesis", 2018 (WaveRNN)
- Wei-Ning Hsu et al., "Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization", ICASSP 2019.
Flow Model
- Overview
- Papamakarios, "Normalizing Flows for Probabilistic Modeling and Inference" [9]
- History
- Whitening
- Gaussianization
- Normalizing flows
- Tabak et al., "A Family of Nonparametric Density Estimation Algorithms", [14] (First composition)
- Rippel et al., "High-dimensional Probability Estimation with Deep Density Models", [15] (First DNN parameterization)
- Laurent Dinh, David Krueger, Yoshua Bengio, "NICE: Non-linear Independent Components Estimation", ICLR 2015 [16] (First VP)
- Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio, "Density Estimation Using Real NVP", ICLR 2017 [17] (First NVP)
- Danilo Jimenez Rezende et al., Variational Inference with Normalizing Flows, [18] (First apply to variational inference)
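As a concrete illustration of the coupling idea behind NICE, the sketch below implements a single additive coupling layer in NumPy: the first half of the input passes through unchanged, the second half is shifted by a function of the first, so the transform is exactly invertible and volume-preserving (zero log-determinant of the Jacobian). The coupling function `m` here is a toy stand-in for the learned network used in the paper.

```python
import numpy as np

def m(x):
    # Stand-in for the learned coupling network; any function of x1 works,
    # since invertibility never requires inverting m itself.
    return np.tanh(x)

def coupling_forward(x):
    x1, x2 = np.split(x, 2)
    y1 = x1                      # identity on the first half
    y2 = x2 + m(x1)              # additive shift on the second half
    return np.concatenate([y1, y2])

def coupling_inverse(y):
    y1, y2 = np.split(y, 2)
    x1 = y1
    x2 = y2 - m(y1)              # exact inverse: subtract the same shift
    return np.concatenate([x1, x2])

x = np.array([0.5, -1.2, 0.3, 2.0])
y = coupling_forward(x)
x_rec = coupling_inverse(y)
assert np.allclose(x, x_rec)     # invertible by construction
```

Stacking such layers with the halves swapped between layers (and, in Real NVP, replacing the additive shift with an affine one) gives the full flow models listed above.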
VAE Model
- 2014 Auto-Encoding Variational Bayes, [19] (First composition)
- 2015 The Variational Fair Autoencoder, [20]
- 2015 Learning Structured Output Representation using Deep Conditional Generative Models, [https://pdfs.semanticscholar.org/3f25/e17eb717e5894e0404ea634451332f85d287.pdf]
- 2017 Variational Lossy Autoencoder, [21]
- 2017 Grammar Variational Autoencoder, [22]
- 2017 beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, [23]
- 2017 Deep Variational Information Bottleneck, [24]
- 2018 Understanding disentangling in β-VAE, [25]
- 2017 Deepcoder: Semi-parametric variational autoencoders for automatic facial action coding, [26]
- 2017 Hierarchical Variational Autoencoders for Music, [27]
- 2017 A Classifying Variational Autoencoder with Application to Polyphonic Music Generation, [28]
- 2018 VAE with a VampPrior, [29]
- 2018 Hyperspherical Variational Auto-Encoders, [30]
- 2018 Isolating Sources of Disentanglement in Variational Autoencoders, [31]
- 2018 Variational Autoencoders Pursue PCA Directions, [32]
- 2018 Variational Autoencoders for Collaborative Filtering, [33]
- 2018 Learning Latent Subspaces in Variational Autoencoders, [34]
- 2019 Structured Disentangled Representations, [35]
- 2019 Variational Autoencoders with Jointly Optimized Latent Dependency Structure, [36]
- 2019 Correlated Variational Auto-Encoders, [37]
- 2019 From Variational to Deterministic Autoencoders, [38]
- 2019 Variational Laplace Autoencoders, [39]
- 2019 Disentangling Disentanglement in Variational Autoencoders, [40]
- 2019 Modeling assumptions and evaluation schemes: On the assessment of deep latent variable models, [41]
- 2019 Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders, [42]
- 2019 A Joint Generative Model for Zero-Shot Learning, [43]
- 2019 A Contrastive Divergence for Combining Variational Inference and MCMC, [44]
- 2019 Resisting Adversarial Attacks Using Gaussian Mixture Variational Autoencoders, [45]
- 2019 Hierarchical Decompositional Mixtures of Variational Autoencoders, [46]
- 2019 Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs, [47]
- 2019 InfoVAE: Balancing Learning and Inference in Variational Autoencoders, [48]
- 2019 Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing, [49]
- 2019 Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data, [50]
- 2019 Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse, [51]
- 2019 Learning Hierarchical Priors in VAEs, [52]
- 2020 Mixed-curvature Variational Autoencoders, [https://openreview.net/pdf?id=S1g6xeSKDS]
- 2020 Variational Autoencoders for Highly Multivariate Spatial Point Processes Intensities, [https://openreview.net/pdf?id=B1lj20NFDS]
- 2020 Mitigating Posterior Collapse in Strongly Conditioned Variational Autoencoders, [https://openreview.net/attachment?id=rJlHea4Kvr&name=original_pdf]
- 2020 When Do Variational Autoencoders Know What They Don't Know?, [https://openreview.net/pdf?id=Skg7VAEKDS]
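For reference alongside the list above, here is a minimal NumPy sketch of the original VAE objective (Auto-Encoding Variational Bayes): the reparameterization trick and the closed-form Gaussian KL term of the ELBO. The linear encoder/decoder and all constants are illustrative assumptions, not a trained model.

```python
import numpy as np

# ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)), with q(z|x) = N(mu, sigma^2)
# and prior p(z) = N(0, I).

rng = np.random.default_rng(0)

def encode(x):
    # Toy "encoder": maps the input to the mean and log-variance of q(z|x).
    mu = 0.5 * x
    logvar = np.full_like(x, -1.0)
    return mu, logvar

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu and sigma.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def gaussian_kl(mu, logvar):
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def decode(z):
    # Toy "decoder": maps the latent back to input space.
    return 2.0 * z

x = np.array([1.0, -0.5, 0.25])
mu, logvar = encode(x)
z = reparameterize(mu, logvar)
recon = decode(z)
neg_log_lik = 0.5 * np.sum((x - recon) ** 2)  # Gaussian likelihood, up to a constant
elbo = -(neg_log_lik + gaussian_kl(mu, logvar))
```

Scaling the KL term by a weight β > 1 gives the beta-VAE objective listed above, and annealing that weight over training is the idea behind the cyclical annealing schedule entry.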
Tools
- VAE: https://jmetzen.github.io/2015-11-27/vae.html
- WaveRNN + VQVAE: https://github.com/mkotha/WaveRNN