“TTS-project-synthesis”版本间的差异

来自cslt Wiki
跳转至: 导航搜索
第18行: 第18行:
 
*Child[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/huilian/child01.neutral/child01-neutral_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 
*Child[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/huilian/child01.neutral/child01-neutral_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
  
==Multi-speaker mix-training without speaker-vector==
+
==Multi-speaker mix-trainingr==
 +
===Without Speaker-vector===
 
*Female & Male[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/mix/female01-male01/female01-male01_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 
*Female & Male[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/mix/female01-male01/female01-male01_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
  
第26行: 第27行:
  
  
==Multi-speaker mix-training with speaker-vector==
+
===With speaker-vector===
 
When synthesis, we just replace the speaker-vector for specific person.
 
When synthesis, we just replace the speaker-vector for specific person.
 
*Specific person===
 
*Specific person===
第57行: 第58行:
  
 
::*(11) 1.0:0.0[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/mix/iterpolation/female01_male01/iterpolation_10_female01_male01_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 
::*(11) 1.0:0.0[http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/multi-speakers/mix/iterpolation/female01_male01/iterpolation_10_female01_male01_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
 +
==Mono-speaker Emotion TTS==
 +
*Specific emotion
 +
:* Neutral emotion [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-neutral_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
:* Happy emotion [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-happy_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
:* Sorrow emotion [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-sorrow_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
:* Angry emotion [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-angry_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
 +
*Interpolation emotion
 +
:* Angry & neutral with different ratio
 +
::*(1) 0.0:1.0 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_0_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(2) 0.1:0.9 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_1_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(3) 0.2:0.8 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_2_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(4) 0.3:0.7 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_3_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(5) 0.4:0.6 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_4_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(6) 0.5:0.5 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_5_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(7) 0.6:0.4 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_6_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(8) 0.7:0.3 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_7_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(9) 0.8:0.2 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_8_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(10) 0.9:0.1 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/mix-emotion-angry-neutral_1_9_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]
 +
::*(11) 1.0:0.0 [http://zhangzy.cslt.org/categories/tts/sample-wav/mimic-wangd-front-end/emotion/roobo.child/x-angry_1_amdurTanh_acTanh_mlpg1_postfilter1.world.wav01.wav]

2017年12月1日 (五) 03:41的版本

Project name

Text To Speech

Project members

Dong Wang, Zhiyong Zhang

Introduction

Text To Speech

Sample waves

Synthesis text:好雨知时节,当春乃发声,随风潜入夜,润物细无声

Mono-speaker TTS

Multi-speaker mix-trainingr

Without Speaker-vector

  • Female & Male[4]
  • Female & Child[5]
  • Male & Child[6]


With speaker-vector

When synthesis, we just replace the speaker-vector for specific person.

  • Specific person===
  • Interpolate the speaker-vector of different person
  • Female & Male with different ratio
  • (1) 0.0:1.0[9]

Mono-speaker Emotion TTS

  • Specific emotion
  • Neutral emotion [20]
  • Happy emotion [21]
  • Sorrow emotion [22]
  • Angry emotion [23]
  • Interpolation emotion
  • Angry & neutral with different ratio