Difference between revisions of "ISCSLP Tutorial 2"
From cslt Wiki
Revision as of 05:57, 13 September 2014
Prof. Chung-Hsien
- Arousal & Valence coordinates
- separate the emotion process into sub-emotions
- available databases:
- database collection:
- acted: Geneva Multimodal Emotion Portrayals (GEMEP)
- induced: eNTERFACE'05 EMOTION Database
- spontaneous: SEMAINE, AFEW
- others: RML, VAM, FAU AIBO, SAVEE, TUM AVIC, IEMOCAP, SEMAINE, MHMC
- static vs dynamic modeling
STATIC:
- low-level descriptors (LLDs) and functionals
- good for discriminating between high- and low-arousal emotions
- temporal information is lost; not suitable for long utterances; cannot detect changes in emotion
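A minimal sketch of the static approach (toy values, hypothetical function names): a variable-length per-frame LLD track is collapsed into a fixed set of utterance-level functionals, which is exactly where the temporal information is discarded.

```python
import math

def functionals(lld_track):
    """Collapse a per-frame LLD track into fixed-length utterance-level statistics."""
    n = len(lld_track)
    mean = sum(lld_track) / n
    var = sum((v - mean) ** 2 for v in lld_track) / n
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "min": min(lld_track),
        "max": max(lld_track),
        "range": max(lld_track) - min(lld_track),
    }

# Toy per-frame energy values; the result has the same length for any utterance,
# but the temporal order of the frames no longer matters.
energy_track = [0.1, 0.4, 0.9, 0.4, 0.1]
stats = functionals(energy_track)
```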
DYNAMIC:
- frame-based: LLDs are extracted per frame and modeled by GMMs, HMMs, or DTW
- temporal information is obtained
- difficult to model context well
- a large number of local features need to be extracted
- Unit choice for dynamic modeling
- technical unit: frame, time slice, equally-divided unit
- meaningful unit: word, syllable, phrases
- emotionally consistent unit: emotion profiles, emotograms
- different aspects of the speech task take place at different time scales
- feature concatenation or decision fusion to exploit the information from segmented units
- speech features:
- prosodic features (pitch, formants, energy, speaking rate) are good for arousal-related emotions
- ZCR, RMS energy, F0, harmonic-to-noise ratio, MFCC
- Teager energy features are good for detecting stress
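Two of the listed frame-level features are simple enough to sketch directly in pure Python (hypothetical function names, illustrative frame values):

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

# Toy frame: alternating signs, so every adjacent pair crosses zero.
frame = [0.5, -0.5, 0.5, -0.5, 0.5]
zcr = zero_crossing_rate(frame)
rms = rms_energy(frame)
```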
- recognition models
- SVM, ANN, HMM, GMM, CART
- Emotion distillation framework
- distill emotion-specific features from the original high-dimensional features
- from the speech signal, use an SVM to generate emotiongrams, then apply an HMM, n-gram, LDA, or a simple sum to produce the emotion output
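A minimal sketch of the "simple sum" back-end over an emotiongram (hypothetical names, toy posteriors; the HMM / n-gram / LDA back-ends would replace the summation step):

```python
def emotiongram_decision(emotiongram):
    """Combine per-frame emotion posteriors (an 'emotiongram') by simple summation."""
    labels = emotiongram[0].keys()
    totals = {lab: sum(frame[lab] for frame in emotiongram) for lab in labels}
    return max(totals, key=totals.get)

# Toy emotiongram: per-frame class posteriors as a frame-level classifier
# (e.g. an SVM with probability outputs) might produce them.
emotiongram = [
    {"angry": 0.7, "happy": 0.2, "neutral": 0.1},
    {"angry": 0.5, "happy": 0.3, "neutral": 0.2},
    {"angry": 0.2, "happy": 0.6, "neutral": 0.2},
]
decision = emotiongram_decision(emotiongram)
```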
- Hierarchical classification structure
- first detect high/low arousal, then classify the emotion within each arousal group
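The hierarchical idea can be sketched as a two-stage decision; the threshold rules below are hypothetical stand-ins for trained classifiers:

```python
def hierarchical_classify(feats, arousal_clf, high_clf, low_clf):
    """Stage 1: decide high/low arousal. Stage 2: classify within that group."""
    if arousal_clf(feats) == "high":
        return high_clf(feats)
    return low_clf(feats)

# Hypothetical stand-in classifiers keyed on simple features.
arousal_clf = lambda f: "high" if f["energy"] > 0.5 else "low"
high_clf = lambda f: "angry" if f["pitch"] > 200 else "happy"
low_clf = lambda f: "sad" if f["pitch"] < 120 else "neutral"

label = hierarchical_classify({"energy": 0.8, "pitch": 250},
                              arousal_clf, high_clf, low_clf)
```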
- Fusion based recognition
- feature-level fusion
- decision-level fusion
- model-based fusion: multi-stream HMM
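Feature-level vs. decision-level fusion in toy form (hypothetical names; the feature and score values are illustrative):

```python
def feature_level_fusion(audio_feats, visual_feats):
    """Feature-level fusion: concatenate modality features before classification."""
    return audio_feats + visual_feats

def decision_level_fusion(scores_per_modality, weights):
    """Decision-level fusion: weighted sum of per-modality class scores."""
    labels = scores_per_modality[0].keys()
    fused = {lab: sum(w * s[lab] for w, s in zip(weights, scores_per_modality))
             for lab in labels}
    return max(fused, key=fused.get)

audio_scores = {"high": 0.8, "low": 0.2}
visual_scores = {"high": 0.4, "low": 0.6}
fused_vector = feature_level_fusion([0.1, 0.2], [0.3])
fused_label = decision_level_fusion([audio_scores, visual_scores], [0.5, 0.5])
```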
- Temporal phase-based modeling
- divide the emotion into onset, apex, and offset phases
- use an HMM to characterize each emotional sub-state, instead of the entire emotional state
- 6 states in total: (onset, apex, offset) × (high, low)
- Temporal course modeling
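The 6-state inventory above can be written down directly as the product of temporal phases and arousal levels (the state naming is my own):

```python
from itertools import product

# Each HMM state models one (temporal phase, arousal level) sub-state
# rather than an entire emotional state.
PHASES = ("onset", "apex", "offset")
AROUSAL = ("high", "low")
STATES = [f"{phase}/{level}" for phase, level in product(PHASES, AROUSAL)]
```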
- Structure-based modeling
- three levels of units: utterance, emotion units, sub-emotion units
- use statistical models to link the different levels
Hsin-Min Wang
- Music information retrieval (MIR)
- title search
- search by query
- the emotion of a song, as labelled by multiple annotators, forms a Gaussian
- represent the acoustic features of a song by a probabilistic histogram vector
- acoustic GMM posterior representation as a feature
- GMM codebook constructed in training (VA GMM)
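A 1-D toy sketch of the GMM-posterior song representation (hypothetical names and values; a real VA GMM codebook would use multivariate components learned from training data): each frame's posterior over the codewords is averaged into one fixed-length vector per song.

```python
import math

def gaussian_pdf(x, mean, var):
    """1-D Gaussian density (one dimension for brevity)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def song_posterior_vector(frames, codebook):
    """Average per-frame posteriors over GMM codewords -> fixed-length song feature."""
    dim = len(codebook)
    acc = [0.0] * dim
    for x in frames:
        likes = [w * gaussian_pdf(x, m, v) for (w, m, v) in codebook]
        total = sum(likes)
        for k in range(dim):
            acc[k] += likes[k] / total
    return [a / len(frames) for a in acc]

# Toy 2-codeword codebook: (weight, mean, variance); values are illustrative only.
codebook = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
frames = [0.1, -0.2, 4.9, 5.2]  # two frames near each codeword
posterior = song_posterior_vector(frames, codebook)
```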