Predicting Presence and Severity of Depression from Voice with Emotional Transfer Learning

Lasse Hansen, 16-12-2020

About Me

Lasse Hansen
Master's student of Cognitive Science at Aarhus University, Denmark
- Statistics, neuroscience, cognitive psychology, social dynamics
Intern at Data Science 1 since late August 2020
Supervised by Yan-Ping Zhang, Detlef Wolf, and Riccardo Fusaroli (AU)

Interested in Bayesian statistics and machine learning

Major Depressive Disorder (MDD)

MDD

Psychological symptoms
Feeling sad
Loss of interest and energy
Difficulty concentrating

Physiological symptoms
Fatigue
Stomach aches
Psychomotor retardation

Psychomotor retardation

Slowing of thought and speech
Increased tension in the vocal tract

→ Subtle changes in voice quality

Patients speak more slowly and more monotonously, with longer pauses

Diagnosing MDD

Source: https://www.nhs.uk/conditions/stress-anxiety-depression/mood-self-assessment/

A more objective measure for screening and tracking disease progress would be useful

Detecting Depression from Voice

Removed audio for data privacy

The Project

Use an emotion recognition model to predict depression
Controls and patients with depression recorded at 2 visits, 6 months apart
At the 6-month follow-up, the depression group includes only patients in remission

Visit  Diagnosis   Gender  N   Hamilton mean  Hamilton SD  Age mean
1      Controls    f       33  1.6            1.4          32.3
1      Controls    m       9   1.8            1.1          36.3
1      Depression  f       31  22.1           3.6          32.0
1      Depression  m       9   21.8           3.3          34.0
2      Controls    f       20  1.5            1.8          37.8
2      Controls    m       5   3.0            1.6          35.0
2      Depression  f       20  3.8            3.0          29.9
2      Depression  m       5   4.8            3.7          34.9

Emotion model trained on 3 datasets in English and German
Explain the interview content and length
Explain why this differs from other studies (transfer learning)

The Project

Can we predict depression based on how happy/sad their voice sounds?
Do patients in remission sound like depressed individuals or healthy controls?
Can we predict prognosis based on voice?

Preprocessing pipeline

Noise removal → Speaker diarization → VAD (voice activity detection)
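
The last stage of the pipeline can be illustrated with a very simple energy-based voice activity detector. This is only a sketch: the slides do not say which VAD method was used, and the function name `energy_vad`, the 10 ms frame length, and the -35 dB threshold are assumptions chosen for illustration.

```python
import numpy as np


def energy_vad(signal, sample_rate, frame_ms=10, threshold_db=-35.0):
    """Return one boolean per frame: True where the frame carries speech-like energy.

    Hypothetical helper: a frame counts as active when its RMS level,
    relative to the loudest frame, lies above `threshold_db`.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # The small constants avoid division by zero and log(0) for silent frames.
    level_db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    return level_db > threshold_db


# Synthetic example: 1 s of silence followed by 1 s of a 200 Hz tone.
sr = 16000
t = np.arange(sr) / sr
audio = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 200 * t)])
mask = energy_vad(audio, sr)
```

On this synthetic signal the first 100 frames are marked inactive and the last 100 active. Real interview recordings would likely need a more robust detector than a fixed energy threshold.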

Feature extraction

Extract MFCCs every 10 ms
Summarize in bins of 30 seconds
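
The binning step can be sketched as follows. The slides do not say which summary statistics were used, so the per-bin mean and standard deviation here are assumptions, as are the function name `summarize_in_bins` and the random stand-in for real MFCC frames.

```python
import numpy as np


def summarize_in_bins(mfcc_frames, hop_ms=10, bin_seconds=30):
    """Collapse frame-level MFCCs of shape (n_frames, n_coef) into per-bin mean and SD.

    Returns an array of shape (n_bins, 2 * n_coef). Trailing frames that do not
    fill a complete bin are dropped (an assumption of this sketch).
    """
    frames_per_bin = int(bin_seconds * 1000 / hop_ms)  # 3000 frames per 30 s bin
    n_bins = mfcc_frames.shape[0] // frames_per_bin
    n_coef = mfcc_frames.shape[1]
    binned = mfcc_frames[: n_bins * frames_per_bin].reshape(n_bins, frames_per_bin, n_coef)
    return np.concatenate([binned.mean(axis=1), binned.std(axis=1)], axis=1)


# Stand-in for real MFCCs: 65 s of audio at a 10 ms hop, 13 coefficients per frame.
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=(6500, 13))
features = summarize_in_bins(fake_mfcc)
print(features.shape)  # (2, 26)
```

Each row of `features` then describes one 30-second bin and could be passed to the emotion model as a single observation.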

Results

Depression vs controls at visit 1

Precision, tp / (tp + fp), i.e. the proportion of participants predicted as depressed who are actually depressed: 75%
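
As a small worked example of the precision formula: the counts below are made up to give a ratio of 75% and are not the study's confusion matrix.

```python
def precision(tp, fp):
    """Proportion of positive predictions that are correct: tp / (tp + fp)."""
    return tp / (tp + fp)


# Hypothetical counts: 30 true positives and 10 false positives.
tp, fp = 30, 10
print(precision(tp, fp))  # 0.75

# Without the parentheses the division binds first, giving 1 + fp instead.
print(tp / tp + fp)  # 11.0
```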

Results

Effect of preprocessing

Modelling the difference

Bayesian T Test (BEST)

Bayesian Estimation Supersedes the t Test (Kruschke, 2012)
Provides complete information on parameters of interest in the form of posterior distributions
Can accept the null
Handles extreme values better
Easy to incorporate mixed effects

Parameters of interest: group means, difference of means, standard deviations, difference of standard deviations, effect size
The null can be accepted when the posterior is sufficiently certain
The t-distributed likelihood makes the estimates less sensitive to outliers

Bayesian Inference

"Bayesian inference is just counting"

McElreath, 2020

Assumptions with more ways that are consistent with the data are more plausible
Parameters that are more consistent with the data are more plausible

Bayesian Inference

Coin toss: assume fair coin
Observe 6 tails, 3 heads
Reallocation of belief across the possible parameter values
Difficult with many parameters → approximate with sampling
Better view of the uncertainty
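
The coin-toss example can be reproduced with a grid approximation, which makes the "just counting" idea concrete. This is a sketch under stated assumptions: the prior shape, Beta(5, 5) centred on a fair coin, and the grid size are illustrative choices, not values taken from the slides.

```python
import numpy as np

# Candidate values for the probability of heads.
p_grid = np.linspace(0, 1, 1001)

# Prior belief centred on a fair coin: unnormalised Beta(5, 5) shape (an assumption).
prior = p_grid ** 4 * (1 - p_grid) ** 4

# Observed data from the slide: 3 heads and 6 tails.
heads, tails = 3, 6
likelihood = p_grid ** heads * (1 - p_grid) ** tails

# Reallocate belief: weight each candidate by how well it fits the data, then normalise.
posterior = prior * likelihood
posterior /= posterior.sum()

posterior_mean = np.sum(p_grid * posterior)
p_heads_below_half = posterior[p_grid < 0.5].sum()
print(round(posterior_mean, 3))  # close to the analytic Beta(8, 11) mean, 8/19
```

With one parameter the grid is cheap. With many parameters the grid grows exponentially, which is why sampling methods are used instead.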

BEST Results

Standard tests find no effect of diarization, but the posterior shows that there is one
Handles missing data
Uncertainty is explicitly modelled
Better estimates (with uncertainty) from less data
Not just yes/no: the estimates keep their uncertainty
Better estimation of out-of-sample uncertainty
Can answer questions such as "For which predicted probability is the probability of the patient being depressed above 80%?"
Richer representation
Better knowledge of uncertainty than confidence intervals or standard deviations

Take Home Message

A naïve emotion classifier can distinguish patients with depression and healthy controls reliably above chance level
Around 50% of depressed patients show marked symptoms in the emotional content of their voice
Patients who enter remission sound similar to controls
Bayesian methods provide a richer representation of your data and its uncertainty

Limitation: patients who were not in remission did not return for the second visit
A high P(happy) does not rule out depression, it only means the voice does not show the phenotype
- People express emotion differently
- MFCCs do not capture all aspects of speech
For diarization: standard tests would report no difference, but the Bayesian analysis shows that one exists
