class: center, middle, inverse, title-slide

# Predicting Presence and Severity of Depression from Voice with Emotional Transfer Learning
### Lasse Hansen
### 16-12-2020

---

## About Me

.pull-left[
.large[
- Lasse Hansen
- Master's student of Cognitive Science at Aarhus University, Denmark
- Statistics, neuroscience, cognitive psychology, social dynamics
- Intern at Data Science 1 since late August 2020
- Supervised by Yan-Ping Zhang, Detlef Wolf, and Riccardo Fusaroli (AU)
]
]

.pull-right[
<img src= "pres_img/laha_1217.jpg" align="right" style="margin:0px 80px" width=250>
]

???

- Interested in Bayesian statistics, machine learning

---
class: inverse, middle, center

# Major Depressive Disorder (MDD)

---

## MDD

.left-column[
### Psychological symptoms
]

.right-column[
- .large[Feeling sad]
- .large[Loss of interest and energy]
- .large[Difficulty concentrating]
]

---

## MDD

.left-column[
### Psychological symptoms
<br>
### Physiological symptoms
]

.right-column[
<ul>
<li style="opacity:0.4"><span style="">.large[Feeling sad]</span></li>
</ul>
<ul>
<li style="opacity:0.4"><span style="">.large[Loss of interest and energy]</span></li>
</ul>
<ul>
<li style="opacity:0.4"><span style="">.large[Difficulty concentrating]</span></li>
</ul>
<br>

- .large[Fatigue]
- .large[Stomach aches]
- .large[.roche_blue[Psychomotor retardation]]
]

---

## Psychomotor retardation

.pull-left[
- .large[Slowing of thought and speech]
- .large[Increased tension in the vocal tract]

<br>

.large[ → Subtle changes in .roche_blue[voice quality]]
]

.pull-right[
<img src= "pres_img/vocal_tract.jfif" align="right" style="margin:-50px 120px" width=250>
]

???

- Patients speak more slowly and more monotonously, with longer pauses

---

## Diagnosing MDD

.pull-left[
<img src="pres_img/dep_self.PNG" width="80%" />

Source: https://www.nhs.uk/conditions/stress-anxiety-depression/mood-self-assessment/
]

--

.pull-right[
<img src="pres_img/dep_ham.PNG" width="80%" />
]

???
- A more objective measure for screening and tracking disease progress would be useful

---
class: center, middle, inverse

# Detecting Depression from Voice

---

## Detecting Depression from Voice

.large[Removed audio for data privacy]

---
class: center, middle, inverse

# The Project

---

## The Project

.pull-left[
.large[
- Use an emotion recognition model to predict depression
- Controls and patients with depression at 2 visits, 6 months<br>apart
- .roche_blue[Only those in remission at the 6-month follow-up]
]
]

.pull-right[
<style type="text/css">
.remark-slide thead, .remark-slide tr:nth-child(2n) {
  background-color: white;
}
</style>
<table class=" lightable-classic" style='font-family: "Arial Narrow", "Source Sans Pro", sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:left;"> Diagnosis </th>
   <th style="text-align:left;"> Gender </th>
   <th style="text-align:right;"> N </th>
   <th style="text-align:right;"> Hamilton mean </th>
   <th style="text-align:right;"> Hamilton SD </th>
   <th style="text-align:right;"> Age mean </th>
  </tr>
 </thead>
<tbody>
  <tr grouplength="4"><td colspan="6" style="border-bottom: 0;"><strong>Visit 1</strong></td></tr>
  <tr>
   <td style="text-align:left; padding-left: 2em;" indentlevel="1"> Controls </td>
   <td style="text-align:left;"> f </td>
   <td style="text-align:right;"> 33 </td>
   <td style="text-align:right;"> 1.6 </td>
   <td style="text-align:right;"> 1.4 </td>
   <td style="text-align:right;"> 32.3 </td>
  </tr>
  <tr>
   <td style="text-align:left; padding-left: 2em;" indentlevel="1"> Controls </td>
   <td style="text-align:left;"> m </td>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:right;"> 1.8 </td>
   <td style="text-align:right;"> 1.1 </td>
   <td style="text-align:right;"> 36.3 </td>
  </tr>
  <tr>
   <td style="text-align:left; padding-left: 2em;" indentlevel="1"> Depression </td>
   <td style="text-align:left;"> f </td>
   <td style="text-align:right;"> 31 </td>
   <td style="text-align:right;"> 22.1 </td>
   <td style="text-align:right;"> 3.6 </td>
   <td style="text-align:right;"> 32.0 </td>
  </tr>
  <tr>
   <td style="text-align:left; padding-left: 2em;" indentlevel="1"> Depression </td>
   <td style="text-align:left;"> m </td>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:right;"> 21.8 </td>
   <td style="text-align:right;"> 3.3 </td>
   <td style="text-align:right;"> 34.0 </td>
  </tr>
  <tr grouplength="4"><td colspan="6" style="border-bottom: 0;"><strong>Visit 2</strong></td></tr>
  <tr>
   <td style="text-align:left; padding-left: 2em;" indentlevel="1"> Controls </td>
   <td style="text-align:left;"> f </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 1.5 </td>
   <td style="text-align:right;"> 1.8 </td>
   <td style="text-align:right;"> 37.8 </td>
  </tr>
  <tr>
   <td style="text-align:left; padding-left: 2em;" indentlevel="1"> Controls </td>
   <td style="text-align:left;"> m </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 3.0 </td>
   <td style="text-align:right;"> 1.6 </td>
   <td style="text-align:right;"> 35.0 </td>
  </tr>
  <tr>
   <td style="text-align:left; padding-left: 2em;" indentlevel="1"> Depression </td>
   <td style="text-align:left;"> f </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 3.8 </td>
   <td style="text-align:right;"> 3.0 </td>
   <td style="text-align:right;"> 29.9 </td>
  </tr>
  <tr>
   <td style="text-align:left; padding-left: 2em;" indentlevel="1"> Depression </td>
   <td style="text-align:left;"> m </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 4.8 </td>
   <td style="text-align:right;"> 3.7 </td>
   <td style="text-align:right;"> 34.9 </td>
  </tr>
</tbody>
</table>
]

???

- Emotion model trained on 3 datasets in English and German
- Explain the interview content and length
- Explain why this is different from other studies (transfer)

---

## The Project

.pull-left[
.large[
- Can we .roche_blue[predict depression] based on how happy/sad their voice sounds?
- Do patients in remission sound like depressed individuals or healthy controls?
- Can we predict prognosis based on voice?
]
]

---

## Preprocessing pipeline

.center[
.content-box-blue[**Noise removal**] → .content-box-blue[Speaker diarization] → .content-box-blue[VAD]]

<br>

.pull-left[
<img src="pres_img/spectrogram_raw.png" width="100%" />
]

.pull-right[
<img src="pres_img/spectrogram_clean.png" width="100%" />
]

---

## Preprocessing pipeline

.center[
.content-box-blue[Noise removal] → .content-box-blue[**Speaker diarization**] → .content-box-blue[VAD]]

<br>
<br>

.center[
<img src="pres_img/speak_diarization.PNG" width="60%" />
]

---

## Preprocessing pipeline

.center[
.content-box-blue[Noise removal] → .content-box-blue[Speaker diarization] → .content-box-blue[**VAD**]]

<br>
<br>

.center[
<img src="pres_img/VAD.PNG" width="60%" />
]

---

## Feature extraction

.pull-left[
.large[
- Extract MFCCs every 10 ms
- Summarize in bins of 30 seconds
]
]

.pull-right[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-10-1.png" width="504" />
]

---
class: center, middle, inverse

# Results

---

## Results

.center[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-11-1.png" width="864" />
]

---

## Results

.center[**Depression vs controls at visit 1**]

.pull-left[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-12-1.png" width="576" />
]

???

precision = tp / (tp + fp) (i.e.
proportion of those predicted depressed who are actually depressed) = 75%

--

.pull-right[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-13-1.png" width="576" />
]

---

## Results

.center[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-14-1.png" width="864" />
]

---

## Results

.center[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-15-1.png" width="720" />
]

---

## Results

.center[**Effect of preprocessing**]

.pull-left[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-16-1.png" width="576" />
]

--

.pull-right[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-17-1.png" width="576" />
]

---
class: center, middle, inverse

# Modelling the difference

---

## Bayesian T Test (BEST)

.large[
- Bayesian Estimation Supersedes the T Test (Kruschke, 2012)
- Provides complete information on parameters of interest <br> in the form of posterior distributions
- Can accept the null
- Handles extreme values better
- Easy to incorporate mixed effects
]

???

- Parameters of interest: mean, difference in means, SD, difference in SDs, effect size
- Accept null when certainty is high
- t-distributed likelihood = less sensitive to outliers

---

## Bayesian Inference

.center[
.content-box-grey[
.large[
"Bayesian inference is just counting"
]
.right[McElreath, 2020]
]
]

???

- Assumptions with more ways of being consistent with the data are more plausible
- Parameters that are more consistent with the data are more plausible

---

## Bayesian Inference

.center[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-18-1.png" width="720" />
]

???

- Coin toss: assume fair coin

---

## Bayesian Inference

.center[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-19-1.png" width="720" />
]

???

- Coin toss: assume fair coin
- Observe 6 tails, 3 heads

---

## Bayesian Inference

.center[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-20-1.png" width="720" />
]

???
- Coin toss: assume fair coin
- Observe 6 tails, 3 heads
- Reallocation of belief
- Difficult with many parameters -> sampling
- Better view of the uncertainty

---

## BEST Results

.center[
<img src="group_pres_v2_files/figure-html/unnamed-chunk-21-1.png" width="720" />
]

???

- Standard tests say no difference in diarization, but we can see there is one
- Handles missing data
- Uncertainty is explicitly modelled
- Better estimates (with uncertainty) with less data
- Not just yes/no; lets you keep the uncertainty of the estimates
- Better estimation of uncertainty out of sample
- "For which predicted probability is the likelihood of the patient being depressed > 80%?"
- Richer representation
- Better knowledge of uncertainty than confidence intervals/SD

---

## Take Home Message

.Large[
- A naïve emotion classifier can distinguish patients with <br>depression from healthy controls reliably above chance level
- Around 50% of depressed patients show marked symptoms<br>in the emotional content of their voice
- Patients who enter remission sound similar to controls
- Bayesian methods provide a richer representation of your <br>data and its uncertainty
]

---
class: middle, center

<img src="pres_img/end.PNG" width="100%" />

???

- Missing: those not in remission did not come back :(
- High P(happy) does not mean no depression, just that the voice does not show the phenotype
- People express emotion differently
- MFCCs do not capture all of speech
- For diarization, say how standard tests would say no difference, but with Bayes we can see that it exists

---

## Hidden bonus slides

<img src="group_pres_v2_files/figure-html/unnamed-chunk-23-1.png" width="576" />

---

## Hidden bonus slides

<img src="group_pres_v2_files/figure-html/unnamed-chunk-24-1.png" width="576" />
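
---

## Hidden bonus slides

The coin-toss example from the Bayesian Inference slides can be sketched as a grid approximation. This is a minimal illustration, not the code behind the figures: the function name `grid_posterior`, the 101-point grid, and the triangular prior peaked at 0.5 (standing in for "assume fair coin") are all assumptions made for this sketch.

```python
from math import comb


def grid_posterior(heads, tails, n_grid=101):
    """Grid-approximate the posterior over p = P(heads).

    Hypothetical helper for illustration only.
    """
    # Candidate values of p, evenly spaced on [0, 1].
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    # Assumed prior: triangular, peaked at 0.5 ("the coin is probably fair").
    prior = [1 - abs(2 * p - 1) for p in grid]
    # Binomial likelihood of the observed tosses for each candidate p.
    n = heads + tails
    likelihood = [comb(n, heads) * p**heads * (1 - p) ** tails for p in grid]
    # Bayes' rule: posterior is prior times likelihood, renormalised to sum to 1.
    unnormalised = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(unnormalised)
    posterior = [u / total for u in unnormalised]
    return grid, posterior


# Data from the slides: 6 tails and 3 heads.
grid, posterior = grid_posterior(heads=3, tails=6)
# Grid value with the highest posterior probability.
mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
print(f"posterior mode: {mode:.2f}")
```

The posterior mode lands between the prior's peak (0.5) and the observed proportion of heads (3/9): belief is reallocated toward the data without discarding the prior. With many parameters the grid becomes too large to enumerate, which is why sampling is used instead.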