Journal of Southern Medical University (南方医科大学学报) ›› 2025, Vol. 45 ›› Issue (4): 711-717. doi: 10.12122/j.issn.1673-4254.2025.04.05


Construction of recognition models for subthreshold depression based on multiple machine learning algorithms and vocal emotional characteristics

Meimei CHEN (陈梅妹)1,2, Yang WANG (王洋)1,2, Huangwei LEI (雷黄伟)1,2, Fei ZHANG (张斐)1,2, Ruina HUANG (黄睿娜)1,2, Zhaoyang YANG (杨朝阳)1,2

  1. College of Traditional Chinese Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou 350122, China
  2. Fujian Key Laboratory of Health Status Identification of Traditional Chinese Medicine, Fuzhou 350122, China
  • Received: 2024-12-10 Online: 2025-04-20 Published: 2025-04-28
  • Contact: Zhaoyang YANG E-mail: chenmeimei1984@163.com; yzy813@126.com
  • About the author: Meimei CHEN, PhD, associate researcher, master's supervisor, E-mail: chenmeimei1984@163.com
  • Funding:
    Natural Science Foundation of Fujian Province (2022J01361); Basic Discipline Enhancement Project of Fujian University of Traditional Chinese Medicine (XJC2023004)

Abstract:

Objective To analyze the vocal emotional characteristics of individuals with subthreshold depression and of normal controls, and to construct voice-based classification models using 6 machine learning algorithms, providing an objective basis for recognizing subthreshold depression and facilitating its early identification. Methods Voice data were collected from normal individuals and participants with subthreshold depression as they read aloud specifically chosen words and texts. From each voice segment, 384-dimensional vocal emotional feature variables were extracted, covering energy features, Mel-frequency cepstral coefficients (MFCCs), zero-crossing rate, voicing probability, fundamental frequency, and difference (delta) features. Recursive Feature Elimination (RFE) was used to select voice feature variables. Classification models were then built with Adaptive Boosting (AdaBoost), Random Forest (RF), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Lasso Regression (LRLasso), and Support Vector Machine (SVM), and their performance was evaluated. To assess generalization, the best-performing classification models were tested on real-world speech data. Results On the word-reading speech test set, the AdaBoost, RF, and LDA models achieved prediction accuracies of 100%, 100%, and 93.3%, respectively. On the text-reading speech test set, their accuracies were 90%, 80%, and 90%, respectively, while the accuracies of the other 3 models were all below 80%. On real-world word-reading and text-reading speech data, the AdaBoost model achieved accuracies of 91.7% and 80.6%, and the RF model 86.1% and 77.8%, respectively. Conclusion Analyzing vocal emotional characteristics allows effective identification of individuals with subthreshold depression. The AdaBoost and RF models perform well in classifying individuals with subthreshold depression and may offer valuable assistance in clinical and research settings.
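
To make the idea of frame-level descriptors and their difference features concrete, the following is a minimal sketch in Python. It rests on assumptions: the abstract does not name the extraction toolkit, frame sizes, or summary statistics, so the frame length, hop size, synthetic sine-wave input, and every function name below are hypothetical. It covers only two of the listed feature families (energy and zero-crossing rate) and yields 16 values, not the 384 used in the study.

```python
import math
import statistics

def frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (partial tail dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def delta(values):
    """First-order difference between consecutive frame-level values."""
    return [b - a for a, b in zip(values, values[1:])]

def summarize(values):
    """Collapse a frame-level contour into a few fixed-length statistics."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical input: 1 s of a 100 Hz sine sampled at 8 kHz (not real speech).
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

segs = frames(signal, frame_len=400, hop=200)
zcr = [zero_crossing_rate(f) for f in segs]
energy = [rms_energy(f) for f in segs]

# One fixed-length feature vector per recording: statistics of each contour
# and of its delta contour.
features = {}
for name, contour in (("zcr", zcr), ("energy", energy),
                      ("zcr_delta", delta(zcr)), ("energy_delta", delta(energy))):
    for stat, value in summarize(contour).items():
        features[f"{name}_{stat}"] = value
```

The design point is that recordings of different lengths all map to a vector of the same size, which is what allows a fixed 384-dimensional input to the classifiers.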

Key words: subthreshold depression recognition, vocal emotional characteristics, machine learning, AdaBoost, random forest
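
For readers unfamiliar with AdaBoost, the boosting idea can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: the abstract does not specify the weak learner, number of rounds, or software, so the decision stumps, the ten-point one-dimensional toy dataset, and all function names below are hypothetical. The single feature stands in for one selected voice feature, and the labels +1 and -1 stand in for the two groups.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Decision stump: returns polarity above the threshold, -polarity otherwise."""
    return polarity if x[feature] > threshold else -polarity

def fit_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(w for row, label, w in zip(X, y, weights)
                          if stump_predict(row, feature, threshold, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def fit_adaboost(X, y, n_rounds):
    """Train n_rounds weighted stumps, re-weighting misclassified samples."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # vote strength of this stump
        ensemble.append((alpha, feature, threshold, polarity))
        # Misclassified samples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * label *
                                stump_predict(row, feature, threshold, polarity))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, X, y):
    return sum(predict(ensemble, row) == label for row, label in zip(X, y)) / len(y)

# Toy data: no single threshold separates the classes, but three stumps do.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = fit_adaboost(X, y, n_rounds=1)
boosted = fit_adaboost(X, y, n_rounds=3)
print(accuracy(single, X, y), accuracy(boosted, X, y))
```

Note that these figures are training accuracies on toy data. The accuracies reported in the abstract come from held-out test sets and real-world recordings, which is the appropriate way to judge generalization.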