Research

2025

Computer audition for healthcare: A survey on speech analysis

Kun Qian, Zhonghao Zhao, Yang Tan, Weijia Zhang, MinKi Cho, Cuiping Zhu, Fuze Tian, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller

AI Open · 28 Oct 2025 · doi:10.1016/j.aiopen.2025.10.001

Intelligent speech analysis (ISA) constitutes a significant component within the realm of computer audition (CA) technology. Speech, as a fundamental tool for human communication, not only conveys rich semantic information but also holds significant potential for various healthcare applications. Computational paralinguistics methods can be used to analyse alterations in the acoustic characteristics of speech signals induced by medical conditions, providing valuable insights into shifts in an individual’s health status. More importantly, compared to other physiological monitoring devices, speech acquisition devices are non-invasive and user-friendly, making them accessible for a wide range of individuals. However, despite its promise, ISA in healthcare currently faces a range of notable challenges that hinder its widespread adoption. In this survey, we present an overview of the development and current research in speech analysis technologies within the healthcare domain. First, we summarise the methodologies employed in ISA-based healthcare. Next, we provide an overview of applications in evaluating physical diseases, mental health conditions, and neurological disorders. Additionally, we discuss key limitations and shortcomings in the current state of the field. Finally, we conclude with a summary of the discussed works and offer insights into future research directions aimed at addressing these limitations to advance the practical implementation of ISA in clinical settings. This survey aims to serve as a valuable resource for researchers in speech analysis, biomedicine, and related fields. We hope to inspire greater interest in this promising area within the scientific community and provide guidance for future studies in this evolving field.

FedKDC: Consensus-Driven Knowledge Distillation for Personalized Federated Learning in EEG-Based Emotion Recognition

Xihang Qiu, Wanyong Qiu, Ye Zhang, Kun Qian, Chun Li, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

IEEE Journal of Biomedical and Health Informatics · 16 Apr 2025 · doi:10.1109/JBHI.2025.3562090

Federated learning (FL) has gained prominence in EEG-based emotion recognition due to its ability to enable secure collaborative training without centralized data. However, traditional FL faces challenges due to model and data heterogeneity in smart healthcare settings. For example, medical institutions have varying computational resources, which creates a need for personalized local models. Moreover, EEG data from medical institutions typically face data heterogeneity issues stemming from limitations in participant availability, ethical constraints, and cultural differences among subjects, which can slow model convergence and degrade model performance. To address these challenges, we propose FedKDC, a novel FL framework that incorporates clustered knowledge distillation (CKD). This method introduces a consensus-based distributed learning mechanism to facilitate the clustering process. It then enhances the convergence speed through intraclass distillation and reduces the negative impact of heterogeneity through interclass distillation. Additionally, we introduce a DriftGuard mechanism to mitigate client drift, along with an entropy reducer to decrease the entropy of aggregated knowledge. The framework is validated on the SEED, SEED-IV, SEED-FRA, and SEED-GER datasets, demonstrating its effectiveness in scenarios where both the data and the models are heterogeneous. Experimental results show that FedKDC outperforms other FL frameworks in emotion recognition, achieving a maximum average accuracy of 85.2%, and in convergence efficiency, with faster and more stable convergence. Our code is made publicly available at: https://github.com/wdqdp/FedKDC.

2024

Federated Abnormal Heart Sound Detection with Weak to No Labels

Wanyong Qiu, Chen Quan, Yongzi Yu, Eda Kara, Kun Qian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

Cyborg and Bionic Systems · 10 Sep 2024 · doi:10.34133/cbsystems.0152

Cardiovascular diseases are a prominent cause of mortality, emphasizing the need for early prevention and diagnosis. Heart sound analysis with artificial intelligence (AI) models has emerged as a noninvasive and universally applicable approach for assessing cardiovascular health conditions. However, real-world medical data are dispersed across medical institutions, forming “data islands” due to data sharing limitations for security reasons. To this end, federated learning (FL), which can effectively train models across multiple institutions, has been extensively employed in the medical field. Additionally, conventional supervised classification methods require fully labeled data classes, e.g., binary classification requires labeling of positive and negative samples. Nevertheless, the process of labeling healthcare data is time-consuming and labor-intensive, leading to the possibility of mislabeling negative samples. In this study, we validate an FL framework with a naive positive-unlabeled (PU) learning strategy. The semisupervised FL model can directly learn from a limited set of positive samples and an extensive pool of unlabeled samples. Our emphasis is on vertical FL to enhance collaboration across institutions with different medical record feature spaces. Additionally, our contribution extends to feature importance analysis, where we explore six methods and provide practical recommendations for detecting abnormal heart sounds. The study demonstrated an accuracy of 84%, comparable to outcomes in supervised learning, thereby advancing the application of FL in abnormal heart sound detection.

Fed-MStacking: Heterogeneous Federated Learning With Stacking Misaligned Labels for Abnormal Heart Sound Detection

Wanyong Qiu, Yifan Feng, Yuying Li, Yi Chang, Kun Qian, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller

IEEE Journal of Biomedical and Health Informatics · 16 Jul 2024 · doi:10.1109/JBHI.2024.3428512

Ubiquitous sensing has been widely applied in smart healthcare, providing an opportunity for intelligent heart sound auscultation. However, smart devices contain sensitive information, raising user privacy concerns. To this end, federated learning (FL) has been adopted as an effective solution, enabling decentralised learning without data sharing, thus preserving data privacy in the Internet of Health Things (IoHT). Nevertheless, traditional FL requires the same architectural models to be trained across local clients and global servers, leading to a lack of model heterogeneity and client personalisation. For medical institutions with private data clients, this study proposes Fed-MStacking, a heterogeneous FL framework that incorporates a stacking ensemble learning strategy to support clients in building their own models. The secondary objective of this study is to address scenarios involving local clients with data characterised by inconsistent labelling. Specifically, the local client contains only one case type, and the data cannot be shared within or outside the institution. To train a global multi-class classifier, we aggregate missing class information from all clients at each institution and build meta-data, which then participates in FL training via a meta-learner. We apply the proposed framework to a multi-institutional heart sound database. The experiments utilise random forests (RFs), feedforward neural networks (FNNs), and convolutional neural networks (CNNs) as base classifiers. The results show that the heterogeneous stacking of local models performs better compared to homogeneous stacking.

Heart Sound Abnormality Detection From Multi-Institutional Collaboration: Introducing a Federated Learning Framework

Wanyong Qiu, Chen Quan, Lixian Zhu, Yongzi Yu, Zhihua Wang, …, Yi Chang, Kun Qian, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller

IEEE Transactions on Biomedical Engineering · 03 May 2024 · doi:10.1109/TBME.2024.3393557

Objective: Early diagnosis of cardiovascular diseases is a crucial task in medical practice. With the application of computer audition in the healthcare field, artificial intelligence (AI) has been applied to clinical non-invasive intelligent auscultation of heart sounds to provide rapid and effective pre-screening. However, AI models generally require large amounts of data, which may cause privacy issues. Unfortunately, it is difficult to collect large amounts of healthcare data from a single centre. Methods: In this study, we propose federated learning (FL) optimisation strategies for practical application in multi-centre institutional heart sound databases. Horizontal FL is mainly employed to tackle the privacy problem by aligning the feature spaces of FL participating institutions without information leakage. In addition, techniques based on deep learning have poor interpretability due to their “black-box” property, which limits the feasibility of AI on real medical data. To this end, vertical FL is utilised to address the issues of model interpretability and data scarcity. Conclusion: Experimental results demonstrate that the proposed FL framework can achieve good performance for heart sound abnormality detection while taking personal privacy protection into account. Moreover, using the federated feature space helps balance the interpretability of vertical FL and the privacy of the data. Significance: This work realises the potential of FL from research to clinical practice, and is expected to have extensive application in federated smart medical systems.
