Survey Report
Survey Paper on Advancements in Dysarthric Speech Recognition Systems
Dysarthria, a motor speech disorder resulting from neurological injury, severely impairs speech intelligibility, making automatic speech recognition (ASR) a vital tool for enhancing communication. Over the years, significant research has explored computational approaches to improving ASR performance on dysarthric speech, from early rule-based and HMM-based models to deep learning architectures. This survey presents a comprehensive review of the evolution of ASR techniques tailored to dysarthric speech, categorizing methods by architecture type (HMM, DNN, CNN, LSTM, Transformer), learning paradigm (supervised, self-supervised, meta-learning), and input modality (audio-only, multimodal). The study examines the role of acoustic features such as MFCC and PLP alongside raw waveform-based representation learning, and compares key models, including Wav2Vec 2.0, TDNN, and UTran-DSR, across the UA-Speech, TORGO, and CommonVoice datasets. A critical evaluation of strategies such as speaker adaptation, transfer learning, end-to-end pipelines, and contrastive learning is provided, along with their impact on accuracy and generalization. The paper highlights emerging trends such as emotion-aware ASR, multimodal fusion, and personalized adaptation, while addressing persistent challenges including data scarcity, speaker variability, and real-time deployment. This survey aims to provide a clear roadmap of the progress and ongoing efforts in dysarthric ASR, guiding future research toward more inclusive and intelligent speech interfaces.
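To make the feature and model comparison discussed above concrete, the following is a minimal Python sketch of the two input pipelines the survey contrasts: classical MFCC feature extraction and inference with a pretrained self-supervised wav2vec 2.0 model. It assumes the librosa, torch, and HuggingFace transformers packages; the audio file path is hypothetical, and the "facebook/wav2vec2-base-960h" checkpoint is trained on typical speech, so in practice it would serve only as a starting point for transfer learning or fine-tuning on a dysarthric corpus such as UA-Speech or TORGO.

```python
# Illustrative sketch: classical MFCC features vs. wav2vec 2.0 inference
# on a single dysarthric speech sample. Paths and checkpoint names are
# placeholders, not the survey's experimental setup.
import librosa
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

AUDIO_PATH = "sample_dysarthric_utterance.wav"  # hypothetical input file

# Load audio at 16 kHz, the sampling rate expected by wav2vec 2.0.
waveform, sr = librosa.load(AUDIO_PATH, sr=16000)

# Classical acoustic features: 13-dimensional MFCCs (coefficients x frames).
mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13)
print("MFCC shape:", mfcc.shape)

# Self-supervised baseline: a pretrained wav2vec 2.0 CTC model. This
# checkpoint is trained on typical (LibriSpeech) speech and would need
# fine-tuning on dysarthric data before meaningful evaluation.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

inputs = processor(waveform, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding of the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print("Transcription:", processor.batch_decode(predicted_ids)[0])
```

A sketch like this reflects the comparison the survey draws: hand-crafted features such as MFCCs feed conventional HMM/DNN pipelines, whereas wav2vec 2.0 learns representations directly from the raw waveform and is adapted to dysarthric speakers through transfer learning.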
Published by: Sushmita Chaudhari, Mansi Chopkar, Harshvardhan Gaikwad, Anuj Raj
Author: Sushmita Chaudhari
Paper ID: V11I2-1282
Paper Status: published
Published: June 10, 2025