Speech based Emotion Recognition using CNN Classifier

B. Sandeep; Dr. R. Sivaranjani; R. Mourya; J. Sai Vinay; Y. Vineela

doi:XX.XXX/IJARIIT-V7I3-1372

This paper is published in Volume-7, Issue-3, 2021

Paper Details

Area

Computer Science

Author

B. Sandeep, Dr. R. Sivaranjani, R. Mourya, J. Sai Vinay, Y. Vineela

Org/Univ

Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam, Andhra Pradesh, India

Pub. Date

08 June, 2021

Paper ID

V7I3-1372

Publisher

IJARIIT

Edition

Volume-7, Issue-3, 2021

Keywords

CNN, Audio Feature Extraction, Librosa, RAVDESS, SER, MFCC

Citations

IEEE
B. Sandeep, Dr. R. Sivaranjani, R. Mourya, J. Sai Vinay, Y. Vineela. Speech based Emotion Recognition using CNN Classifier, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
B. Sandeep, Dr. R. Sivaranjani, R. Mourya, J. Sai Vinay, Y. Vineela (2021). Speech based Emotion Recognition using CNN Classifier. International Journal of Advance Research, Ideas and Innovations in Technology, 7(3) www.IJARIIT.com.

MLA
B. Sandeep, Dr. R. Sivaranjani, R. Mourya, J. Sai Vinay, Y. Vineela. "Speech based Emotion Recognition using CNN Classifier." International Journal of Advance Research, Ideas and Innovations in Technology 7.3 (2021). www.IJARIIT.com.


Abstract

Voice communication is one of the main components of affective computing in human-computer interaction. In this type of interaction, both comprehending the linguistic content of the words and recognizing the emotion carried in the speech are essential for improving performance. Emotional state is modeled from the speech waveform, which carries cues for emotions such as boredom, fear, joy and sadness. This project aims to design and develop a speech-based emotion recognition (SER) system in which different emotions are recognized by Convolutional Neural Network (CNN) classifiers. The spectral features extracted are mel-frequency cepstral coefficients (MFCC). The proposed algorithm is developed in Python using the Librosa package, and its performance is tested on samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) to differentiate emotions such as happiness, surprise, anger, neutral state, sadness and fear. Feature selection (FS) was applied to find the most relevant feature subset. Results show that the maximum gain in performance is achieved by using the CNN.
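
The data-preparation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the standard RAVDESS file-naming scheme, in which the third of seven hyphen-separated fields is the emotion code, and it uses `librosa.load` and `librosa.feature.mfcc` for the features. The helper names `emotion_from_filename` and `mfcc_features`, the choice of 40 coefficients, and the averaging over time frames are assumptions made for illustration; the abstract does not state these details. The CNN itself is omitted because the abstract gives no architecture.

```python
from pathlib import Path

# RAVDESS encodes metadata in the file name as seven hyphen-separated
# two-digit fields; the third field is the emotion code.
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def emotion_from_filename(path):
    """Return the emotion label encoded in a RAVDESS file name."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style file name: {path}")
    return RAVDESS_EMOTIONS[fields[2]]


def mfcc_features(path, n_mfcc=40):
    """Return one time-averaged MFCC vector of length n_mfcc for a clip.

    n_mfcc=40 and the averaging over frames are assumptions, not values
    taken from the paper.
    """
    import librosa  # imported lazily so label parsing works without it

    signal, sample_rate = librosa.load(path, sr=None)  # keep native rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # shape: (n_mfcc,)


if __name__ == "__main__":
    # Third field is "06", so this prints "fearful".
    print(emotion_from_filename("Actor_12/03-01-06-01-02-01-12.wav"))
```

Each clip then yields one (feature vector, label) pair for training a classifier. Averaging over frames gives a fixed-length input regardless of clip duration; keeping the full coefficient-by-frame matrix instead would suit a 2-D convolutional model, at the cost of padding or cropping clips to a common length.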
