This paper is published in Volume-7, Issue-3, 2021
Area
Computer Science
Author
B. Sandeep, Dr. R. Sivaranjani, R. Mourya, J. Sai Vinay, Y. Vineela
Org/Univ
Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam, Andhra Pradesh, India
Pub. Date
08 June, 2021
Paper ID
V7I3-1372
Publisher
Keywords
CNN, Audio Feature Extraction, Librosa, RAVDESS, SER, MFCC

Citations

IEEE
B. Sandeep, Dr. R. Sivaranjani, R. Mourya, J. Sai Vinay, Y. Vineela. Speech based Emotion Recognition using CNN Classifier, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
B. Sandeep, Dr. R. Sivaranjani, R. Mourya, J. Sai Vinay, Y. Vineela (2021). Speech based Emotion Recognition using CNN Classifier. International Journal of Advance Research, Ideas and Innovations in Technology, 7(3) www.IJARIIT.com.

MLA
B. Sandeep, Dr. R. Sivaranjani, R. Mourya, J. Sai Vinay, Y. Vineela. "Speech based Emotion Recognition using CNN Classifier." International Journal of Advance Research, Ideas and Innovations in Technology 7.3 (2021). www.IJARIIT.com.

Abstract

Communication through voice is one of the main components of affective computing in human-computer interaction. In this type of interaction, both comprehending the linguistic content of the words and recognizing the emotion carried in the speech are essential for improving performance. To model the emotional state, speech waveforms are used, since they carry cues to emotions such as boredom, fear, joy and sadness. This project aims to design and develop a speech-based emotion recognition (SER) system in which different emotions are recognized by a Convolutional Neural Network (CNN) classifier. The spectral features extracted are mel-frequency cepstral coefficients (MFCCs). The proposed algorithm is developed in Python using the Librosa package, and its performance is tested on samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) to differentiate emotions such as happiness, surprise, anger, the neutral state, sadness and fear, among others. Feature selection (FS) was applied to find the most relevant feature subset. Results show that the maximum gain in performance is achieved by using the CNN.
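The data preparation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `emotion_from_filename` and `mfcc_features` are hypothetical, and the choice of 40 coefficients averaged over time is an assumption, since the abstract does not report these settings. The label lookup relies on the RAVDESS file naming scheme, in which the third of seven hyphen-separated fields is the emotion code.

```python
from pathlib import Path

# RAVDESS encodes metadata in the file name as seven hyphen-separated
# two-digit fields; the third field is the emotion code.
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def emotion_from_filename(path):
    """Return the emotion label encoded in a RAVDESS file name."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style file name: {path}")
    return RAVDESS_EMOTIONS[fields[2]]


def mfcc_features(path, n_mfcc=40):
    """Load one clip and return its time-averaged MFCC vector.

    n_mfcc=40 and averaging over time are illustrative assumptions,
    not settings reported in the abstract.
    """
    # Imported lazily so that label parsing works without these packages.
    import librosa
    import numpy as np

    signal, sample_rate = librosa.load(path, sr=None)  # keep the native rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)  # one value per coefficient
```

Calling `emotion_from_filename` and `mfcc_features` on each audio file would yield the label and feature vector pairs that a CNN classifier could then be trained on. Averaging over time discards temporal structure, so a model that convolves over frames would instead keep the full coefficient-by-frame matrix.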