A comparative analysis of machine learning techniques for automatic text classification

Sivakami M.; Dr. M. Thangaraj; P. Aruna Saraswathy

doi:XX.XXX/IJARIIT-V7I2-1523

This paper is published in Volume-7, Issue-2, 2021

Area

Computer Science

Author

Sivakami M., Dr. M. Thangaraj, P. Aruna Saraswathy

Org/Univ

Madurai Kamaraj University, Madurai, Tamil Nadu, India

Pub. Date

28 April, 2021

Paper ID

V7I2-1523

Publisher

IJARIIT

Edition

Volume-7, Issue-2, 2021

Keywords

Naive Bayes, Support Vector Machine, Decision Tree, Text Classification, WEKA, J48, Automatic Text Mining, IBK

Citations

IEEE
Sivakami M., Dr. M. Thangaraj, P. Aruna Saraswathy. A comparative analysis of machine learning techniques for automatic text classification, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
Sivakami M., Dr. M. Thangaraj, P. Aruna Saraswathy (2021). A comparative analysis of machine learning techniques for automatic text classification. International Journal of Advance Research, Ideas and Innovations in Technology, 7(2) www.IJARIIT.com.

MLA
Sivakami M., Dr. M. Thangaraj, P. Aruna Saraswathy. "A comparative analysis of machine learning techniques for automatic text classification." International Journal of Advance Research, Ideas and Innovations in Technology 7.2 (2021). www.IJARIIT.com.

Abstract

Demand for text processing and related activities has grown sharply with the increase in unstructured data. The underlying structure of a text can be derived through categorization techniques. The capacity of text classification algorithms to convert unstructured data into structured data is a key factor in all text processing activities. To enhance this capacity, concepts from other disciplines such as statistics, physics, and mathematics have been tailored to suit the needs of text analysis pipelines. Text classification techniques help to build the template necessary for extracting meaningful information. Hence, this paper presents a comparative study of various text classification algorithms to re-examine their suitability for particular classes of problems. The algorithms ‘Naïve Bayes’, ‘Support Vector Machine’, ‘K-nearest neighbor’, and ‘Decision Tree’ were studied through empirical analysis on the WEKA data analysis platform. The experimental results show that the strength of each algorithm depends on the data type, the nature of the attributes, and the representation of the classes. This is verified by the evaluation metrics used in the study: precision, recall, accuracy, F1 scores, and ROC values.
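For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall, F1 score, and accuracy can be computed for one class from predicted and true labels. It is not the authors' WEKA workflow; the names `BinaryCounts`, `count_outcomes`, and the toy "spam"/"ham" labels are hypothetical and introduced only for illustration. ROC values are omitted because they require classifier scores rather than hard labels.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryCounts:
    """Confusion-matrix counts for one class treated as 'positive' (hypothetical helper)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def count_outcomes(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> BinaryCounts:
    """Tally the four confusion-matrix cells for the chosen positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return BinaryCounts(tp, fp, fn, tn)


def precision(c: BinaryCounts) -> float:
    """Fraction of positive predictions that are correct."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: BinaryCounts) -> float:
    """Fraction of actual positives that were found."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(c: BinaryCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c: BinaryCounts) -> float:
    """Fraction of all predictions that are correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
counts = count_outcomes(y_true, y_pred, positive="spam")
```

On the toy labels above, `counts` holds two true positives, one false positive, one false negative, and one true negative, so precision and recall are both 2/3 and accuracy is 3/5. For a multi-class problem, the same counts can be computed once per class and then averaged.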