This paper is published in Volume-7, Issue-2, 2021
Area
Computer Science
Author
Sivakami M., Dr. M. Thangaraj, P. Aruna Saraswathy
Org/Univ
Madurai Kamaraj University, Madurai, Tamil Nadu, India
Pub. Date
28 April, 2021
Paper ID
V7I2-1523
Publisher
Keywords
Naive Bayes, Support Vector Machine, Decision Tree, Text Classification, WEKA, J48, Automatic Text Mining, IBK

Citations

IEEE
Sivakami M., Dr. M. Thangaraj, P. Aruna Saraswathy. A comparative analysis of machine learning techniques for automatic text classification, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
Sivakami M., Dr. M. Thangaraj, P. Aruna Saraswathy (2021). A comparative analysis of machine learning techniques for automatic text classification. International Journal of Advance Research, Ideas and Innovations in Technology, 7(2) www.IJARIIT.com.

MLA
Sivakami M., Dr. M. Thangaraj, P. Aruna Saraswathy. "A comparative analysis of machine learning techniques for automatic text classification." International Journal of Advance Research, Ideas and Innovations in Technology 7.2 (2021). www.IJARIIT.com.

Abstract

Demand for text processing and its related activities has grown sharply with the increase in unstructured data. The underlying structure in any text can be derived through categorization techniques. The capacity of text classification algorithms to convert unstructured data into structured data is the key factor in all text processing activities. To enhance this capacity, many concepts from other disciplines such as statistics, physics, and mathematics have been tailored to suit the needs of text analysis pipelines. Text classification techniques help to build the template necessary for extracting meaningful information. Hence, this paper presents a comparative study of various text classification algorithms to re-examine their suitability for particular classes of problems. Naïve Bayes, Support Vector Machine, K-nearest neighbor, and Decision Tree were studied through empirical analysis on the WEKA data analysis platform. The experimental results show that the strength of each algorithm depended on the data type, the nature of the attributes, and the representation of the classes. This is verified by the evaluation metrics used in the study: precision, recall, accuracy, F1-score, and ROC values.
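
As a rough illustration of the evaluation metrics named in the abstract, the following Python sketch computes accuracy, precision, recall, and F1-score for one class treated as positive. It is not the paper's WEKA workflow. The function name `binary_metrics` and the `spam`/`ham` labels are hypothetical and chosen only for the example, and ROC values are left out because they require classifier scores rather than hard labels.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive):
    """Return accuracy, precision, recall, and F1 with `positive` as the target class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1  # correctly predicted positive
        elif pred == positive:
            counts["fp"] += 1  # predicted positive, actually negative
        elif truth == positive:
            counts["fn"] += 1  # missed positive
        else:
            counts["tn"] += 1  # correctly predicted negative

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, invented for illustration only (not data from the paper).
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]

scores = binary_metrics(y_true, y_pred, positive="spam")
print(scores)
```

For a multi-class problem, the same function could be called once per class and the results averaged, which is one common way per-class figures are summarized.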