This paper is published in Volume-5, Issue-3, 2019
Area
Machine Learning
Author
Aamir Ahmad Khandy
Co-authors
Dr. Rohit Miri
Org/Univ
Dr. C.V. Raman University, Kargi Road Kota, Bilaspur, Chhattisgarh, India
Pub. Date
20 May, 2019
Paper ID
V5I3-1378
Publisher
Keywords
Big data, Unstructured data, Clustering algorithms, MongoDB

Citations

IEEE
Aamir Ahmad Khandy, Dr. Rohit Miri. uDCLUST: A novel algorithm for clustering unstructured data, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
Aamir Ahmad Khandy, Dr. Rohit Miri (2019). uDCLUST: A novel algorithm for clustering unstructured data. International Journal of Advance Research, Ideas and Innovations in Technology, 5(3) www.IJARIIT.com.

MLA
Aamir Ahmad Khandy, Dr. Rohit Miri. "uDCLUST: A novel algorithm for clustering unstructured data." International Journal of Advance Research, Ideas and Innovations in Technology 5.3 (2019). www.IJARIIT.com.

Abstract

Structured data is data that has been arranged in an organized, formatted repository, usually a database, so that its elements and essential features are directly accessible for more effective processing and analysis. Unstructured data is data that does not fit neatly into a traditional database and has neither an identifiable internal structure nor a predefined data model. Operations such as update, insert, and delete cannot be performed directly on unstructured data. Clustering is an unsupervised learning process and one of the most common methods for mathematical and demographic data analysis. It is the main task of preliminary data mining and a common technique for statistical data analysis, used in many fields, including machine learning (ML), pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. Available clustering algorithms have difficulty determining the number of clusters in a dataset, and they also have difficulty clustering outliers, even those that share common groups. A final related drawback arises from the shape of the data clusters: non-spherical and overlapping datasets are difficult and complex to cluster. In this work, we design an algorithm called uDCLUST (Un-structured Data Clustering), which identifies an appropriate number of clusters in unstructured data and also clusters outliers easily, even in non-spherical and overlapping datasets.
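The two drawbacks named in the abstract (the number of clusters must be supplied in advance, and non-spherical shapes are hard to separate) can be illustrated with a small sketch. This is not uDCLUST, whose steps are not given on this page. It is a toy comparison, under our own assumptions, between plain k-means and a simple neighbourhood-based grouping on two concentric rings. The helper names `ring`, `kmeans`, and `density_components`, the ring sizes, and the `eps` value are all hypothetical choices made for illustration.

```python
import math


def ring(radius, n):
    """Return n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]


def kmeans(points, centroids, iterations=50):
    """Plain Lloyd's k-means; k is fixed by the number of initial centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def density_components(points, eps):
    """Group points that are linked by chains of neighbours within eps.

    The number of groups is discovered, not supplied by the caller.
    """
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (assumed): an inner ring of radius 1 and an outer ring of radius 5.
inner, outer = ring(1.0, 20), ring(5.0, 60)
points = inner + outer

km = kmeans(points, centroids=[inner[0], outer[0]])  # k = 2 must be given
dc = density_components(points, eps=1.0)             # no k given

print("k-means labels found on the outer ring:", sorted(set(km[len(inner):])))
print("neighbourhood groups found:", len(set(dc)))
```

On this toy input, k-means with k = 2 divides the plane with a straight boundary, so the outer ring ends up split across both clusters. The neighbourhood-based grouping returns one group per ring without being told how many groups to expect. The sketch only motivates the problem the abstract describes; it makes no claim about how uDCLUST itself works.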