This paper is published in Volume-3, Issue-2, 2017
Area
Hadoop
Author
Prathamesh Chaudhari, Gaurav S. Salve, Nilesh Ghadge, Harish Barapatre
Org/Univ
Yadavrao Tasgaonkar Institute Of Engineering And Technology, Karjat, Maharashtra, India
Pub. Date
31 March, 2017
Paper ID
V3I2-1361
Publisher
Keywords
MapReduce, Hadoop, Flow Shops, Scheduling Algorithm, Job Ordering.

Citations

IEEE
Prathamesh Chaudhari, Gaurav S. Salve, Nilesh Ghadge, Harish Barapatre. Dynamic Job Ordering and Slot Configurations for Mapreduce Workloads, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
Prathamesh Chaudhari, Gaurav S. Salve, Nilesh Ghadge, Harish Barapatre (2017). Dynamic Job Ordering and Slot Configurations for Mapreduce Workloads. International Journal of Advance Research, Ideas and Innovations in Technology, 3(2) www.IJARIIT.com.

MLA
Prathamesh Chaudhari, Gaurav S. Salve, Nilesh Ghadge, Harish Barapatre. "Dynamic Job Ordering and Slot Configurations for Mapreduce Workloads." International Journal of Advance Research, Ideas and Innovations in Technology 3.2 (2017). www.IJARIIT.com.

Abstract

In the dynamic MR process, apart from the three concepts present in the paper, we introduce a clustering approach on top of multi-data-center processing. Because the data is split and processed across multiple data centers, we group similar data items into clusters using the k-means algorithm, and clustering the data in this way lets us process it in a shorter execution time. After preprocessing, we split the data into multiple files and apply the standard k-means clustering algorithm to them.

We improve the performance of a MapReduce cluster by optimizing slot utilization from two perspectives. First, slots can be classified into two types: busy slots (i.e., with running tasks) and idle slots (i.e., with no running tasks). Given the total number of map and reduce slots configured by users, one optimization approach (i.e., macro-level optimization) is to improve slot utilization by maximizing the number of busy slots and reducing the number of idle slots. Second, it is worth noting that not every busy slot is utilized efficiently. Thus, our second optimization approach (i.e., micro-level optimization) is to improve the utilization efficiency of busy slots after the macro-level optimization. In particular, we identify two main factors affecting this efficiency, one of which is speculative tasks. Based on these, we propose DynamicMR, a dynamic utilization optimization framework for MapReduce, to improve the performance of a shared Hadoop cluster under fair scheduling between users.
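The abstract names k-means but does not specify its initialization, distance measure, or how records are encoded. The following is a minimal sketch of standard Lloyd-style k-means, assuming each preprocessed record is already a numeric feature vector. The function names, the squared Euclidean distance, and the farthest-first initialization are illustrative assumptions, not the method described in this paper.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-means; returns (centroids, cluster label for each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Assumed initialization: farthest-first traversal, which is deterministic.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(centroid(members) if members else centroids[c])  # keep empty clusters in place
        centroids = updated
    return centroids, labels


if __name__ == "__main__":
    records = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    centers, labels = kmeans(records, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    # Hypothetical partitioning step: group records by cluster, e.g. one input file per cluster.
    clusters: Dict[int, List[Point]] = {}
    for rec, lab in zip(records, labels):
        clusters.setdefault(lab, []).append(rec)
    print(clusters)
```

The grouping at the end only suggests how cluster labels could drive the split into per-cluster files before the MapReduce stage; the abstract does not describe that step, so it is likewise an assumption.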
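Macro-level optimization is described only as maximizing busy slots and reducing idle slots for a fixed, user-configured number of map and reduce slots. As a toy illustration, assuming a hypothetical policy in which an idle slot of one type may run a pending task of the other type (the abstract does not say how busy slots are maximized), the sketch below compares busy-slot counts under a static split and under that shared policy. `SlotConfig`, `busy_slots_static`, `busy_slots_shared`, and `utilization` are illustrative names, not part of Hadoop or DynamicMR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotConfig:
    """Number of map and reduce slots configured by the user (hypothetical model)."""
    map_slots: int
    reduce_slots: int

    @property
    def total(self) -> int:
        return self.map_slots + self.reduce_slots


def busy_slots_static(cfg: SlotConfig, pending_map: int, pending_reduce: int) -> int:
    """Static split: map slots only run map tasks, reduce slots only run reduce tasks."""
    return min(cfg.map_slots, pending_map) + min(cfg.reduce_slots, pending_reduce)


def busy_slots_shared(cfg: SlotConfig, pending_map: int, pending_reduce: int) -> int:
    """Assumed macro-level policy: an idle slot of either type may take a pending
    task of the other type, so only the total slot count limits busy slots."""
    return min(cfg.total, pending_map + pending_reduce)


def utilization(busy: int, cfg: SlotConfig) -> float:
    """Fraction of configured slots that are busy; idle slots = cfg.total - busy."""
    return busy / cfg.total


if __name__ == "__main__":
    cfg = SlotConfig(map_slots=4, reduce_slots=4)
    for name, policy in [("static", busy_slots_static), ("shared", busy_slots_shared)]:
        busy = policy(cfg, pending_map=10, pending_reduce=1)
        print(f"{name}: busy={busy}, idle={cfg.total - busy}, utilization={utilization(busy, cfg):.3f}")
```

With 4 map slots, 4 reduce slots, 10 pending map tasks, and 1 pending reduce task, the static split leaves 3 reduce slots idle (5 busy, utilization 0.625), while the assumed shared policy keeps all 8 slots busy. The model ignores task durations, data locality, and fairness between users, so it only illustrates the busy/idle distinction.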