Dynamic Job Ordering and Slot Configurations for Mapreduce Workloads

Prathamesh Chaudhari; Gaurav S. Salve; Nilesh Ghadge; Harish Barapatre

doi:XX.XXX/IJARIIT-V3I2-1361

This paper is published in Volume-3, Issue-2, 2017


Area

Hadoop

Author

Prathamesh Chaudhari, Gaurav S. Salve, Nilesh Ghadge, Harish Barapatre

Org/Univ

Yadavrao Tasgaonkar Institute Of Engineering And Technology, Karjat, Maharashtra, India

Pub. Date

31 March, 2017

Paper ID

V3I2-1361

Publisher

IJARIIT

Edition

Volume-3, Issue-2, 2017

Keywords

MapReduce, Hadoop, flow-Shops, Scheduling Algorithm, Job Ordering.

Citations

IEEE
Prathamesh Chaudhari, Gaurav S. Salve, Nilesh Ghadge, Harish Barapatre. Dynamic Job Ordering and Slot Configurations for Mapreduce Workloads, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
Prathamesh Chaudhari, Gaurav S. Salve, Nilesh Ghadge, Harish Barapatre (2017). Dynamic Job Ordering and Slot Configurations for Mapreduce Workloads. International Journal of Advance Research, Ideas and Innovations in Technology, 3(2) www.IJARIIT.com.

MLA
Prathamesh Chaudhari, Gaurav S. Salve, Nilesh Ghadge, Harish Barapatre. "Dynamic Job Ordering and Slot Configurations for Mapreduce Workloads." International Journal of Advance Research, Ideas and Innovations in Technology 3.2 (2017). www.IJARIIT.com.


Abstract

In addition to the three techniques presented in the paper for the dynamic MapReduce (MR) process, we introduce a clustering approach. Because the data is split and processed across multiple data centers, we add clustering to the multi-data-center processing: similar data items are grouped into clusters using the k-means algorithm, which allows the data to be processed in a shorter execution time. After preprocessing, we split the data into multiple files and apply the clustering process; k-means is used because it is a standard clustering algorithm. We improve the performance of a MapReduce cluster by optimizing slot utilization, primarily from two perspectives. First, we classify slots into two types, namely busy slots (i.e., with running tasks) and idle slots (i.e., with no running tasks). Given the total number of map and reduce slots configured by users, one optimization approach (i.e., macro-level optimization) is to improve slot utilization by maximizing the number of busy slots and reducing the number of idle slots. Second, it is worth noting that not every busy slot is utilized efficiently. Thus, our second optimization approach (i.e., micro-level optimization) is to improve the utilization efficiency of busy slots after the macro-level optimization. In particular, we identify two main factors affecting this efficiency, including speculative tasks. Based on these, we propose DynamicMR, a dynamic utilization optimization framework for MapReduce, to improve the performance of a shared Hadoop cluster under fair scheduling between users.
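
The clustering step mentioned in the abstract can be illustrated with a minimal k-means sketch in plain Python. This is not the authors' implementation: the function names, the representation of records as numeric tuples, and the choice of the first k points as initial centroids are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty sequence of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Group points into k clusters; returns (assignments, centroids).

    Assumption: the first k points serve as initial centroids, which keeps
    the sketch deterministic. Real deployments usually pick them randomly.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = list(points[:k])
    assignments = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return assignments, centroids
```

In a setting like the one the abstract describes, each resulting cluster could then be written to its own file and handed to a separate MapReduce job; how the paper actually maps clusters to data centers is not stated here.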
