hadoop-mapreduce-issues mailing list archives

From "tang shanjiang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5643) DynamicMR: A Dynamic Slot Utilization Optimization Framework for Hadoop MRv1
Date Fri, 22 Nov 2013 09:32:37 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tang shanjiang updated MAPREDUCE-5643:
--------------------------------------

    Description: 
Hadoop MRv1 uses a slot-based resource model in which the numbers of map and reduce slots are
configured statically. A strict constraint applies: map tasks can run only on map slots, and
reduce tasks can run only on reduce slots. Because of the rigid execution order between map and
reduce tasks in a MapReduce environment, slots can be severely under-utilized, which
significantly degrades performance.

In contrast to YARN, which gives up the slot-based resource model to maximize resource
utilization, we keep the slot-based model and propose a dynamic slot utilization optimization
system called DynamicMR. It aims to improve the performance of Hadoop by maximizing slot
utilization and improving utilization efficiency while guaranteeing fairness across pools. It
consists of three levels of scheduling components: Dynamic Hadoop Fair Scheduler (DHFS), Dynamic
Speculative Task Scheduler (DSTS), and Data Locality Maximization Scheduler (DLMS).

Our tests show that DynamicMR outperforms YARN for MapReduce workloads with multiple jobs,
especially when the number of jobs is large. Our explanation is that, for a fixed amount of
resources, controlling the ratio of concurrently running map and reduce tasks performs better
than leaving it uncontrolled. Without such control, too many reduce tasks can easily run at
once, which makes the network a serious bottleneck. In YARN, both map and reduce tasks can run
on any idle container, and there is no mechanism to control the ratio of resources allocated to
map and reduce tasks. As a result, when reduce tasks are pending, an idle container will most
likely be taken by one of them. DynamicMR, in contrast, follows the traditional slot-based
model. Instead of the 'hard' constraint, under which map slots must be allocated to map tasks
and reduce slots to reduce tasks, DynamicMR obeys a 'soft' constraint that allows a map slot to
be allocated to a reduce task and vice versa. Whenever map tasks are pending, however, map slots
are given to map tasks first, and the same rule applies to reduce slots and reduce tasks. The
traditional static map/reduce slot configuration therefore still controls the ratio of running
map and reduce tasks in DynamicMR. Whereas YARN only maximizes resource utilization, DynamicMR
maximizes slot utilization while also dynamically controlling the ratio of running map and
reduce tasks through the map/reduce slot configuration.
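
The 'soft' constraint described above can be illustrated with a minimal sketch. This is not
code from the DynamicMR patch: the class SoftSlotAllocation, its allocate method, and the
Allocation type are hypothetical names introduced only for illustration, and the sketch
ignores pools, fairness, speculation, and data locality. It assumes a single allocation round
with known counts of idle slots and pending tasks. Each slot type serves its own task type
first, and only leftover slots are lent to the other type.

```java
public class SoftSlotAllocation {

    /** Hypothetical result type: how many tasks start in one round, and on which slots. */
    public static final class Allocation {
        public final int mapTasksOnMapSlots;
        public final int reduceTasksOnReduceSlots;
        public final int reduceTasksOnMapSlots;   // map slots lent to reduce tasks
        public final int mapTasksOnReduceSlots;   // reduce slots lent to map tasks

        Allocation(int mm, int rr, int rm, int mr) {
            this.mapTasksOnMapSlots = mm;
            this.reduceTasksOnReduceSlots = rr;
            this.reduceTasksOnMapSlots = rm;
            this.mapTasksOnReduceSlots = mr;
        }

        /** Total number of slots put to work in this round. */
        public int usedSlots() {
            return mapTasksOnMapSlots + reduceTasksOnReduceSlots
                    + reduceTasksOnMapSlots + mapTasksOnReduceSlots;
        }
    }

    /**
     * One allocation round under the soft constraint (illustrative assumption,
     * not the scheduler's actual implementation).
     */
    public static Allocation allocate(int idleMapSlots, int idleReduceSlots,
                                      int pendingMaps, int pendingReduces) {
        // Step 1: each slot type serves pending tasks of its own type first.
        int mm = Math.min(idleMapSlots, pendingMaps);
        int rr = Math.min(idleReduceSlots, pendingReduces);

        // Step 2: slots still idle are lent to the other task type.
        int spareMapSlots = idleMapSlots - mm;
        int spareReduceSlots = idleReduceSlots - rr;
        int rm = Math.min(spareMapSlots, pendingReduces - rr);
        int mr = Math.min(spareReduceSlots, pendingMaps - mm);

        return new Allocation(mm, rr, rm, mr);
    }

    public static void main(String[] args) {
        // Map phase: 10 pending map tasks, no reduce tasks yet, 4 idle slots of each type.
        Allocation a = allocate(4, 4, 10, 0);
        System.out.println("maps on map slots:    " + a.mapTasksOnMapSlots);
        System.out.println("maps on reduce slots: " + a.mapTasksOnReduceSlots);
        System.out.println("slots used:           " + a.usedSlots());
    }
}
```

In this example the hard constraint would leave the four reduce slots idle, whereas the soft
constraint puts all eight slots to work. When both task types have enough pending work, no
slot is lent, so the configured map/reduce slot ratio is what determines the mix of running
tasks.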

  was:
Hadoop MRv1 uses the slot-based resource model with the static configuration of map/reduce
slots in advance. Due to the rigid execution order between map and reduce tasks in a MapReduce
environment and the strict execution constraint that map tasks can only run on map slots and
reduce tasks can only run on reduce slots, slots can be severely under-utilized, which
significantly degrades performance.

In contrast to YARN, which gives up the slot-based resource model to maximize resource
utilization, we keep the slot-based model and propose a dynamic slot utilization optimization
system called DynamicMR to improve the performance of Hadoop by maximizing slot utilization and
improving utilization efficiency while guaranteeing fairness across pools. It consists of three
levels of scheduling components, namely, Dynamic Hadoop Fair Scheduler (DHFS), Dynamic
Speculative Task Scheduler (DSTS), and Data Locality Maximization Scheduler (DLMS).

Our tests show that DynamicMR outperforms YARN for MapReduce workloads with multiple jobs,
especially when the number of jobs is large.


> DynamicMR: A Dynamic Slot Utilization Optimization Framework for Hadoop MRv1
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5643
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5643
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>    Affects Versions: 1.2.1
>            Reporter: tang shanjiang
>            Assignee: tang shanjiang
>              Labels: performance
>         Attachments: DynamicMR-0.1.1-patch, README
>
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)
