hadoop-mapreduce-issues mailing list archives

From "Andrew Ferguson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores
Date Mon, 11 Jun 2012 18:05:45 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292933#comment-13292933 ]

Andrew Ferguson commented on MAPREDUCE-4327:

Hi Arun,

I'm excited to see this started -- I'm quite interested in the multi-resource scheduling problem.
After reading through the patch, I have a few questions for you; hopefully this feedback will
be helpful.

First off, I want to confirm my understanding is correct: this patch is designed to allocate
resources to jobs within the same capacity queue based on the DRF-inspired ordering of their
need for resources. It is not designed to do weighted DRF for the complete cluster. If I'm
mistaken, perhaps some of my feedback may not apply.
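To make sure we're talking about the same ordering, here's a rough sketch of what I understand the DRF-inspired comparison to be (hypothetical job numbers and names, not the patch's actual code):

```python
# Minimal sketch of DRF-style ordering within a single queue. Each job's
# dominant share is the max, over resources, of its usage divided by the
# cluster total; jobs with the smallest dominant share are offered
# resources first. All figures below are made up for illustration.

CLUSTER = {"memory_mb": 100_000, "vcores": 100}

def dominant_share(usage):
    """Fraction of the cluster consumed along the job's most-used resource."""
    return max(usage[r] / CLUSTER[r] for r in CLUSTER)

jobs = {
    "job_a": {"memory_mb": 20_000, "vcores": 5},   # dominant resource: memory (20%)
    "job_b": {"memory_mb": 5_000, "vcores": 30},   # dominant resource: cpu (30%)
}

# job_a has the smaller dominant share, so it is scheduled first.
order = sorted(jobs, key=lambda j: dominant_share(jobs[j]))
```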

1) Are you planning to change the definition of a queue's capacity? Currently, it is defined
as a fractional percentage of the parent queue's total memory. Alternatively, queues could
be specified with a fractional percentage of each resource. eg, I could have one queue with
"75% CPU and 50% RAM" and a second with "25% CPU and 50% RAM".
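For concreteness, here's the shape I have in mind (a hypothetical configuration layout, not the actual capacity-scheduler format):

```python
# Hypothetical per-resource queue capacities, expressed as fractions of the
# parent queue for each resource rather than a single memory percentage.
# Names and structure are illustrative only.

queues = {
    "queue_1": {"cpu": 0.75, "ram": 0.50},
    "queue_2": {"cpu": 0.25, "ram": 0.50},
}

def absolute_capacity(cluster, fractions):
    """Translate fractional per-resource capacities into absolute resources."""
    return {r: cluster[r] * fractions[r] for r in cluster}

cluster = {"cpu": 400, "ram": 1_000_000}   # vcores, MB
q1 = absolute_capacity(cluster, queues["queue_1"])
# queue_1 is entitled to 300 vcores but only 500,000 MB.
```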

2) Do you plan to change how spare capacity is allocated? My understanding is that it's currently
shared proportionally, based on the queue capacities, an approach that seems intuitive
for cluster operators. With a multi-resource setup, however, running DRF on the pool of spare
resources would provide higher utilization. (I can provide an example of this if you'd like.)
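In fact, here's a quick sketch of the kind of example I mean (hypothetical task shapes, two equal-capacity queues sharing a spare pool of 100 cores and 100 GB):

```python
# Why DRF over the spare pool can beat splitting every resource
# proportionally. Queue A's tasks are CPU-heavy, queue B's are memory-heavy;
# all numbers are made up for illustration.

SPARE = {"cpu": 100, "mem": 100}
TASK = {"A": {"cpu": 2, "mem": 1}, "B": {"cpu": 1, "mem": 2}}

def tasks_in(share, task):
    """Max whole tasks of a given shape that fit in a per-queue share."""
    return min(share[r] // task[r] for r in share)

# Proportional: two equal-capacity queues each get half of every resource.
half = {"cpu": 50, "mem": 50}
prop = {q: tasks_in(half, TASK[q]) for q in TASK}
prop_cpu = sum(prop[q] * TASK[q]["cpu"] for q in TASK)   # cores actually used

# DRF: repeatedly grant one task to the queue with the lower dominant share.
alloc = {q: {"cpu": 0, "mem": 0} for q in TASK}

def dom(q):
    return max(alloc[q][r] / SPARE[r] for r in SPARE)

while True:
    q = min(TASK, key=dom)
    need = TASK[q]
    if any(sum(alloc[q2][r] for q2 in TASK) + need[r] > SPARE[r] for r in SPARE):
        break
    for r in SPARE:
        alloc[q][r] += need[r]

drf_cpu = sum(alloc[q]["cpu"] for q in TASK)
# Proportional sharing leaves 25 cores and 25 GB idle; DRF uses 99 of each.
```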

3) Are you planning to support priorities or weights within the queues? IIRC, this was supported
in the MR1 scheduler, and the DRF paper describes a weighted extension.
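The weighted extension is a small change to the comparison above; a sketch with hypothetical weights:

```python
# Weighted DRF sketch: a job's dominant share is divided by its weight
# before comparison, so a higher-weight job is offered the next container
# ahead of an equal-usage lower-weight one. Weights here are illustrative.

CLUSTER = {"memory_mb": 100_000, "vcores": 100}

def weighted_dominant_share(usage, weight):
    return max(usage[r] / CLUSTER[r] for r in CLUSTER) / weight

# Two jobs with identical usage but different weights: the weight-2 job
# has the smaller weighted share, so it is scheduled first.
usage = {"memory_mb": 10_000, "vcores": 10}
```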

4) Lastly, with the increasing flexibility of the YARN scheduler, I think it makes sense to
better support heterogeneous clusters. Currently, yarn.nodemanager.resource.memory-mb is a
constant across the cluster, but with a scheduler capable of packing differently shaped resource
containers onto each node, heterogeneous nodes would be a natural extension. (This is more
of an observation than a question. :-)

Looking forward to further discussions.


> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327.patch
> With YARN being a general purpose system, it would be useful for several applications
> (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements.
> Thus, it would be useful for the CapacityScheduler to account for both.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

