hadoop-mapreduce-issues mailing list archives

From "Andrew Ferguson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule accounting for both memory and cpu cores
Date Thu, 05 Jul 2012 19:10:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407403#comment-13407403 ]

Andrew Ferguson commented on MAPREDUCE-4327:

hi Robert,

Thanks for your feedback! Since I posted the earlier update, I've been pushing it to completion:
adding CPU core information to the queue metrics, resource manager web interface, etc. I've
also been adding test cases and ensuring that the new patch passes existing test cases as
well. Currently, the patch is failing just a few unit tests, but I expect it will be done
in a day or two.

As the patch has grown quite large (the diff is pushing 7,000 lines), it's clear we want
to minimize the cost of adding a third resource. As it is, most of the diff is new testing.
I will strive to keep function calls as general as possible (e.g., "Resource r" instead of "int
memory, float cores"), but there are quite a few places where we want to consider each resource
separately, since the math can be different, and it should be clear to anyone adding additional
resources that they need to consider something in that function's logic.
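To illustrate the trade-off (this is a hypothetical sketch, not the actual YARN API or the
patch — the class and method names here are invented for illustration): passing a general
"Resource" keeps signatures stable when a third resource is added, but the body of a
fits-check still has to compare each dimension separately, since the math differs per resource.

```java
// Hypothetical sketch only; real YARN's Resource type looks different.
public class ResourceSketch {
    static class Resource {
        final int memory;   // MB
        final float cores;  // CPU cores
        Resource(int memory, float cores) { this.memory = memory; this.cores = cores; }
    }

    // The signature stays "Resource" even if more dimensions are added,
    // but the logic must consider every dimension explicitly.
    static boolean fitsIn(Resource request, Resource available) {
        return request.memory <= available.memory
            && request.cores  <= available.cores;
    }

    public static void main(String[] args) {
        Resource node = new Resource(8192, 8f);
        Resource ask  = new Resource(1024, 2f);
        System.out.println(fitsIn(ask, node)); // true: both dimensions fit
    }
}
```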

Regarding applications which haven't been updated for CPU cores and might submit a request
with 0 or NULL: my current patch does round the request up to the minimum resource request, so
those applications will be fine. (I'm not sure if the currently attached patch does this.)
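The rounding behavior described above can be sketched as follows (hypothetical helper, not
the patch's actual code; the method name and minimum value are assumptions for illustration):
a 0 or unset core request becomes the configured minimum, and other requests round up to a
multiple of it.

```java
// Hypothetical normalization sketch, mirroring the rounding described above.
public class NormalizeSketch {
    static float normalizeCores(float requested, float minimum) {
        // 0 or unset (non-positive) requests become exactly the minimum,
        // so applications unaware of CPU cores still get a valid allocation.
        if (requested <= 0f) return minimum;
        // Otherwise round up to the nearest multiple of the minimum.
        return (float) Math.ceil(requested / minimum) * minimum;
    }

    public static void main(String[] args) {
        System.out.println(normalizeCores(0f, 1f));   // 1.0
        System.out.println(normalizeCores(1.5f, 1f)); // 2.0
    }
}
```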

Regarding "spare capacity" -- I think this is one of the differences between the capacity
scheduler and the fair scheduler. Should the capacity not in use (or leftover capacity from
queues which can't fill it because of the new multi-dimensional nature of resources) simply
be split over the queues based on their capacity percentages? Or should that capacity be
treated as a single pool, with allocations made treating the capacity percentages as weights?
(This is more of a Fair Scheduler approach.) Anyway, I agree, that should probably be left as
a separate JIRA, or perhaps simply left to the Fair Scheduler.
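The two policies above can differ numerically. As a sketch with made-up numbers (none of this
is YARN code; the method names and the 60/40 split are assumptions for illustration): splitting
only the spare capacity lets a queue keep usage above its percentage, while weighted shares
over the whole pool would claw it back.

```java
// Illustrative arithmetic for the two spare-capacity policies discussed.
public class SpareCapacitySketch {

    // Policy 1 (capacity-scheduler style): queues keep current usage, and
    // only the unused remainder is divided by the configured percentages.
    static double[] splitSpare(double total, double[] pct, double[] used) {
        double spare = total;
        for (double u : used) spare -= u;
        double[] out = new double[used.length];
        for (int i = 0; i < used.length; i++) out[i] = used[i] + spare * pct[i];
        return out;
    }

    // Policy 2 (fair-scheduler style): the whole pool is allocated with
    // the percentages acting as fair-share weights, ignoring current usage.
    static double[] weightedShares(double total, double[] pct) {
        double[] out = new double[pct.length];
        for (int i = 0; i < pct.length; i++) out[i] = total * pct[i];
        return out;
    }

    public static void main(String[] args) {
        double[] pct  = {0.6, 0.4};   // configured queue percentages
        double[] used = {30.0, 40.0}; // queue B is at its 40-unit share
        System.out.println(java.util.Arrays.toString(splitSpare(100.0, pct, used)));
        System.out.println(java.util.Arrays.toString(weightedShares(100.0, pct)));
    }
}
```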

I'll incorporate your other points (e.g., comparator name, ASF license) in my updated patch.


> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch,
>
> With YARN being a general purpose system, it would be useful for several applications
> (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements.
> Thus, it would be useful for the CapacityScheduler to account for both.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

