hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
Date Tue, 13 Aug 2013 00:39:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737631#comment-13737631 ]

Carlo Curino commented on YARN-624:
-----------------------------------

Robert,

That makes sense. I think we should give people some guidelines on what to do while we
work out the details of how to get gang scheduling right.
As I mentioned a few posts above, I have also heard requests from people doing machine learning
for rather exotic versions of gang scheduling.

We can definitely make preemption gang-aware, but it is not trivial to get the semantics and
corner cases right. In a sense, what we are really discussing is a conversion rate between
capacity/fairness and cluster efficiency. For example, is it worth discarding the progress
made by 200 containers over 20 minutes to give another application all of its rightful
capacity right away? Hard question.
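
To put a number on the cost side of that question, here is a back-of-the-envelope sketch.
The function names, the `rate` parameter, and the decision rule are hypothetical and are
not part of YARN or of any proposal in this thread; only the 200 containers and 20 minutes
come from the example above.

```python
def discarded_work(containers: int, minutes_run: float) -> float:
    """Container-minutes of progress thrown away if every container is preempted."""
    return containers * minutes_run


def worth_preempting(lost: float, waiting_saved: float, rate: float) -> bool:
    """Hypothetical rule: preempt only if the waiting time saved for the other
    application, scaled by an assumed conversion rate, exceeds the work lost."""
    return waiting_saved * rate > lost


# 200 containers that have each run for 20 minutes.
lost = discarded_work(200, 20)
print(lost)  # 4000 container-minutes

# The answer flips depending on the assumed conversion rate.
print(worth_preempting(lost, waiting_saved=3000, rate=1.0))
print(worth_preempting(lost, waiting_saved=3000, rate=2.0))
```

The arithmetic is trivial; the hard part is choosing `rate`, which is exactly the
conversion between fairness and efficiency that remains open.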

Part of some longer-term research I am involved in aims to quantify these trade-offs more clearly
by projecting both into an economic value space.
But this is not going to be ready for a long while.

                
> Support gang scheduling in the AM RM protocol
> ---------------------------------------------
>
>                 Key: YARN-624
>                 URL: https://issues.apache.org/jira/browse/YARN-624
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, scheduler
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs
> a set of tasks when they can all be run at the same time, would be a useful feature for YARN
> schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they get all the
> ones they need.  However, this lends itself to deadlocks when different AMs are waiting on
> the same containers.
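
The deadlock described in the issue can be illustrated with a toy simulation. This is
only a sketch under stated assumptions: `AppMaster`, `hoard`, and `is_deadlocked` are
hypothetical names, containers are handed out round-robin, and nothing here uses or
models the real YARN API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppMaster:
    name: str
    needed: int      # gang size: containers required before any task can start
    held: int = 0    # containers currently hoarded

    def satisfied(self) -> bool:
        return self.held >= self.needed


def hoard(cluster_size: int, ams: List[AppMaster]) -> int:
    """Hand out containers round-robin to AMs that never release a partial gang.

    Returns the number of containers still free when allocation stops.
    """
    free = cluster_size
    progress = True
    while free > 0 and progress:
        progress = False
        for am in ams:
            if free > 0 and not am.satisfied():
                am.held += 1
                free -= 1
                progress = True
    return free


def is_deadlocked(free: int, ams: List[AppMaster]) -> bool:
    # Nothing is free, and no AM holds a full gang that could run and later release.
    return free == 0 and not any(am.satisfied() for am in ams)


# Two AMs that each need 8 of the cluster's 10 containers end up holding 5 apiece.
ams = [AppMaster("am-1", needed=8), AppMaster("am-2", needed=8)]
free = hoard(10, ams)
print([am.held for am in ams], free, is_deadlocked(free, ams))
```

In this toy model each AM holds half of what it needs and neither can ever start, which
is the situation a scheduler-level gang request is meant to avoid.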

