hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
Date Thu, 16 May 2013 00:51:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659091#comment-13659091 ]

Carlo Curino commented on YARN-624:
-----------------------------------

Alejandro, I completely agree that gang scheduling is an important use case that is currently
missing. As I told you in person, I spoke with several machine-learning people who are very
interested in gang scheduling (they are writing their own AM for ML computations). From that
conversation, I am convinced their requirements are common to many ML-style applications.
In particular, they were interested in the "or" use case you mentioned.

Specifically, they want to be able to express the following request:
1) 1 container with 128GB of RAM and 16 cores, OR
2) 10 containers with 16GB of RAM and 2 cores each, OR
3) 100 containers with 2GB of RAM and 1 core each
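To make the "or" semantics concrete, here is a minimal Java sketch, assuming a scheduler that grants exactly one alternative and grants it all-or-nothing. The records `Gang` and `Node` and the methods `canPlaceWhole` and `firstSatisfiable` are hypothetical illustrations, not part of the YARN AM-RM protocol; locality is ignored (the "anywhere" case), and the cluster of eight nodes with 64GB and 8 free cores each is made up.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not the YARN API): an "OR" of gang alternatives,
 * where at most one alternative is granted, and only in full.
 */
public class DisjunctiveGangSketch {

    /** One gang: `count` identical containers of the given size. */
    record Gang(int count, int memoryGb, int vcores) {}

    /** Free capacity currently available on a single node. */
    record Node(int freeMemoryGb, int freeVcores) {}

    /** How many containers of this gang's size fit on the node right now. */
    static int fitsOn(Gang g, Node n) {
        return Math.min(n.freeMemoryGb() / g.memoryGb(), n.freeVcores() / g.vcores());
    }

    /** True only if every container of the gang can be placed at the same time. */
    static boolean canPlaceWhole(Gang g, List<Node> nodes) {
        int placeable = 0;
        for (Node n : nodes) {
            placeable += fitsOn(g, n);
            if (placeable >= g.count()) {
                return true;
            }
        }
        return false;
    }

    /** Index of the first alternative that fits entirely, or -1 if none does. */
    static int firstSatisfiable(List<Gang> alternatives, List<Node> nodes) {
        for (int i = 0; i < alternatives.size(); i++) {
            if (canPlaceWhole(alternatives.get(i), nodes)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The three alternatives from the request above.
        List<Gang> request = List.of(
                new Gang(1, 128, 16),
                new Gang(10, 16, 2),
                new Gang(100, 2, 1));
        // Hypothetical cluster: 8 nodes, each with 64GB and 8 free cores.
        List<Node> cluster = Collections.nCopies(8, new Node(64, 8));
        System.out.println(firstSatisfiable(request, cluster)); // prints 1
    }
}
```

With that hypothetical cluster, no node can host the 128GB container, so the sketch selects the second alternative (10 containers of 16GB) and prints 1. Trying alternatives in the order listed is only one possible policy; a real scheduler might instead prefer the alternative that minimizes fragmentation or waiting time.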

In terms of locality, I can see three main scenarios:
1) absolute locality, i.e., I need a gang of N containers on this rack, or on this set of nodes;
2) relative locality, i.e., I need a gang of N containers "close to each other" (this really
captures a network property more than anything else);
3) no locality, i.e., I need a gang of N containers anywhere in the cluster.
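The three locality scenarios could be modeled as constraints checked against a proposed placement of the whole gang. Below is a minimal Java sketch, assuming a placement is a list of node names (one per container) plus a node-to-rack map. "Close to each other" is approximated as "all on the same rack", which is only a stand-in for a real network-distance measure, and an absolute rack constraint is expressed by listing that rack's nodes. All names (`Locality`, `Absolute`, `Relative`, `Anywhere`, `satisfies`) are hypothetical, not YARN APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not a YARN API) of the three gang-locality scenarios. */
public class GangLocalitySketch {

    interface Locality {}

    /** 1) Absolute: every container must land on one of these nodes. */
    record Absolute(Set<String> allowedNodes) implements Locality {}

    /** 2) Relative: containers must be "close"; approximated here as "same rack". */
    record Relative() implements Locality {}

    /** 3) No locality: any node in the cluster is acceptable. */
    record Anywhere() implements Locality {}

    /**
     * Checks a proposed placement (one node name per container) against the
     * constraint; rackOf maps each node name to its rack name.
     */
    static boolean satisfies(Locality loc, List<String> placement, Map<String, String> rackOf) {
        if (loc instanceof Absolute a) {
            return a.allowedNodes().containsAll(placement);
        } else if (loc instanceof Relative) {
            // All containers must map to a single rack.
            return placement.stream().map(rackOf::get).distinct().count() <= 1;
        } else {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = Map.of("n1", "r1", "n2", "r1", "n3", "r2");
        List<String> placement = List.of("n1", "n2");
        System.out.println(satisfies(new Relative(), placement, rackOf)); // prints true
        System.out.println(satisfies(new Absolute(Set.of("n1", "n3")), placement, rackOf)); // prints false
    }
}
```

Checking a complete placement, rather than one container at a time, reflects the gang requirement: the constraint holds for the whole set of containers or the gang is not granted.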
  
                
> Support gang scheduling in the AM RM protocol
> ---------------------------------------------
>
>                 Key: YARN-624
>                 URL: https://issues.apache.org/jira/browse/YARN-624
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, scheduler
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs
> a set of tasks only when they can all be run at the same time, would be a useful feature
> for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they get all the
> ones they need. However, this approach lends itself to deadlocks when different AMs are
> waiting on the same containers.

--
This message is automatically generated by JIRA.
