hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's
Date Wed, 08 Jul 2015 00:20:07 GMT

    [ https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617707#comment-14617707 ]

Carlo Curino commented on YARN-2915:


During the Birds of a Feather session at Hadoop Summit 2015, and in separate conversations with [~kasha],
[~leftnoteasy], [~jianhe], and [~vinodkv], we received multiple questions on how we plan to reconcile
global scheduler invariants with the local enforcement provided by the sub-cluster RMs.

The attached FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf is a short presentation that explains
our ideas in more detail.

The key intuition is that we will have a spectrum of options, ranging from full replication
of the queue structure in each sub-cluster to a full partitioning of it. At one extreme (full
replication) we get the best load spreading and the best fairness, while at the opposite extreme
(full partitioning) we get the best scalability and isolation among tenants. Navigating the
middle ground requires dynamic algorithms that continuously re-balance the queue mappings.
Conceptually, the problem is very close to preemption for node labels when we allow rich
expressions and preferences on node labels.
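
To make the two ends of this spectrum concrete, here is a minimal Python sketch. It is only an
illustration and not code from the YARN-2915 prototype: the function names, the dictionary layout,
and the round-robin placement are all assumptions. Capacities are expressed as a queue's share of
the total federated capacity.

```python
from typing import Dict, List

# Hypothetical layout: {sub_cluster: {queue: share of total federated capacity}}
Mapping = Dict[str, Dict[str, float]]


def replicate_queues(queues: Dict[str, float], subclusters: List[str]) -> Mapping:
    """Full replication: every queue exists in every sub-cluster,
    with its global share split evenly (assumes equally sized sub-clusters)."""
    slice_ = 1.0 / len(subclusters)
    return {sc: {q: cap * slice_ for q, cap in queues.items()} for sc in subclusters}


def partition_queues(queues: Dict[str, float], subclusters: List[str]) -> Mapping:
    """Full partitioning: each queue lives in exactly one sub-cluster.
    Round-robin placement is a toy choice; it ignores sub-cluster size."""
    mapping: Mapping = {sc: {} for sc in subclusters}
    for i, (q, cap) in enumerate(sorted(queues.items())):
        mapping[subclusters[i % len(subclusters)]][q] = cap
    return mapping


if __name__ == "__main__":
    queues = {"a": 0.5, "b": 0.3, "c": 0.2}
    print(replicate_queues(queues, ["sc1", "sc2"]))
    print(partition_queues(queues, ["sc1", "sc2"]))
```

In this toy model, the middle ground corresponds to mappings in which a queue appears in some,
but not all, sub-clusters, and re-balancing amounts to changing those per-sub-cluster shares over time.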

We propose a simple initial approach (re-using some of the preemption work to detect global
imbalance), and we are considering an LP-based model of the problem (possibly leveraging
the Apache-licensed solver in Google OR-Tools).
The solution we propose gives us a simple, concrete initial version (which is likely to scale
substantially) that we can then improve iteratively.
Much of this must be driven by experimental results from our initial prototype (for which
we are about to post code).
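
As a rough illustration of what detecting a global imbalance could look like, the following Python
sketch sums each queue's usage across sub-clusters and compares it with the queue's global
guarantee. This is an assumption-laden toy, not the preemption logic referred to above: the
function name, the input layout, and the tolerance parameter are all hypothetical.

```python
from typing import Dict


def global_imbalance(
    usage: Dict[str, Dict[str, float]],   # {sub_cluster: {queue: used share of total capacity}}
    guaranteed: Dict[str, float],         # {queue: guaranteed share of total capacity}
    tolerance: float = 0.05,              # hypothetical dead band to avoid constant re-balancing
) -> Dict[str, float]:
    """Return {queue: deviation} for queues whose global usage differs from
    their guarantee by more than `tolerance`. Positive means over the guarantee."""
    totals = {q: 0.0 for q in guaranteed}
    for per_queue in usage.values():
        for q, used in per_queue.items():
            totals[q] = totals.get(q, 0.0) + used
    deviations = {q: total - guaranteed.get(q, 0.0) for q, total in totals.items()}
    return {q: d for q, d in deviations.items() if abs(d) > tolerance}


if __name__ == "__main__":
    usage = {"sc1": {"a": 0.4, "b": 0.05}, "sc2": {"a": 0.3, "b": 0.1}}
    print(global_imbalance(usage, {"a": 0.5, "b": 0.3}))
```

A re-balancing step would then shift capacity toward the sub-clusters where under-served queues
are mapped; how best to do that is exactly the question an LP-based formulation would address.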


> Enable YARN RM scale out via federation using multiple RM's
> -----------------------------------------------------------
>                 Key: YARN-2915
>                 URL: https://issues.apache.org/jira/browse/YARN-2915
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>            Assignee: Subru Krishnan
>         Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, Yarn_federation_design_v1.pdf
> This is an umbrella JIRA that proposes to scale out YARN to support large clusters comprising
tens of thousands of nodes.   That is, rather than limiting a YARN managed cluster to about
4k nodes, the proposal is to enable the YARN managed cluster to be elastically scalable.

This message was sent by Atlassian JIRA
