hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's
Date Wed, 08 Jul 2015 16:43:07 GMT

    [ https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618894#comment-14618894 ]

Carlo Curino commented on YARN-2915:

Lei, first let me make sure we are on the same page regarding the router. The router is "soft-state"
and a rather lightweight component, so we envision multiple routers running in each data center,
and we definitely agree that there will be at least one router per DC if/when we run a federation.

Lei, regarding the (good) question you asked about the AMRMProxy:

The comment is derived from some early experimentation we did with the AMRMProxy from YARN-2884.
The idea is that you could use the mux/demux mechanics that the AMRMProxy provides to hide
multiple standalone YARN clusters (not part of a federation) behind a single AMRMProxy. The
scenario goes as follows: you have a (possibly small) cluster that I will call the "launchpad",
running one or more AMRMProxy(s), and say 2 standalone YARN clusters (C1, C2) that are not
federation enabled. Jobs can be submitted to C1 or C2 directly as always, and jobs that want
to span both can be submitted to the "launchpad" cluster. By customizing the policy in the AMRMProxy
that determines how requests are forwarded to clusters, you can have an AM running on the launchpad
cluster forward its requests to both C1 and C2. To C1 and C2 this will look as if
you submitted an unmanaged AM in each cluster. The job, on the other hand, thinks it is talking
to a single RM that happens to run somewhere in the "launchpad" cluster (typically on the
same node), but this is just the AMRMProxy impersonating an RM.
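
To make the mux/demux idea concrete, here is a minimal sketch of what such a forwarding policy could look like. It is not the AMRMProxy API from YARN-2884: the class `LaunchpadForwardingSketch`, the `Ask` type, and the `split`/`merge` methods are hypothetical names made up for illustration, and the even split across clusters is just one possible policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; none of these types exist in YARN. */
public class LaunchpadForwardingSketch {

    /** Simplified stand-in for a container request sent by an AM. */
    public static final class Ask {
        public final int containers;
        public final int memoryMb;

        public Ask(int containers, int memoryMb) {
            this.containers = containers;
            this.memoryMb = memoryMb;
        }
    }

    /**
     * Mux step: split one ask from the AM into per-cluster asks.
     * Assumed policy: spread containers as evenly as possible, giving
     * any remainder to the clusters listed first.
     */
    public static Map<String, Ask> split(Ask ask, List<String> clusters) {
        if (clusters.isEmpty()) {
            throw new IllegalArgumentException("at least one target cluster is required");
        }
        Map<String, Ask> perCluster = new LinkedHashMap<>();
        int base = ask.containers / clusters.size();
        int extra = ask.containers % clusters.size();
        for (int i = 0; i < clusters.size(); i++) {
            int share = base + (i < extra ? 1 : 0);
            if (share > 0) {
                perCluster.put(clusters.get(i), new Ask(share, ask.memoryMb));
            }
        }
        return perCluster;
    }

    /**
     * Demux step: merge the container ids granted by each cluster into the
     * single response the AM sees. Ids are prefixed with the cluster name so
     * they stay unique across clusters.
     */
    public static List<String> merge(Map<String, List<String>> grantedPerCluster) {
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grantedPerCluster.entrySet()) {
            for (String containerId : e.getValue()) {
                merged.add(e.getKey() + "/" + containerId);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Ask> parts = split(new Ask(5, 1024), Arrays.asList("C1", "C2"));
        for (Map.Entry<String, Ask> e : parts.entrySet()) {
            System.out.println(e.getKey() + " <- " + e.getValue().containers
                + " x " + e.getValue().memoryMb + "MB");
        }
    }
}
```

A real policy would presumably also weigh locality, queue capacity, and per-cluster load; the point here is only that the AM sends one ask and receives one merged answer, while C1 and C2 each see an independent request.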

To make this even clearer: we don't strictly need an AMRMProxy on each node for the story
to work. However, given our current thinking/experimentation, we see advantages in running
the AMRMProxy on each node: we avoid 2 network hops, we have a better AM-to-AMRMProxy
ratio so we are more resilient to DDoS on the AMRMProtocol, there are fewer partitioning
scenarios to consider, etc. So this is what we are advocating for in federation.

In federation, we go a step further: we ask C1 and C2 to commit to sharing resources in
the federation (by heartbeating to the StateStore), and we provide a lot more mechanics around
it (e.g., UIs that show the overall use of resources across clusters, rebalancing mechanisms,
fault-tolerance mechanics, etc.), which make for a tighter overall experience.
Overall, I think running the entire federation code will be better, but I was pointing out
that some of the pieces we are building could be leveraged in isolation for more lightweight
/ ad-hoc forms of cross-cluster interaction. The rule-based global router that [~subru] mentioned
above falls in the same category.
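
As a rough illustration of the "heartbeating to the StateStore" part, the sketch below tracks which clusters currently count as federation members. It is an assumption-laden toy, not the federation StateStore design: the class `MembershipSketch`, its methods, and the fixed expiry window are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for heartbeat-based membership tracking. */
public class MembershipSketch {

    // Last heartbeat time per cluster, in milliseconds; sorted by cluster name.
    private final Map<String, Long> lastHeartbeatMs = new TreeMap<>();
    private final long expiryMs;

    /** @param expiryMs assumed window after which a silent cluster is dropped */
    public MembershipSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    /** Records that a cluster reported in at the given time. */
    public void heartbeat(String cluster, long nowMs) {
        lastHeartbeatMs.put(cluster, nowMs);
    }

    /** Returns the clusters whose last heartbeat is within the expiry window. */
    public List<String> activeClusters(long nowMs) {
        List<String> active = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            if (nowMs - e.getValue() <= expiryMs) {
                active.add(e.getKey());
            }
        }
        return active;
    }
}
```

For example, with a 10-second window, a cluster that last reported at t=0 is no longer listed at t=12s, while one that reported at t=5s still is. Routers and proxies could consult such a view to decide where requests may be forwarded.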

> Enable YARN RM scale out via federation using multiple RM's
> -----------------------------------------------------------
>                 Key: YARN-2915
>                 URL: https://issues.apache.org/jira/browse/YARN-2915
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>            Assignee: Subru Krishnan
>         Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, Yarn_federation_design_v1.pdf,
> This is an umbrella JIRA that proposes to scale out YARN to support large clusters comprising
> tens of thousands of nodes. That is, rather than limiting a YARN-managed cluster to about
> 4k nodes, the proposal is to enable the YARN-managed cluster to be elastically scalable.

This message was sent by Atlassian JIRA
