Date: Sat, 16 Jul 2016 20:51:21 +0000 (UTC)
From: "Carlo Curino (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (YARN-2915) Enable YARN RM scale out via federation using multiple RM's

[ https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380937#comment-15380937 ]

Carlo Curino edited comment on YARN-2915 at 7/16/16 8:50 PM:
-------------------------------------------------------------

[~vinodkv], incidentally, we were discussing this with [~subru] just yesterday.

*Philosophically:* I agree with you that node labels and node-label expressions are very powerful and could subsume much of the rest of YARN's locality/sub-cluster machinery. Another aspect that makes this equivalence somewhat pleasant is that, in some reasonably restricted scenarios, it is quite natural. E.g., given two node-partition labels (blue, red), at the moment the {{CapacityScheduler}} behaves almost as if the world of blue nodes and the world of red nodes were completely orthogonal to each other. Mapping this onto two separate RMs, one dealing with blue nodes and one with red nodes, should be rather straightforward. That is to say, if we simply "paint" each sub-cluster blue or red, it is not too hard to enforce this. Admins could use this concept to manually allocate capacity onto physical sub-clusters by manipulating labels, instead of thinking of sub-clusters too explicitly.

The two concerns with this are:
# labels are very generic, and we risk confusing admins by using the same construct to refer to very physical and very logical entities (good and bad);
# handling richer/more complex intersections of node-label partitions and sub-clusters (i.e., where they are not aligned as I described) might get trickier and requires the "digging deeper" you suggested.

All in all, I am in favor of this, especially if we also tackle more substantially the scheduler rewrite work we have discussed.
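To make the aligned blue/red case concrete, here is a minimal sketch of what routing reduces to when each sub-cluster is painted with exactly one partition label. None of these classes exist in Hadoop; every name below is a hypothetical stand-in, purely for illustration:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: when node-label partitions and sub-clusters are
// aligned one-to-one, routing an application is just a lookup from its
// node-label expression to the sub-cluster "painted" with that label.
public class LabelAlignedRouter {

  // Admin-maintained mapping: partition label -> sub-cluster id.
  private final Map<String, String> labelToSubCluster = new HashMap<>();

  public LabelAlignedRouter() {
    labelToSubCluster.put("blue", "subcluster-1");
    labelToSubCluster.put("red", "subcluster-2");
  }

  // Pick the home sub-cluster for an application's label expression.
  public String route(String nodeLabelExpression) {
    String target = labelToSubCluster.get(nodeLabelExpression);
    if (target == null) {
      throw new IllegalArgumentException(
          "no sub-cluster is painted with label " + nodeLabelExpression);
    }
    return target;
  }

  public static void main(String[] args) {
    LabelAlignedRouter router = new LabelAlignedRouter();
    System.out.println(router.route("blue")); // prints subcluster-1
  }
}
{code}

The misaligned case (one label spanning several sub-clusters, or several labels inside one sub-cluster) is exactly where a lookup like this stops being sufficient, which is the second concern above.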
*Practically:* I think we should land a v0 of federation with all the basic mechanisms in place, but with a somewhat limited admin surface that is not fully transparent yet (i.e., we give users full transparency, but ask a little more of our admins to begin with). This allows us to harden much of the internals and mechanics before polishing all the tooling around it. Priority-wise this is very important to us. In v1 (soon after) we will improve on this with:
# admin tooling that transparently maps the single logical view of (queues + labels) onto multiple sub-clusters' queues + labels (achieving, I think, the admin experience you are asking for);
# policies that direct a job's asks based on labels + locality (providing the physical substrate to support (1)).

Note that the general architecture makes (1) and (2) quite feasible. For example, if you look at the policies I just posted in YARN-5324 and YARN-5325, it is easy (literally a handful of LOC) to modify the "routing" behavior to be based on node labels, while reusing much of the rest of the mechanics around it. In fact, if you or [~wangda] have time/interest to work on this, I am happy to help you orient yourself in what we are doing in the policy space.
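To illustrate the "handful of LOC" point, a label-based router could look roughly like the sketch below. The policy interfaces in YARN-5324/YARN-5325 are still under review, so every name here ({{RouterPolicy}}, {{AppContext}}, {{LabelBasedRouterPolicy}}) is a hypothetical stand-in, not the actual API:

{code:java}
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the policy interface under review in
// YARN-5324/YARN-5325; the real names and signatures may differ.
interface RouterPolicy {
  String chooseSubCluster(AppContext app, List<String> activeSubClusters);
}

// Minimal stand-in for the application context a Router would see.
class AppContext {
  private final String nodeLabelExpression;

  AppContext(String nodeLabelExpression) {
    this.nodeLabelExpression = nodeLabelExpression;
  }

  String getNodeLabelExpression() {
    return nodeLabelExpression;
  }
}

// The "handful of LOC" change: route on the app's node-label expression
// first, and fall back to the first active sub-cluster otherwise.
class LabelBasedRouterPolicy implements RouterPolicy {

  private final Map<String, String> labelToSubCluster;

  LabelBasedRouterPolicy(Map<String, String> labelToSubCluster) {
    this.labelToSubCluster = labelToSubCluster;
  }

  @Override
  public String chooseSubCluster(AppContext app, List<String> active) {
    String target = labelToSubCluster.get(app.getNodeLabelExpression());
    return (target != null && active.contains(target))
        ? target
        : active.get(0);
  }
}
{code}

Everything else (membership, failover when a sub-cluster is down, weighting) stays in the surrounding mechanics, which is why the routing change itself stays small.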
> Enable YARN RM scale out via federation using multiple RM's
> -----------------------------------------------------------
>
>                 Key: YARN-2915
>                 URL: https://issues.apache.org/jira/browse/YARN-2915
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>            Assignee: Subru Krishnan
>         Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, Federation-BoF.pdf, Yarn_federation_design_v1.pdf, federation-prototype.patch
>
> This is an umbrella JIRA that proposes to scale out YARN to support large clusters comprising tens of thousands of nodes. That is, rather than limiting a YARN-managed cluster to about 4k nodes, the proposal is to enable the YARN-managed cluster to be elastically scalable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)