hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
Date Fri, 22 Jan 2016 20:52:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113041#comment-15113041
] 

Carlo Curino commented on YARN-2885:
------------------------------------

[~asuresh], I skimmed the patch briefly, focusing on a couple of issues: 1) visibility/security
of the extra information about the cluster state, 2) the LocalScheduler algos.

Sorry if I ask stupid questions; I haven't been following closely or looking at this code
for very long.

I like the idea of the {{DistributedSchedulingProtocol}} as a specialization of the {{ApplicationMasterProtocol}}.
One thing that would make it even stronger is to enforce
visibility of/access to the extra information about cluster state by means of tokens. This
would allow you to say that every application in the cluster has the AMRM token, but
only the AMRMProxy can add a special "DSP-Token" that grants visibility of the cluster state
(be it top-k or whatever extra info the DSP sends down the pipe).
Moreover, this would allow trusted and smart applications to also receive this information
if the RM decides to grant them this privilege. This could be great for any AM that
has the smarts to determine where it wants to run based on cluster load etc.
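
To make the token idea concrete, here is a rough sketch in plain Java. Everything in it ({{TokenKind}}, {{Response}}, {{filterForCaller}}, the node names) is made up for illustration and is not the YARN token machinery or anything in the patch; it only shows the intended rule that cluster-state info is stripped unless the caller presents the extra token.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: none of these types exist in YARN. */
final class ClusterStateVisibilitySketch {

    /** Hypothetical token kinds a caller may present. */
    enum TokenKind { AMRM, DSP }

    /** Hypothetical response: allocations plus optional cluster-state info. */
    static final class Response {
        final List<String> allocatedContainers;
        final List<String> topKNodes;

        Response(List<String> allocatedContainers, List<String> topKNodes) {
            this.allocatedContainers = allocatedContainers;
            this.topKNodes = topKNodes;
        }
    }

    /** Strips the cluster-state info unless the caller presents the DSP token. */
    static Response filterForCaller(Set<TokenKind> callerTokens,
                                    List<String> allocatedContainers,
                                    List<String> topKNodes) {
        boolean maySeeClusterState = callerTokens.contains(TokenKind.DSP);
        List<String> visible = maySeeClusterState
            ? topKNodes
            : Collections.<String>emptyList();
        return new Response(allocatedContainers, visible);
    }

    public static void main(String[] args) {
        List<String> allocated = Arrays.asList("container_1");
        List<String> topK = Arrays.asList("nodeA", "nodeB");

        // A regular AM holds only the AMRM token.
        Response plainAm = filterForCaller(EnumSet.of(TokenKind.AMRM), allocated, topK);
        // The proxy (or a trusted AM) additionally holds the DSP token.
        Response proxy = filterForCaller(
            EnumSet.of(TokenKind.AMRM, TokenKind.DSP), allocated, topK);

        System.out.println("plain AM sees nodes: " + plainAm.topKNodes);
        System.out.println("proxy sees nodes: " + proxy.topKNodes);
    }
}
```

The point of doing the check on the token set rather than on the caller's identity is that the RM could later hand the same token to a trusted AM without changing the protocol.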

(I am ok if this is done in a follow-up JIRA, especially given you guys are working on a
branch)

I started to look at the LocalScheduler code. I think I need some more comments to follow
along. 

Minor in LocalScheduler (and surrounding classes):
 * The {{DistSchedulerParams}} hard-codes the assumption that resources are only
mem/cpu. As work is ongoing to make that more general, I suggest using the Resource construct.

 * In updateResourceAsk(), the use of "requests" as the name of both the input
param and the global variable is a bit confusing. Can you change that? Also, having updateResourceAsk
   return a list instead of having side effects might help.
 * In {{OpportunisticContainerAllocator}}, why are you "resizing" containers? If the app is
asking for an inadmissible container, I don't think it is correct to lower its ask to the
largest acceptable container (maybe rephrasing it as a question: is this what the RM does?).
   Also, this math is done on mem and cpu as integers instead of on Resources (see above).
 * You use HashMap<Resource, ResourceRequest> but I don't see the reason for it, as
you seem to scan the entire set anyway.
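
To illustrate the points above about side effects, resizing and integer math, here is a minimal sketch in plain Java. It is not the patch's code and not the YARN {{Resource}} API: {{Res}}, {{Result}}, {{partition}} and the dimension names are hypothetical stand-ins. It assumes an ask is compared dimension by dimension against a maximum allocation, and that an inadmissible ask is reported back rather than silently shrunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical types, not the YARN Resource API. */
final class AskAdmissionSketch {

    /** Stand-in for a general resource: named dimensions, not just mem/cpu. */
    static final class Res {
        final Map<String, Long> dims;

        Res(Map<String, Long> dims) {
            this.dims = Collections.unmodifiableMap(new LinkedHashMap<>(dims));
        }

        /** True if every dimension of this ask is within the given limit. */
        boolean fitsIn(Res limit) {
            for (Map.Entry<String, Long> e : dims.entrySet()) {
                // A dimension the limit does not know about has capacity 0.
                long cap = limit.dims.getOrDefault(e.getKey(), 0L);
                if (e.getValue() > cap) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Outcome of the check; the input list is never modified. */
    static final class Result {
        final List<Res> admitted;
        final List<Res> rejected;

        Result(List<Res> admitted, List<Res> rejected) {
            this.admitted = admitted;
            this.rejected = rejected;
        }
    }

    /** Demo helper only; real code would not special-case these two dimensions. */
    static Res res(long memoryMb, long vcores) {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("memory-mb", memoryMb);
        m.put("vcores", vcores);
        return new Res(m);
    }

    /** Splits asks into admissible and inadmissible ones without resizing any. */
    static Result partition(List<Res> asks, Res maxAllocation) {
        List<Res> admitted = new ArrayList<>();
        List<Res> rejected = new ArrayList<>();
        for (Res ask : asks) {
            if (ask.fitsIn(maxAllocation)) {
                admitted.add(ask);
            } else {
                rejected.add(ask);
            }
        }
        return new Result(admitted, rejected);
    }

    public static void main(String[] args) {
        Res max = res(4096L, 4L);
        List<Res> asks = Arrays.asList(res(1024L, 1L), res(8192L, 2L));
        Result r = partition(asks, max);
        System.out.println("admitted=" + r.admitted.size()
            + " rejected=" + r.rejected.size());
    }
}
```

Because {{partition}} returns its result and leaves its input alone, the caller decides what to do with rejected asks, and adding a new resource dimension does not require touching the comparison.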
 

> Create AMRMProxy request interceptor for distributed scheduling decisions for queueable
containers
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2885
>                 URL: https://issues.apache.org/jira/browse/YARN-2885
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Konstantinos Karanasos
>            Assignee: Arun Suresh
>         Attachments: YARN-2885-yarn-2877.001.patch, YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch,
YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, YARN-2885-yarn-2877.v4.patch,
YARN-2885-yarn-2877.v5.patch, YARN-2885-yarn-2877.v6.patch, YARN-2885_api_changes.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to support distributed
scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
