hadoop-yarn-issues mailing list archives

From "Botong Huang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-6511) Federation Intercepting and propagating AM-RM communications (part two: secondary subclusters added)
Date Tue, 06 Jun 2017 17:54:18 GMT

     [ https://issues.apache.org/jira/browse/YARN-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Botong Huang updated YARN-6511:
    Attachment: YARN-6511-YARN-2915.v3.patch

Thanks [~subru] for the review! I've addressed most of the comments in the v3 patch (along with
the ones from [~jianhe]). For the rest, please see below:

bq. Do we need a {{UnmanagedAMPoolManager}} per interceptor instance or can we use one at
{{AMRMProxyService}} level?
The current way is easier because we constantly need to fetch all UAMs associated with one
application (keyed by subClusterId).
If we used one pool per AMRMProxy, we would need to key the UAMs by appId + subClusterId,
and the lookup for all UAMs associated with one application would no longer be straightforward.
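To illustrate the trade-off, here is a minimal sketch with hypothetical class and method names (not the actual patch code): a per-interceptor pool keyed by subClusterId makes "all UAMs of this application" a direct lookup, while a shared AMRMProxy-level pool keyed by appId + subClusterId forces a scan over every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: per-interceptor pool vs. a shared AMRMProxy-level pool.
class PerAppUamPool {
    // Keyed by subClusterId only; every entry belongs to one application.
    private final Map<String, String> uamBySubCluster = new ConcurrentHashMap<>();

    void addUam(String subClusterId, String uamToken) {
        uamBySubCluster.put(subClusterId, uamToken);
    }

    // All UAMs for this application: a direct values() call.
    List<String> allUams() {
        return new ArrayList<>(uamBySubCluster.values());
    }
}

class SharedUamPool {
    // A shared pool must key by appId + subClusterId.
    private final Map<String, String> uamByAppAndSubCluster = new ConcurrentHashMap<>();

    void addUam(String appId, String subClusterId, String uamToken) {
        uamByAppAndSubCluster.put(appId + "/" + subClusterId, uamToken);
    }

    // Finding all UAMs of one application now requires scanning every key.
    List<String> uamsForApp(String appId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : uamByAppAndSubCluster.entrySet()) {
            if (e.getKey().startsWith(appId + "/")) {
                result.add(e.getValue());
            }
        }
        return result;
    }
}
```

The scan in `uamsForApp` is what the per-interceptor design avoids.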

bq. Is updating the queue below safe in *loadAMRMPolicy*
Yes, the variable _queue_ is a local string, used only by the policy manager.

bq. I feel the *finishApplicationMaster* of the pool should be moved to {{UnmanagedAMPoolManager}}.

Yes, we can. However, it would then likely become a blocking call, and we would lose the freedom
to schedule the tasks, synchronously call finish in the home sub-cluster, and then wait for the
secondaries to come back. Otherwise, we would need additional interfaces in UAMPoolManager, one
to schedule and one to fetch the results. I've added a TODO for this.
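The ordering described above can be sketched as follows. This is a simplified illustration with hypothetical names and stubbed RPC calls, not the patch's actual implementation:

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: schedule the finish calls to the secondary
// sub-clusters asynchronously, finish synchronously in the home
// sub-cluster, and only then wait for the secondaries to come back.
class FinishSequence {

    String finishApplicationMaster(List<String> secondaries) {
        ExecutorService threadpool = Executors.newCachedThreadPool();
        ExecutorCompletionService<String> cs =
            new ExecutorCompletionService<>(threadpool);
        // 1. Schedule finish calls to all secondary sub-clusters.
        for (String sc : secondaries) {
            cs.submit(() -> "finished-" + sc); // stands in for the UAM finish RPC
        }
        // 2. Finish synchronously in the home sub-cluster.
        String homeResult = "finished-home"; // stands in for the home RM call
        // 3. Only now block on the secondary results.
        try {
            for (int i = 0; i < secondaries.size(); i++) {
                cs.take().get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            threadpool.shutdown();
        }
        return homeResult;
    }
}
```

Moving the whole sequence behind a single pool-level call would hide step 2's interleaving, which is the concern raised above.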

bq. I see dynamic instantiations of {{ExecutorCompletionService}} in finish, register, etc
invocations. Wouldn't we be better served by pre-initializing it?
We need to create them locally because of concurrency: the allocate and finish calls can be
invoked concurrently, and sharing the same completion service object would mix up the tasks
submitted from both sides.
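A short sketch of why the local instantiation matters (hypothetical names, not the patch code): with a shared {{ExecutorCompletionService}}, one caller's {{take()}} can return a task submitted by the other caller, since the completion queue is common. Creating the service per invocation over the shared thread pool keeps each caller's results together:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch: each invocation builds its own completion service
// over the shared thread pool, so it only ever sees the tasks it
// submitted itself. A single shared completion service would let a
// concurrent finish call consume an allocate call's results.
class CompletionServiceDemo {

    static String drainOne(ExecutorCompletionService<String> cs) {
        try {
            return cs.take().get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    // Local completion service for an allocate-style invocation.
    static String allocateOnce(ExecutorService pool) {
        ExecutorCompletionService<String> cs = new ExecutorCompletionService<>(pool);
        cs.submit(() -> "allocate-result"); // stands in for the heartbeat RPC
        return drainOne(cs); // cannot receive another caller's result
    }

    // Local completion service for a finish-style invocation.
    static String finishOnce(ExecutorService pool) {
        ExecutorCompletionService<String> cs = new ExecutorCompletionService<>(pool);
        cs.submit(() -> "finish-result");
        return drainOne(cs);
    }
}
```

The thread pool itself can still be shared; only the completion service (and its internal result queue) is per-invocation.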

bq. Is *getSubClusterForNode* required as the resolver should be doing this instead of every

_AbstractSubClusterResolver.getSubClusterForNode_ throws when resolving an unknown node. We
don't want to throw in this case, and thus need to catch the exception and log a warning.
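The wrapper behavior can be sketched like this (illustrative names; only the throwing resolver contract mirrors _AbstractSubClusterResolver.getSubClusterForNode_):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch: the resolver throws for unknown nodes, so the
// interceptor-side wrapper catches, logs a warning, and returns null
// instead of failing the whole allocate call.
class NodeResolution {
    private static final Logger LOG =
        Logger.getLogger(NodeResolution.class.getName());
    private final Map<String, String> nodeToSubCluster = new HashMap<>();

    void addNode(String node, String subClusterId) {
        nodeToSubCluster.put(node, subClusterId);
    }

    // Mirrors the resolver contract: throws when the node is unknown.
    String resolve(String node) throws Exception {
        String sc = nodeToSubCluster.get(node);
        if (sc == null) {
            throw new Exception("Cannot resolve node " + node);
        }
        return sc;
    }

    // The non-throwing wrapper used during request splitting.
    String getSubClusterForNode(String node) {
        try {
            return resolve(node);
        } catch (Exception e) {
            LOG.warning("Failed to resolve sub-cluster for node " + node);
            return null;
        }
    }
}
```

Callers then treat a null result as "route via the default policy" rather than aborting the heartbeat.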

bq. Move _YarnConfiguration_ outside the for loop in *registerWithNewSubClusters*
We cannot, because we need a different config per UAM, loaded with the sub-cluster id.
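In other words, the config object cannot be hoisted out of the loop because each iteration mutates it with a different sub-cluster id. A tiny sketch, with plain maps standing in for _YarnConfiguration_ and an illustrative (not real) key name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each UAM gets a fresh config carrying its own
// sub-cluster id, so one shared object per loop would not work.
class PerUamConfig {
    static Map<String, String> configFor(String subClusterId) {
        Map<String, String> conf = new HashMap<>(); // fresh object per UAM
        conf.put("yarn.federation.subcluster-id", subClusterId); // illustrative key
        return conf;
    }
}
```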

bq. Consider looping on _registrations_ in lieu of _requests_ in *sendRequestsToSecondaryResourceManagers*
The registrations only contain the newly added secondary sub-clusters, while here we need to
loop over (send heartbeats to) all known secondaries.

> Federation Intercepting and propagating AM-RM communications (part two: secondary subclusters added)
> ----------------------------------------------------------------------------------------------------
>                 Key: YARN-6511
>                 URL: https://issues.apache.org/jira/browse/YARN-6511
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>         Attachments: YARN-6511-YARN-2915.v1.patch, YARN-6511-YARN-2915.v2.patch, YARN-6511-YARN-2915.v3.patch
> In order to support transparent "spanning" of jobs across sub-clusters, all AM-RM communications
are proxied (via YARN-2884).
> This JIRA tracks the federation-specific mechanisms that decide how to "split/broadcast"
requests to the RMs and "merge" the answers back to
> the AM.
> This is the part-two JIRA, which adds secondary subclusters and does a full split/merge for
requests. Part one is in YARN-3666.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
