hadoop-yarn-issues mailing list archives

From "Subru Krishnan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6128) Add support for AMRMProxy HA
Date Sat, 04 Nov 2017 00:54:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238654#comment-16238654 ]

Subru Krishnan commented on YARN-6128:
--------------------------------------

Thanks [~botong] for your clarification. I have a few follow ups below.

bq. I don't understand what you mean. The current code first calls list to find all sub-clusters in the registry, then reads the token for each sub-cluster in a loop.

My question is why can't we get the tokens for the sub-clusters also in a single call, to
avoid the read in a loop?
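
To make the single-call suggestion concrete, here is a minimal sketch in plain Java. The {{TokenStore}} interface and its {{list}}, {{read}} and {{readAll}} methods are hypothetical stand-ins for the registry, not the Hadoop {{RegistryOperations}} API. The sketch only contrasts the two access patterns: one listing call plus one read per sub-cluster, versus a single bulk call that returns every sub-cluster's token at once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical token store; NOT the Hadoop RegistryOperations API. */
interface TokenStore {
  List<String> list();               // ids of all sub-clusters with a stored token
  String read(String subClusterId);  // token for one sub-cluster
  Map<String, String> readAll();     // hypothetical bulk read: all tokens in one call
}

/** In-memory fake that counts calls so the two patterns can be compared. */
class CountingTokenStore implements TokenStore {
  private final Map<String, String> tokens = new HashMap<>();
  int calls = 0;

  CountingTokenStore(Map<String, String> initial) {
    tokens.putAll(initial);
  }

  public List<String> list() {
    calls++;
    return new ArrayList<>(tokens.keySet());
  }

  public String read(String subClusterId) {
    calls++;
    return tokens.get(subClusterId);
  }

  public Map<String, String> readAll() {
    calls++;
    return new HashMap<>(tokens);
  }
}

class TokenLoader {
  /** Current pattern: one list call, then one read per sub-cluster (1 + N calls). */
  static Map<String, String> loadInLoop(TokenStore store) {
    Map<String, String> result = new HashMap<>();
    for (String subCluster : store.list()) {
      result.put(subCluster, store.read(subCluster));
    }
    return result;
  }

  /** Suggested pattern: a single bulk call, independent of the number of sub-clusters. */
  static Map<String, String> loadInOneCall(TokenStore store) {
    return store.readAll();
  }
}
```

With three sub-clusters the loop costs four calls against this fake store and the bulk read costs one; whether the real registry can expose such a bulk read is the open question above.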

bq. This flag says whether to trust the in-memory data when deciding whether to go into the registry and delete. For FederationInterceptor, the in-memory data is always in sync, so it sets ignoreMemoryState = false. However, for registry cleanup (to be done inside GPG), GPG will not have the in-memory data, so it will pass in true here to force deletion.

Let's move it out of here and add it as part of the GPG patch, as it's confusing in its current
orphaned state.
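
The semantics described for {{ignoreMemoryState}} can be summarized in a small sketch. Everything here ({{TokenCleaner}}, {{removeToken}}, the {{Set}}-backed stand-in for the registry) is hypothetical and simplified, and it assumes one reading of the description: when the in-memory view is trusted ({{ignoreMemoryState = false}}, the FederationInterceptor case) the registry delete is skipped for sub-clusters that memory does not know about; when it is ignored ({{true}}, the intended GPG cleanup case) the delete is always attempted.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the ignoreMemoryState decision; not the YARN-6128 patch. */
class TokenCleaner {
  private final Set<String> inMemory; // sub-clusters this process believes it stored
  private final Set<String> registry; // stand-in for entries persisted in the registry

  TokenCleaner(Set<String> inMemory, Set<String> registry) {
    this.inMemory = new HashSet<>(inMemory);
    this.registry = new HashSet<>(registry);
  }

  /**
   * @param ignoreMemoryState false: trust the in-memory view (FederationInterceptor);
   *                          true: force the registry delete (e.g. a GPG cleanup)
   * @return true if a registry delete was attempted
   */
  boolean removeToken(String subClusterId, boolean ignoreMemoryState) {
    if (!ignoreMemoryState && !inMemory.contains(subClusterId)) {
      // Memory is trusted and says nothing was stored, so skip the registry round trip.
      return false;
    }
    inMemory.remove(subClusterId);
    registry.remove(subClusterId);
    return true;
  }

  boolean registryContains(String subClusterId) {
    return registry.contains(subClusterId);
  }
}
```

In the sketch, a caller with no in-memory state can only purge a stale registry entry by passing {{true}}, which is the GPG scenario.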

bq. For store impl of registry, credential might be needed to access store.

I don't see the {{Credentials}} used anywhere in [FSRegistryOperationsService|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/impl/FSRegistryOperationsService.java]
or available in the parent {{RegistryOperations}} interface. So maybe we can add it when we
require it? I am also concerned about its expensive retrieval.

bq. The register call is for the home RM. After that, to return the full list of containers from the previous attempt, we reattach to all existing UAMs, get the running containers in the secondary sub-clusters, merge all of them and return them to the AM. Later, in YARN-6704, I will be calling reattach inside FederationInterceptor::recover as well.

Thanks for the clarification, but shouldn't we do it only if the AM supports recovery and it's
not the first attempt?
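
The condition proposed in the question above can be written as a small predicate. {{UamRecoveryPolicy}} and {{shouldReattachUams}} are hypothetical names. In a real AMRMProxy the two inputs would presumably come from {{ApplicationSubmissionContext#getKeepContainersAcrossApplicationAttempts()}} and {{ApplicationAttemptId#getAttemptId()}}, where YARN numbers attempts starting from 1; the sketch takes plain values so it runs without Hadoop on the classpath.

```java
/** Hypothetical helper; not part of the YARN API or the YARN-6128 patch. */
final class UamRecoveryPolicy {
  private UamRecoveryPolicy() {}

  /**
   * Reattach to existing UAMs only when the AM keeps containers across attempts
   * and this is not the first attempt (YARN attempt ids start at 1).
   */
  static boolean shouldReattachUams(boolean keepContainersAcrossAttempts, int attemptId) {
    if (attemptId < 1) {
      throw new IllegalArgumentException("attempt id must be >= 1: " + attemptId);
    }
    return keepContainersAcrossAttempts && attemptId > 1;
  }
}
```

On a first attempt there is no previous attempt whose containers could be returned, so the predicate is false regardless of the recovery setting.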

Nit: In the Javadoc for _setKeepContainersAcrossApplicationAttempts_, mention that it's for
UAM recovery and link to the API doc.




> Add support for AMRMProxy HA
> ----------------------------
>
>                 Key: YARN-6128
>                 URL: https://issues.apache.org/jira/browse/YARN-6128
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: amrmproxy, nodemanager
>            Reporter: Subru Krishnan
>            Assignee: Botong Huang
>            Priority: Major
>         Attachments: YARN-6128.v0.patch, YARN-6128.v1.patch, YARN-6128.v1.patch, YARN-6128.v2.patch, YARN-6128.v3.patch, YARN-6128.v3.patch, YARN-6128.v4.patch, YARN-6128.v5.patch
>
>
> YARN-556 added the ability for RM failover without losing any running applications. In a Federated YARN environment, there's additional state in the {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need to enhance {{AMRMProxy}} to support HA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


