hadoop-yarn-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7899) [AMRMProxy] Stateful FederationInterceptor for pending requests
Date Mon, 09 Jul 2018 20:14:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537504#comment-16537504 ]

Hudson commented on YARN-7899:

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14545 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14545/])
YARN-7899. [AMRMProxy] Stateful FederationInterceptor for pending (gifuma: rev ea9b608237e7f2cf9b1e36b0f78c9674ec84096f)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/AMRMClientRelayer.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/AMRMClientUtils.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedApplicationManager.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/MockResourceManagerFacade.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java

> [AMRMProxy] Stateful FederationInterceptor for pending requests
> ---------------------------------------------------------------
>                 Key: YARN-7899
>                 URL: https://issues.apache.org/jira/browse/YARN-7899
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Major
>              Labels: amrmproxy, federation
>             Fix For: 3.2.0
>         Attachments: YARN-7899.v1.patch, YARN-7899.v3.patch
> Today FederationInterceptor (in AMRMProxy for YARN Federation) is stateless in terms
of pending (outstanding) requests. Whenever the AM issues new requests, FI simply splits and sends
them to sub-cluster YarnRMs and forgets about them. This JIRA attempts to make FI stateful
so that it remembers the pending requests in all relevant sub-clusters. This has two major
benefits:
> 1. It is a prerequisite for FI to be able to cancel a pending request in one sub-cluster
and re-send it to other sub-clusters. This is needed for load balancing and to fully comply
with the relax locality fallback to ANY semantics. When we send a request to one sub-cluster,
we have effectively restricted the allocation for this request to this sub-cluster
rather than everywhere. If the capacity in this sub-cluster for this app is full or
this YarnRM is overloaded and slow, the request will be stuck there for a long time even if
there is free capacity in other sub-clusters. We need FI to remember and adjust the pending
requests on the fly.
> 2. This makes pending request recovery easier when a YarnRM fails over. Today whenever
one sub-cluster RM fails over, in order to recover lost pending requests for this sub-cluster,
we have to propagate the ApplicationMasterNotRegisteredException from the YarnRM back
to the AM, triggering a full pending resend from the AM. This resend contains pending requests
not only for the failing-over sub-cluster, but for every sub-cluster. Since our split-merge
(AMRMProxyPolicy) does not guarantee idempotency, the same request we sent to sub-cluster-1
earlier might be resent to sub-cluster-2. If neither of these YarnRMs has failed over, they
will both allocate for this request, leading to over-allocation.
These full pending resends also put unnecessary load on every YarnRM in the cluster
every time one YarnRM fails over. With a stateful FederationInterceptor, since we remember the
pending requests we have sent out earlier, we can shield the AM from the
ApplicationMasterNotRegisteredException and resend the pending requests only to the failed-over
YarnRM. This eliminates over-allocation and minimizes the recovery overhead.
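
The two benefits described above can be illustrated with a minimal Java sketch. It is not the code from the YARN-7899 patch: the class PendingRequestTracker, its method names, and the use of plain strings and longs for sub-cluster and request ids are all hypothetical, chosen only to show the bookkeeping idea. The sketch assumes each request is identified by an id and carries a count of containers still outstanding.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not the YARN-7899 implementation. */
class PendingRequestTracker {
  // subClusterId -> (requestId -> containers still outstanding)
  private final Map<String, Map<Long, Integer>> pending = new HashMap<>();

  /** Remember that a request was sent to the given sub-cluster. */
  void recordSent(String subClusterId, long requestId, int numContainers) {
    Map<Long, Integer> reqs = pending.get(subClusterId);
    if (reqs == null) {
      reqs = new LinkedHashMap<Long, Integer>();
      pending.put(subClusterId, reqs);
    }
    Integer current = reqs.get(requestId);
    reqs.put(requestId, current == null ? numContainers : current + numContainers);
  }

  /** Reduce the outstanding count once containers have been allocated. */
  void recordAllocated(String subClusterId, long requestId, int numContainers) {
    Map<Long, Integer> reqs = pending.get(subClusterId);
    if (reqs == null || !reqs.containsKey(requestId)) {
      return;
    }
    int left = reqs.get(requestId) - numContainers;
    if (left > 0) {
      reqs.put(requestId, left);
    } else {
      reqs.remove(requestId); // fully satisfied, nothing left to remember
    }
  }

  /** Benefit 2: after one RM fails over, resend only that sub-cluster's pending. */
  Map<Long, Integer> pendingFor(String subClusterId) {
    Map<Long, Integer> copy = new LinkedHashMap<Long, Integer>();
    Map<Long, Integer> reqs = pending.get(subClusterId);
    if (reqs != null) {
      copy.putAll(reqs);
    }
    return copy;
  }

  /** Benefit 1: cancel in one sub-cluster and re-send to another. Returns containers moved. */
  int move(String from, String to, long requestId) {
    Map<Long, Integer> reqs = pending.get(from);
    Integer left = (reqs == null) ? null : reqs.remove(requestId);
    if (left == null) {
      return 0; // nothing outstanding for this request in the source sub-cluster
    }
    recordSent(to, requestId, left);
    return left;
  }
}
```

Because the tracker is keyed by sub-cluster, a failover in one sub-cluster only requires replaying pendingFor(thatSubCluster). Requests held by other sub-clusters are never re-split, which is how this sketch avoids the over-allocation scenario described in item 2.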

This message was sent by Atlassian JIRA
