From issues-return-173849-archive-asf-public=cust-asf.ponee.io@flink.apache.org Fri Jun 29 04:13:37 2018
From: Clarkkkkk
To: issues@flink.apache.org
Reply-To: issues@flink.apache.org
Subject: [GitHub] flink pull request #6192: [FLINK-9567][runtime][yarn] Fix the bug that Flink...
Message-Id: <20180629021336.74777E1072@git1-us-west.apache.org>
Date: Fri, 29 Jun 2018 02:13:36 +0000 (UTC)

Github user Clarkkkkk commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6192#discussion_r199036840

    --- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java ---
    @@ -334,8 +335,11 @@ public void onContainersCompleted(final List<ContainerStatus> list) {
     			if (yarnWorkerNode != null) {
     				// Container completed unexpectedly ~> start a new one
     				final Container container = yarnWorkerNode.getContainer();
    -				requestYarnContainer(container.getResource(), yarnWorkerNode.getContainer().getPriority());
    -				closeTaskManagerConnection(resourceId, new Exception(containerStatus.getDiagnostics()));
    +				// check WorkerRegistration status to avoid requesting containers more than required
    +				if (checkWorkerRegistrationWithResourceId(resourceId)) {
    --- End diff --
    
    Yes, it might happen. The problem is not as simple as I thought. The actual cause is that the resource was released before a full restart, but the onContainersCompleted callback fired after the full restart. Since the full restart re-requests all the containers needed as configured, if onContainersCompleted is called after that point, it will request an extra container and hold on to it even though it is not needed.

---
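(For context, a minimal standalone sketch of the race the comment describes, assuming a simplified registration map and hypothetical helper names; only the registration guard itself mirrors the diff, everything else is illustrative and not the actual YarnResourceManager code.)

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * Sketch of the guarded container-completed callback. After a full restart
     * has already re-requested every configured container, a late completion
     * callback must not request another one, or the resource manager ends up
     * holding more containers than configured.
     */
    public class ContainerCallbackSketch {

        // Hypothetical stand-in for the RM's worker-registration bookkeeping.
        private final Map<String, Boolean> workerRegistrations = new ConcurrentHashMap<>();

        /** True only while a live registration still exists for this resource. */
        private boolean checkWorkerRegistrationWithResourceId(String resourceId) {
            return workerRegistrations.containsKey(resourceId);
        }

        /** Simplified callback: replace only workers that are still registered. */
        public void onContainerCompleted(String resourceId) {
            if (checkWorkerRegistrationWithResourceId(resourceId)) {
                // Container died while still registered: a replacement is needed.
                requestReplacementContainer(resourceId);
                workerRegistrations.remove(resourceId);
            }
            // Otherwise the callback arrived after a full restart already
            // re-requested everything; requesting again would over-allocate.
        }

        // Illustrative stub for the real container request path.
        private void requestReplacementContainer(String resourceId) {
            System.out.println("requesting replacement for " + resourceId);
        }

        public static void main(String[] args) {
            ContainerCallbackSketch rm = new ContainerCallbackSketch();
            rm.workerRegistrations.put("container_01", true);

            rm.onContainerCompleted("container_01"); // still registered -> replacement requested
            rm.onContainerCompleted("container_02"); // late callback after restart -> ignored
        }
    }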