Date: Mon, 27 Jun 2016 11:18:52 +0000 (UTC)
From: "Rohith Sharma K S (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-5279) Potential Container leak in NM in preemption flow

     [ https://issues.apache.org/jira/browse/YARN-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-5279:
------------------------------------
    Attachment: 0001-YARN-5279.patch

Updated the patch to inform RMNodeImpl that untracked containers should be removed from the corresponding NodeManager. In this patch, I reused the event type {{RMNodeEventType.FINISHED_CONTAINERS_PULLED_BY_AM}} from the scheduler.

> Potential Container leak in NM in preemption flow
> -------------------------------------------------
>
>                 Key: YARN-5279
>                 URL: https://issues.apache.org/jira/browse/YARN-5279
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager, resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-5279.patch
>
>
> In the YARN-4862 discussion [comment|https://issues.apache.org/jira/browse/YARN-4862?focusedCommentId=15341538&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15341538], it was observed that there could be a container leak in the NodeManager whenever a container is preempted by the RM.
> Basically, if the NM receives the same containerId in both {{containersToCleanUp}} and {{containersToBeRemovedFromNM}} in the same heartbeat, the container will never be removed from the NMContext. Instead, the NM kills the container listed in containersToCleanup and sends its status back to the RM again. But the RM blindly rejects that status, since the RMContainer has already been removed and is null.
> I think that whenever the RMContainer is null, RMNode should be informed to send {{containersToBeRemovedFromNM}} so that the NM will remove the container from its context.
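
The proposal in the description (inform RMNode whenever the RMContainer is null) can be sketched roughly as follows. This is only a toy model, not the code in the attached patch: {{ToyRmNode}}, {{ToyScheduler}}, {{queueRemoval}}, {{pullContainersToBeRemovedFromNM}} and {{completedContainer}} are hypothetical names, container ids are plain strings, and the RMNode event dispatch is not modelled at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the proposed RM-side handling; all names here are hypothetical. */
public class UntrackedContainerSketch {

    /** Stand-in for the RM's per-node state (assumption, not the real RMNodeImpl). */
    static final class ToyRmNode {
        private final List<String> containersToBeRemovedFromNM = new ArrayList<>();

        /** Remember that the NM should drop this container from its context. */
        void queueRemoval(String containerId) {
            if (!containersToBeRemovedFromNM.contains(containerId)) {
                containersToBeRemovedFromNM.add(containerId);
            }
        }

        /** Drain the queued ids, as if building the next heartbeat response. */
        List<String> pullContainersToBeRemovedFromNM() {
            List<String> drained = new ArrayList<>(containersToBeRemovedFromNM);
            containersToBeRemovedFromNM.clear();
            return drained;
        }
    }

    /** Stand-in for the scheduler's view of live containers (assumption). */
    static final class ToyScheduler {
        final Map<String, String> liveContainers = new HashMap<>(); // containerId -> appId
        private final ToyRmNode node;

        ToyScheduler(ToyRmNode node) {
            this.node = node;
        }

        /**
         * Handle a "container completed" status reported by the NM.
         * Returns true if the container was still tracked by the RM.
         */
        boolean completedContainer(String containerId) {
            String rmContainer = liveContainers.remove(containerId);
            if (rmContainer == null) {
                // Proposed behaviour: do not just drop the status; ask the node
                // to tell the NM to remove the container from its context.
                node.queueRemoval(containerId);
                return false;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        ToyRmNode node = new ToyRmNode();
        ToyScheduler scheduler = new ToyScheduler(node);
        // The RM no longer tracks this container (for example, after preemption),
        // yet the NM reports its status again.
        scheduler.completedContainer("container_preempted");
        System.out.println(node.pullContainersToBeRemovedFromNM());
    }
}
```

In this sketch, a status for a container the RM no longer tracks results in that id being sent back to the NM on the next heartbeat, so the NM gets a second chance to clear it from its context.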