From: "Junping Du (JIRA)"
To: yarn-issues@hadoop.apache.org
Date: Tue, 22 Dec 2015 17:21:46 +0000 (UTC)
Subject: [jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

    [ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068420#comment-15068420 ]

Junping Du commented on YARN-3223:
----------------------------------

Hi [~brookz], thanks for updating the patch. The current approach sounds OK to me. The only remaining issue is that there is a time window between completedContainer() and the handling of the RMNodeResourceUpdateEvent. If a scheduling attempt happens within this window, a new container can still be allocated on this node. Worse, if a scheduling attempt happens after the RMNodeResourceUpdateEvent has been sent out but before it propagates to the SchedulerNode, the total resource ends up lower than the used resource and the available resource becomes negative.

IMO, a safer way is: in addition to your existing RMNodeResourceUpdateEvent update, in completedContainer() for decommissioning nodes we can hold off on adding the released resource back to availableResource in the SchedulerNode while still deducting it from usedResource. At that moment the SchedulerNode's total resource will be greater than usedResource + availableResource, but this will be corrected soon after the RMNodeResourceUpdateEvent arrives. How does this sound? (A sketch of this idea follows the quoted issue description below.)

> Resource update during NM graceful decommission
> -----------------------------------------------
>
>                 Key: YARN-3223
>                 URL: https://issues.apache.org/jira/browse/YARN-3223
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful, nodemanager, resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Junping Du
>            Assignee: Brook Zhou
>         Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, YARN-3223-v2.patch, YARN-3223-v3.patch
>
> During NM graceful decommission, we should handle resource updates properly: make RMNode keep track of the old resource for possible rollback, keep the available resource at 0, and update the used resource when a container finishes.
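A minimal, self-contained sketch of the proposed completedContainer() behavior, under simplifying assumptions: SimpleSchedulerNode, its Resource stand-in, and the decommissioning flag below are hypothetical illustrations, not the actual SchedulerNode API in YARN.

{code:java}
// Hypothetical sketch of the proposal above; NOT the real YARN SchedulerNode.
public class SimpleSchedulerNode {

    /** Simplified stand-in for org.apache.hadoop.yarn.api.records.Resource. */
    static final class Resource {
        int memoryMB;
        int vcores;
        Resource(int memoryMB, int vcores) { this.memoryMB = memoryMB; this.vcores = vcores; }
        void add(Resource r)      { memoryMB += r.memoryMB; vcores += r.vcores; }
        void subtract(Resource r) { memoryMB -= r.memoryMB; vcores -= r.vcores; }
        @Override public String toString() { return "<mem:" + memoryMB + "MB, vcores:" + vcores + ">"; }
    }

    private final Resource available;
    private final Resource used;
    private volatile boolean decommissioning = false;

    SimpleSchedulerNode(int memoryMB, int vcores) {
        available = new Resource(memoryMB, vcores);
        used = new Resource(0, 0);
    }

    void setDecommissioning(boolean d) { decommissioning = d; }

    /** Called when a container is allocated on this node. */
    synchronized void allocateContainer(Resource r) {
        available.subtract(r);
        used.add(r);
    }

    /**
     * Called from completedContainer(). For a decommissioning node we only
     * deduct usedResource and do NOT add the released resource back to
     * availableResource, so the scheduler has nothing to allocate from in
     * the window before the RMNodeResourceUpdateEvent shrinks the node's
     * total resource. Transiently, total > usedResource + availableResource;
     * the later resource-update event restores consistency.
     */
    synchronized void releaseContainer(Resource r) {
        used.subtract(r);
        if (!decommissioning) {
            available.add(r);   // normal path: resource becomes schedulable again
        }
        // decommissioning path: leave availableResource untouched
    }

    public static void main(String[] args) {
        SimpleSchedulerNode node = new SimpleSchedulerNode(8192, 8);
        node.allocateContainer(new Resource(2048, 2));
        node.setDecommissioning(true);
        node.releaseContainer(new Resource(2048, 2));
        // available stays at <mem:6144MB, vcores:6>; nothing is handed back.
        System.out.println("available=" + node.available + " used=" + node.used);
    }
}
{code}

The key point is the branch in releaseContainer(): on a decommissioning node the released resource is deducted from usedResource but never returned to availableResource, which closes the race window described above without waiting for the RMNodeResourceUpdateEvent to propagate.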