Date: Mon, 14 Jul 2014 22:56:05 +0000 (UTC)
From: "Karthik Kambatla (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts

    [ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061387#comment-14061387 ]

Karthik Kambatla commented on YARN-2244:
----------------------------------------

# Can we use {{AbstractYarnScheduler#killOrphanContainerOnNode()}} instead?
{code}
+        this.rmContext.getDispatcher().getEventHandler()
+            .handle(new RMNodeCleanContainerEvent(node.getNodeID(), containerId));
{code}
# Thanks for moving the following to a separate method.
IMO, we should clean it up more:
{code}
  protected void waitForContainerCleanup(DrainDispatcher dispatcher, MockNM nm,
      NodeHeartbeatResponse resp) throws Exception {
    int waitCount;
    dispatcher.await();
    List contsToClean = resp.getContainersToCleanup();
    int cleanedConts = contsToClean.size();
    waitCount = 0;
    while (cleanedConts < 1 && waitCount++ < 200) {
      LOG.info("Waiting to get cleanup events.. cleanedConts: " + cleanedConts);
      Thread.sleep(100);
      resp = nm.nodeHeartbeat(true);
      dispatcher.await();
      contsToClean = resp.getContainersToCleanup();
      cleanedConts += contsToClean.size();
    }
    if (contsToClean.isEmpty()) {
      LOG.error("Failed to get any containers to cleanup");
    } else {
      LOG.info("Got cleanup for " + contsToClean.get(0));
    }
    Assert.assertEquals(1, cleanedConts);
  }
{code}
## One line is over 80 chars.
## {{int waitCount = 0}} can go on one line.
## Fetching the containers to clean and the other arithmetic before the while loop can be moved into the while loop. {{cleanedConts}} can be initialized to zero. I am okay with a do-while too.
## Remove the logging. I am not sure why we are logging that information 200 times.
## Parametrize the method to also take the number of container cleanups to wait for, and use it everywhere.


> FairScheduler missing handling of containers for unknown application attempts
> ------------------------------------------------------------------------------
>
>                 Key: YARN-2244
>                 URL: https://issues.apache.org/jira/browse/YARN-2244
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>            Priority: Critical
>         Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch
>
>
> FairScheduler is missing the changes from patch MAPREDUCE-3596. Among other fixes that were common across schedulers, that patch added some scheduler-specific fixes to handle containers for unknown application attempts. Without these, FairScheduler simply logs that an unknown container was found and lets it continue to run.
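
For reference, the cleanup items in the review comment above (counter initialized to zero, fetching moved into the loop, no per-iteration logging, expected cleanup count passed as a parameter) could be sketched roughly as follows. This is a standalone illustration, not the Hadoop test code: {{CleanupWaitSketch}}, the {{scripted}} helper, and the {{Supplier}} that stands in for the {{nm.nodeHeartbeat(true)}} / {{dispatcher.await()}} / {{resp.getContainersToCleanup()}} round trip are all assumptions made so that the example compiles without Hadoop on the classpath. Strings stand in for container IDs.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Standalone sketch; not the Hadoop test code. The Supplier is a hypothetical
// stand-in for one heartbeat round trip that returns the containers to clean.
final class CleanupWaitSketch {

  /**
   * Polls until at least expectedCleanups containers have been reported for
   * cleanup or maxTries heartbeats have been made. Returns the count seen,
   * so the caller can assert on it.
   */
  static int waitForContainerCleanup(Supplier<List<String>> heartbeat,
      int expectedCleanups, int maxTries, long sleepMillis) {
    int cleanedConts = 0;
    int waitCount = 0;
    while (cleanedConts < expectedCleanups && waitCount++ < maxTries) {
      // Fetching and counting both happen inside the loop.
      cleanedConts += heartbeat.get().size();
      if (cleanedConts < expectedCleanups) {
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
          // Restore the interrupt flag and stop waiting.
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    return cleanedConts;
  }

  /** Hypothetical helper: replays fixed responses, then empty lists. */
  @SafeVarargs
  static Supplier<List<String>> scripted(List<String>... responses) {
    Deque<List<String>> queue = new ArrayDeque<>(Arrays.asList(responses));
    return () -> queue.isEmpty()
        ? Collections.<String>emptyList() : queue.poll();
  }

  public static void main(String[] args) {
    Supplier<List<String>> nm = scripted(
        Collections.<String>emptyList(),
        Arrays.asList("container_1"),
        Arrays.asList("container_2", "container_3"));
    int cleaned = waitForContainerCleanup(nm, 3, 200, 100L);
    System.out.println("cleaned containers: " + cleaned);
  }
}
```

In the actual test, the caller would presumably follow the call with something like {{Assert.assertEquals(expected, cleaned)}}; returning the count here just keeps the sketch free of a JUnit dependency.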
--
This message was sent by Atlassian JIRA
(v6.2#6252)