Return-Path: X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4C52018C9D for ; Mon, 14 Dec 2015 22:57:47 +0000 (UTC) Received: (qmail 61836 invoked by uid 500); 14 Dec 2015 22:57:47 -0000 Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org Received: (qmail 61705 invoked by uid 500); 14 Dec 2015 22:57:47 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: yarn-issues@hadoop.apache.org Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 61528 invoked by uid 99); 14 Dec 2015 22:57:46 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 14 Dec 2015 22:57:46 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id AEA622C1F71 for ; Mon, 14 Dec 2015 22:57:46 +0000 (UTC) Date: Mon, 14 Dec 2015 22:57:46 +0000 (UTC) From: "zhihai xu (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056895#comment-15056895 ] zhihai xu commented on YARN-4440: --------------------------------- Good catch! thanks for working on this issue [~linyiqun]! +1 for the latest patch, The test failures are not related to the patch, These failures were already reported at YARN-4318 and YARN-4306. Will commit it tomorrow if no one objects. > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > ----------------------------------------------------------------------------- > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler > Affects Versions: 2.7.1 > Reporter: Lin Yiqun > Assignee: Lin Yiqun > Attachments: YARN-4440.001.patch, YARN-4440.002.patch, YARN-4440.003.patch > > > It seems there is a bug on {{FSAppAttempt#getAllowedLocalityLevelByTime}} method > {code} > // default level is NODE_LOCAL > if (! allowedLocalityLevel.containsKey(priority)) { > allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL); > return NodeType.NODE_LOCAL; > } > {code} > If you first invoke this method, it doesn't init time in lastScheduledContainer and this will lead to execute these code for next invokation: > {code} > // check waiting time > long waitTime = currentTimeMs; > if (lastScheduledContainer.containsKey(priority)) { > waitTime -= lastScheduledContainer.get(priority); > } else { > waitTime -= getStartTime(); > } > {code} > the waitTime will subtract to FsApp startTime, and this will be easily more than the delay time and allowedLocality degrade. Because FsApp startTime will be start earlier than currentTimeMs. So we should add the initial time of priority to prevent comparing with FsApp startTime and allowedLocalityLevel degrade. And this problem will have more negative influence for small-jobs. The YARN-4399 also discuss some problem in aspect of locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)