Return-Path: X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 66E751028D for ; Thu, 18 Dec 2014 20:48:15 +0000 (UTC) Received: (qmail 53280 invoked by uid 500); 18 Dec 2014 20:48:14 -0000 Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org Received: (qmail 53222 invoked by uid 500); 18 Dec 2014 20:48:14 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: yarn-issues@hadoop.apache.org Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 53143 invoked by uid 99); 18 Dec 2014 20:48:14 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 18 Dec 2014 20:48:14 +0000 Date: Thu, 18 Dec 2014 20:48:14 +0000 (UTC) From: "Hadoop QA (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252216#comment-14252216 ] Hadoop QA commented on YARN-2964: --------------------------------- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688092/YARN-2964.2.patch against trunk revision 07619aa. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6149//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6149//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6149//console This message is automatically generated. > RM prematurely cancels tokens for jobs that submit jobs (oozie) > --------------------------------------------------------------- > > Key: YARN-2964 > URL: https://issues.apache.org/jira/browse/YARN-2964 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.6.0 > Reporter: Daryn Sharp > Assignee: Jian He > Priority: Blocker > Attachments: YARN-2964.1.patch, YARN-2964.2.patch > > > The RM used to globally track the unique set of tokens for all apps. It remembered the first job that was submitted with the token. The first job controlled the cancellation of the token. This prevented completion of sub-jobs from canceling tokens used by the main job. > As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no notion of the first/main job. This results in sub-jobs canceling tokens and failing the main job and other sub-jobs. It also appears to schedule multiple redundant renewals. > The issue is not immediately obvious because the RM will cancel tokens ~10 min (NM livelyness interval) after log aggregation completes. The result is an oozie job, ex. pig, that will launch many sub-jobs over time will fail if any sub-jobs are launched >10 min after any sub-job completes. If all other sub-jobs complete within that 10 min window, then the issue goes unnoticed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)