Date: Sat, 1 Sep 2012 07:58:08 +1100 (NCT)
From: "Hudson (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID: <615750317.24841.1346446688294.JavaMail.jiratomcat@arcas>
In-Reply-To: <1440267356.16712.1346337607941.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (MAPREDUCE-4611) MR AM dies badly when Node is decomissioned

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446368#comment-13446368 ]

Hudson commented on MAPREDUCE-4611:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #2666 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2666/])
MAPREDUCE-4611.
MR AM dies badly when Node is decommissioned (Robert Evans via tgraves) (Revision 1379599)

     Result = SUCCESS
tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1379599
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java


> MR AM dies badly when Node is decomissioned
> -------------------------------------------
>
>                 Key: MAPREDUCE-4611
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4611
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3, 3.0.0, 2.2.0-alpha
>
>         Attachments: MR-4611.txt
>
>
> The MR AM always assumes that it is being killed by the RM when it gets a kill signal before it has finished processing. In reality the RM kill signal is only sent when the client cannot communicate directly with the AM, which probably means that the AM is already in a bad state. The much more common case is that the node has been marked as unhealthy or decommissioned.
> I propose that in the short term the AM will only clean up if one of the following is true:
> # The process has been asked by the client to exit (kill)
> # The job has finished cleanly and the process is already exiting
> # This is the last retry of the AM
> The downside is that, in some cases, a kill from the RM will leak the .staging directory and the job will not show up in the history server, at least until the full set of AM cleanup issues can be addressed, probably as part of MAPREDUCE-4428.
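
The three cleanup conditions proposed in the description can be sketched as a small predicate. This is only an illustration of the rule and is not taken from the attached patch: the class CleanupPolicy, its method names, and its parameters are hypothetical, and the sketch assumes that attempts are numbered from 1 and that the maximum number of AM attempts is known to the caller.

```java
// Hypothetical sketch of the proposed rule; not the code from MR-4611.txt.
final class CleanupPolicy {
    private CleanupPolicy() {}

    // Assumption: attempts are numbered 1..maxAttempts.
    static boolean isLastRetry(int attemptNumber, int maxAttempts) {
        return attemptNumber >= maxAttempts;
    }

    // The AM cleans up (removes .staging, finalizes history) only if
    // at least one of the three proposed conditions holds.
    static boolean shouldCleanUp(boolean killedByClient,
                                 boolean finishedCleanly,
                                 int attemptNumber,
                                 int maxAttempts) {
        return killedByClient
            || finishedCleanly
            || isLastRetry(attemptNumber, maxAttempts);
    }
}
```

Under this sketch, a kill signal that arrives on a non-final attempt without a client request (for example, because the node was decommissioned) yields false, so the staging directory is left in place for the next attempt.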