Date: Fri, 30 Nov 2012 11:49:59 +0000 (UTC)
From: "Sharad Agarwal (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-4832) MR AM can get in a split brain situation

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507281#comment-13507281 ]

Sharad Agarwal commented on MAPREDUCE-4832:
-------------------------------------------

MAPREDUCE-2702 introduced a job-level commit. Task commits from competing AMs should not step on each other, and the job-level commit should ensure that only the first one succeeds. Are you observing this in a cluster?
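A minimal sketch of the "only the first job commit succeeds" idea, assuming a local filesystem and a hypothetical marker file named _JOB_COMMITTED. This is not the MAPREDUCE-2702 implementation; the class JobCommitGuard and its method tryCommit are illustrative names only. It relies on java.nio's Files.createFile, which performs the existence check and the file creation as a single atomic operation, so when the original AM and its replacement race to commit, exactly one of them wins.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical "first committer wins" guard (illustration only, NOT Hadoop code).
 * Shows how a job-level commit can be made to succeed for exactly one of
 * several competing AMs by claiming an atomically created marker file.
 */
public final class JobCommitGuard {

    private JobCommitGuard() {}

    /**
     * Tries to claim the job commit by atomically creating the marker file.
     *
     * @return true if this caller won the commit, false if another AM already did
     */
    public static boolean tryCommit(Path marker, String amId) {
        try {
            // Existence check + create is one atomic step, so at most one caller succeeds.
            Files.createFile(marker);
            // Record which attempt owns the commit (informational only).
            Files.write(marker, amId.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (FileAlreadyExistsException e) {
            // Another AM committed first; this one must abort instead of committing.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("job-output");
        Path marker = dir.resolve("_JOB_COMMITTED"); // hypothetical marker name

        // Simulate the original AM and its replacement racing to commit.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<Boolean>> results = new ArrayList<>();
        for (String am : new String[] {"am-attempt-1", "am-attempt-2"}) {
            Callable<Boolean> commit = () -> tryCommit(marker, am);
            results.add(pool.submit(commit));
        }

        int winners = 0;
        for (Future<Boolean> f : results) {
            if (f.get()) {
                winners++;
            }
        }
        pool.shutdown();
        System.out.println("AMs that committed: " + winners); // prints 1
    }
}
```

An AM that gets false back should abort rather than touch the job output. The sketch only fences the job-level commit; it says nothing about task-level commits, which is the other half of the concern raised in the issue.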
> MR AM can get in a split brain situation
> ----------------------------------------
>
>                 Key: MAPREDUCE-4832
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4832
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Robert Joseph Evans
>            Priority: Critical
>
> It is possible for a networking issue to happen where the RM thinks an AM has gone down and launches a replacement, but the previous AM is still up and running. If the previous AM does not need any more resources from the RM, it could try to commit either tasks or jobs. This could cause lots of problems when the second AM finishes and tries to commit too. This could result in data corruption.