Date: Mon, 18 Jan 2016 13:17:40 +0000 (UTC)
From: "Junping Du (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-6608) Work Preserving AM Restart for MapReduce

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105267#comment-15105267 ]

Junping Du commented on MAPREDUCE-6608:
---------------------------------------

Thanks [~srikanth.sampath] and [~raju.bairishetti] for proposing this JIRA and uploading a design document. This work could significantly improve the reliability of our MapReduce framework.
Having gone through the current design doc, I think storing the new attempt address for the MR AM in ZooKeeper could have scalability issues when an MR job has a massive number of running tasks (tens of thousands or more). I think it could be better to store and retrieve the new MR AM location from HDFS, which has better scalability. Also, in my understanding, the YARN Service Registry may not be the best fit for this case. CC [~stevel@apache.org], who is the author of YSR.
I could propose another version of the design with more details in the next few days, in case we haven't started the development work yet.

> Work Preserving AM Restart for MapReduce
> ----------------------------------------
>
>                 Key: MAPREDUCE-6608
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Srikanth Sampath
>            Assignee: Raju Bairishetti
>         Attachments: WorkPreservingMRAppMaster.pdf
>
>
> A framework for work-preserving AM restart was provided in [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489]. We would like to take advantage of this for MapReduce (MR) applications. There are some challenges, which are described in the attached document along with a few options. We solicit feedback from the community.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)