Subject: Re: HA MRv1 JobTracker?
From: Harsh J
Date: Sun, 17 Jun 2012 10:51:56 +0530
To: mapreduce-user@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org

Hey Andrew,

I don't know the answers to all your questions, but
https://issues.apache.org/jira/browse/MAPREDUCE-2288 serves as a good
umbrella JIRA we can use to track this overall (there seem to have been
multiple approaches presented over time).

The closest I found to the rumor you mention is
https://issues.apache.org/jira/browse/MAPREDUCE-2648, but it lacks job
state maintenance (i.e. it provides no resumption of jobs after
failover). I did not dig too deep, however.

On Sun, Jun 17, 2012 at 3:53 AM, Andrew Purtell wrote:
> We are planning to run a next generation of Hadoop ecosystem components in
> our production environment in a few months. We plan to use HDFS 2.0 for the
> HA NameNode work. The platform will also include YARN, but its use will be
> experimental.
> So we'll be running something equivalent to the CDH MR1 package to support
> production workloads for, I'd guess, a year.
>
> We have heard a rumor about the existence of a version of the MR1
> JobTracker that persists state to ZooKeeper, such that failover to a new
> instance is fast and doesn't lose job state. I'd like to be aspirational
> and aim for an HA MR1 JobTracker to complement the HA NameNode. Even if no
> such existing code is available, we might adapt existing classes in the
> MR1 JobTracker into models/proxies of state kept in ZooKeeper. For
> clusters of our size (in the hundreds of nodes) this could be workable.
> Also, the MR client could possibly use ZK for failover, like the HDFS
> client does.
>
> I'm trying to find out first whether such code exists, if anyone knows.
> Otherwise, we may try building it ourselves, in which case I'd also like
> to get a sense of any interest in usage or dev collaboration.
>
> Best regards,
>
>    - Andy
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

--
Harsh J
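
For illustration, below is a minimal sketch of how an active/standby
JobTracker pair could coordinate failover through plain ZooKeeper
primitives (ephemeral sequential znodes plus a watch on the current
active), in the spirit of what Andrew describes. This is not code from
any of the JIRAs mentioned above; the class name, znode paths, and
timeouts are made up for the example, and only the stock ZooKeeper
client API is used. A real implementation would also need to persist
and reload job state (e.g. under a separate znode tree or in HDFS) so
the standby can resume jobs after taking over.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/**
 * Hypothetical sketch: leader election for an active/standby JobTracker
 * pair using ephemeral sequential znodes. Names and paths are illustrative.
 */
public class JobTrackerElection implements Watcher {

  private static final String ELECTION_ROOT = "/jobtracker-election";

  private final ZooKeeper zk;
  private final CountDownLatch connected = new CountDownLatch(1);
  private String myNode;

  public JobTrackerElection(String zkQuorum) throws Exception {
    zk = new ZooKeeper(zkQuorum, 30000, this);
    connected.await(30, TimeUnit.SECONDS);
    try {
      zk.create(ELECTION_ROOT, new byte[0],
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException ignored) {
      // Another JobTracker created the root first; that is fine.
    }
  }

  /** Joins the election; returns true if this instance becomes active. */
  public boolean joinElection(byte[] trackerAddress) throws Exception {
    // Ephemeral + sequential: the znode vanishes if this JT's session dies,
    // which is what lets the standby notice and take over.
    myNode = zk.create(ELECTION_ROOT + "/jt-", trackerAddress,
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    return checkLeadership();
  }

  private boolean checkLeadership() throws Exception {
    List<String> children = zk.getChildren(ELECTION_ROOT, false);
    Collections.sort(children);
    String lowest = ELECTION_ROOT + "/" + children.get(0);
    if (lowest.equals(myNode)) {
      return true; // lowest sequence number wins: we are the active JT
    }
    // Standby: watch the current active znode, re-check when it goes away.
    if (zk.exists(lowest, this) == null) {
      // It died between getChildren() and exists(); re-run the check.
      return checkLeadership();
    }
    return false;
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
      connected.countDown();
    }
    if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
      try {
        if (checkLeadership()) {
          // Here the new active JT would reload persisted job state and
          // start accepting submissions; clients could read the active
          // JT's address from the data stored in the winning znode.
        }
      } catch (Exception e) {
        // A real implementation would retry and handle session expiry.
      }
    }
  }
}

The same watch-the-lowest-znode pattern could serve the client-side
failover Andrew mentions: the MR client would read the winning znode's
data to find the current JobTracker address instead of a fixed host.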