Subject: Re: HA MRv1 JobTracker?
From: Harsh J
Date: Sun, 17 Jun 2012 10:51:56 +0530
To: mapreduce-user@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org

Hey Andrew,

I don't know the answers to all your questions, but
https://issues.apache.org/jira/browse/MAPREDUCE-2288 serves as a good
umbrella JIRA we can use to track this overall (there seem to have been
multiple approaches presented over time).

The closest I found to the rumor you mention is
https://issues.apache.org/jira/browse/MAPREDUCE-2648, but it lacks job
state maintenance (i.e. it provides no resumption of jobs after
failover). I did not dig too deep, however.

On Sun, Jun 17, 2012 at 3:53 AM, Andrew Purtell wrote:
> We are planning to run a next generation of Hadoop ecosystem components in
> our production environment in a few months. We plan to use HDFS 2.0 for the
> HA NameNode work. The platform will also include YARN, but its use will be
> experimental.
> So we'll be running something equivalent to the CDH MR1 package to support
> production workloads for, I'd guess, a year.
>
> We have heard a rumor about the existence of a version of the MR1
> JobTracker that persists state to ZooKeeper, such that failover to a new
> instance is fast and doesn't lose job state. I'd like to be aspirational
> and aim for an HA MR1 JobTracker to complement the HA NameNode. Even if no
> such existing code is available, we might adapt existing classes in the
> MR1 JobTracker into models/proxies of state kept in ZooKeeper. For
> clusters of our size (in the hundreds of nodes) this could be workable.
> Also, the MR client could possibly use ZK for failover, like the HDFS
> client does.
>
> I'm trying to find out first whether such code exists, if anyone knows.
> Otherwise, we may try building it ourselves, in which case I'd also like
> to get a sense of any interest in usage or dev collaboration.
>
> Best regards,
>
>    - Andy
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

--
Harsh J
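
For illustration, below is a minimal sketch of how an active/standby
JobTracker pair could coordinate failover through plain ZooKeeper
primitives (ephemeral sequential znodes plus a watch on the current
active), in the spirit of what Andrew describes. This is not code from
any of the JIRAs mentioned above; the class name, znode paths, and
timeouts are made up for the example, and only the stock ZooKeeper
client API is used. A real implementation would also need to persist
and reload job state (e.g. under a separate znode tree or in HDFS) so
the standby can resume jobs after taking over.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/**
 * Hypothetical sketch: leader election for an active/standby JobTracker
 * pair using ephemeral sequential znodes. Names and paths are illustrative.
 */
public class JobTrackerElection implements Watcher {

  private static final String ELECTION_ROOT = "/jobtracker-election";

  private final ZooKeeper zk;
  private final CountDownLatch connected = new CountDownLatch(1);
  private String myNode;

  public JobTrackerElection(String zkQuorum) throws Exception {
    zk = new ZooKeeper(zkQuorum, 30000, this);
    connected.await(30, TimeUnit.SECONDS);
    try {
      zk.create(ELECTION_ROOT, new byte[0],
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException ignored) {
      // Another JobTracker created the root first; that is fine.
    }
  }

  /** Joins the election; returns true if this instance becomes active. */
  public boolean joinElection(byte[] trackerAddress) throws Exception {
    // Ephemeral + sequential: the znode vanishes if this JT's session dies,
    // which is what lets the standby notice and take over.
    myNode = zk.create(ELECTION_ROOT + "/jt-", trackerAddress,
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    return checkLeadership();
  }

  private boolean checkLeadership() throws Exception {
    List<String> children = zk.getChildren(ELECTION_ROOT, false);
    Collections.sort(children);
    String lowest = ELECTION_ROOT + "/" + children.get(0);
    if (lowest.equals(myNode)) {
      return true; // lowest sequence number wins: we are the active JT
    }
    // Standby: watch the current active znode, re-check when it goes away.
    if (zk.exists(lowest, this) == null) {
      // It died between getChildren() and exists(); re-run the check.
      return checkLeadership();
    }
    return false;
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
      connected.countDown();
    }
    if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
      try {
        if (checkLeadership()) {
          // Here the new active JT would reload persisted job state and
          // start accepting submissions; clients could read the active
          // JT's address from the data stored in the winning znode.
        }
      } catch (Exception e) {
        // A real implementation would retry and handle session expiry.
      }
    }
  }
}

The same watch-the-lowest-znode pattern could serve the client-side
failover Andrew mentions: the MR client would read the winning znode's
data to find the current JobTracker address instead of a fixed host.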