hadoop-mapreduce-issues mailing list archives

From "Hari A V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-225) Fault tolerant Hadoop Job Tracker
Date Wed, 08 Jun 2011 09:12:04 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045849#comment-13045849

Hari A V commented on MAPREDUCE-225:


Sorry for the very late response.
@Arun: Yes, MAPREDUCE-225 is a completely new architecture, and it may still take a long
time to get done. For those who use the 0.20 version and need a simple "availability solution",
a much simpler approach would be helpful.
@Leitao: Yes, it is similar to HMaster HA, and it works. I have finished developing the ZK-based
framework and integrated it with the JT. I am in the process of contributing it back. As a first
step, I have opened a JIRA in ZooKeeper for a generic LeaderElectionService (ZOOKEEPER-1080).
I will upload the patch soon.

ZK+JT may not be a full-fledged HA solution, but it tries to address the following:
1. Avoid manual intervention during a JobTracker failure.
2. Recover and continue the jobs (even by re-submitting them) without notifying the clients
who submitted them.

The solution remains very simple because there is no need to synchronize the "state of the jobs".

Jobs may take longer to finish during a failover because they are re-submitted.
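
The failover idea above can be illustrated with a toy sketch. This is not the patch, and it does not use the ZooKeeper API: ToyCoordinator is a hypothetical in-memory stand-in for a coordination service, ToyJobTracker is a hypothetical stand-in for the JT, and the shared job_log list is an assumed record of submitted jobs. It only shows the shape of the approach: the standby takes over automatically and re-submits the jobs from scratch, with no job state synchronized between the two trackers.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (hypothetical, not ZooKeeper's API)."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # sequence number -> candidate name

    def join(self, name):
        # Each candidate gets an increasing sequence number when it registers.
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = name
        return seq

    def leave(self, seq):
        # Models a candidate's registration disappearing when it dies.
        self._candidates.pop(seq, None)

    def leader(self):
        # The live candidate with the lowest sequence number is the leader.
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


class ToyJobTracker:
    """Hypothetical stand-in for a JobTracker taking part in leader election."""

    def __init__(self, name, coordinator, job_log):
        self.name = name
        self.coordinator = coordinator
        self.job_log = job_log  # assumed shared record of submitted job ids
        self.running = []
        self.seq = coordinator.join(name)

    def is_active(self):
        return self.coordinator.leader() == self.name

    def submit(self, job_id):
        if not self.is_active():
            raise RuntimeError(self.name + " is a standby and cannot accept jobs")
        self.job_log.append(job_id)
        self.running.append(job_id)

    def take_over(self):
        # Re-submit every logged job from the start; no progress is carried over.
        if self.is_active():
            self.running = list(self.job_log)

    def crash(self):
        self.coordinator.leave(self.seq)
        self.running = []


coordinator = ToyCoordinator()
job_log = []
jt1 = ToyJobTracker("jt1", coordinator, job_log)
jt2 = ToyJobTracker("jt2", coordinator, job_log)

jt1.submit("job_001")  # jt1 registered first, so it is the active tracker
jt1.crash()            # the active tracker fails
jt2.take_over()        # the standby becomes leader and re-submits the job
```

In this sketch the client never resubmits job_001 itself; the cost is that the job restarts on jt2, which is why jobs may take longer to finish during a failover.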

Please provide suggestions.


> Fault tolerant Hadoop Job Tracker
> ---------------------------------
>                 Key: MAPREDUCE-225
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-225
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>         Environment: High availability enterprise system
>            Reporter: Francesco Salbaroli
>            Assignee: Francesco Salbaroli
>         Attachments: Enhancing the Hadoop MapReduce framework by adding fault.ppt, FaultTolerantHadoop.pdf,
HADOOP-4586-0.1.patch, HADOOP-4586v0.3.patch, jgroups-all.jar
> The Hadoop framework has been designed, in an effort to enhance performance,
> with a single JobTracker (master node). Its responsibilities range
> from managing the job submission process and computing the input splits to scheduling
> the tasks on the slave nodes (TaskTrackers) and monitoring their health.
> In some environments, like the IBM and Google Internet-scale computing
> initiative, there is a need for high availability, and performance
> becomes a secondary issue. In these environments, having a system with
> a Single Point of Failure (such as Hadoop's single JobTracker) is a major
> concern.
> My proposal is to provide a redundant version of Hadoop by adding
> support for multiple replicated JobTrackers. This design can be approached
> in many different ways.
> In the document at: http://sites.google.com/site/hadoopthesis/Home/FaultTolerantHadoop.pdf?attredirects=0
> I wrote an overview of the problem and some approaches to solve it.
> I post this to the community to gather feedback on the best way to proceed in my work.
> Thank you!

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
