hadoop-mapreduce-issues mailing list archives

From "Leitao Guo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-225) Fault tolerant Hadoop Job Tracker
Date Tue, 01 Mar 2011 06:41:37 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000741#comment-13000741 ]

Leitao Guo commented on MAPREDUCE-225:

Thanks for your response, Arun. 

Although HADOOP-1876 and HADOOP-3245 do not work well, I think failover for the JobTracker
is still worth considering.

If the JobTracker on one server goes down, we need to restart it or migrate it to another
server. In our scenario, we may not care whether the job resumes from the progress it had
reached before the JobTracker failed, but automatic failover is needed. I think integrating
ZooKeeper with the JobTracker is a workable solution for failover.
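
To make the idea concrete, here is a minimal sketch of the election logic such an integration could rely on. It is only an in-memory stand-in, not the ZooKeeper client API: the class FailoverSketch, its methods, and the host names are hypothetical. It assumes each candidate JobTracker registers itself the way it would create an ephemeral sequential znode, and that the surviving candidate with the lowest sequence number acts as the active JobTracker. Job progress is not preserved, which matches the scenario above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** In-memory stand-in for a coordination service; this is NOT the ZooKeeper API. */
class FailoverSketch {
    // sequence number -> candidate host; the lowest sequence number is the active JobTracker
    private final TreeMap<Long, String> candidates = new TreeMap<>();
    private long nextSeq = 0;

    /** Register a candidate; analogous to creating an ephemeral sequential znode. */
    synchronized long join(String host) {
        long seq = nextSeq++;
        candidates.put(seq, host);
        return seq;
    }

    /** Simulate a crash or session expiry: the candidate's registration disappears. */
    synchronized void fail(long seq) {
        candidates.remove(seq);
    }

    /** The active JobTracker is the surviving candidate with the lowest sequence number. */
    synchronized Optional<String> active() {
        Map.Entry<Long, String> first = candidates.firstEntry();
        return first == null ? Optional.empty() : Optional.of(first.getValue());
    }
}
```

With two registered candidates, calling fail() on the first one makes active() return the second, so TaskTrackers and clients that look up the active JobTracker would be redirected without manual intervention.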

> Fault tolerant Hadoop Job Tracker
> ---------------------------------
>                 Key: MAPREDUCE-225
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-225
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>         Environment: High availability enterprise system
>            Reporter: Francesco Salbaroli
>            Assignee: Francesco Salbaroli
>         Attachments: Enhancing the Hadoop MapReduce framework by adding fault.ppt, FaultTolerantHadoop.pdf,
HADOOP-4586-0.1.patch, HADOOP-4586v0.3.patch, jgroups-all.jar
> The Hadoop framework has been designed, in an effort to enhance performance,
> with a single JobTracker (master node). Its responsibilities range from
> managing the job submission process, computing the input splits, and scheduling
> tasks to the slave nodes (TaskTrackers) to monitoring their health.
> In some environments, like IBM and Google's Internet-scale computing
> initiative, there is a need for high availability, and performance
> becomes a secondary issue. In these environments, having a system with
> a Single Point of Failure (such as Hadoop's single JobTracker) is a major
> concern.
> My proposal is to provide a redundant version of Hadoop by adding
> support for multiple replicated JobTrackers. This design can be approached
> in many different ways.
> In the document at: http://sites.google.com/site/hadoopthesis/Home/FaultTolerantHadoop.pdf?attredirects=0
> I wrote an overview of the problem and some approaches to solve it.
> I post this to the community to gather feedback on the best way to proceed in my work.
> Thank you!

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

