hadoop-common-dev mailing list archives

From "Francesco Salbaroli (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4586) Fault tolerant Hadoop Job Tracker
Date Thu, 06 Nov 2008 11:33:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645466#action_12645466 ]

Francesco Salbaroli commented on HADOOP-4586:

Thank you for the comment Steve.

1) You're right, but I don't have exact details about the infrastructure on which the Hadoop
cluster will run (here at IBM it will probably be possible to run Hadoop on top of the IBM BlueCloud
cloud computing infrastructure). In an environment with fault detection and fast, automatic
VM provisioning, cold standby can be an option. This point needs further investigation, and
I hope to receive more feedback on it.

2) For hot standby I assumed a very small number of JobTracker replicas (between 2
and 4), so the extra network traffic shouldn't be a major concern. But this can be verified
only after extensive testing.

3) Yes, DNS update seems to be the best option.

4) I wasn't aware of the multicast limitation on Amazon EC2. Two possible solutions: define
the JobTracker nodes statically, or maintain a list of nodes on DFS or in a shared cache.

5) I thought about a heartbeat mechanism.

6) Network partitioning is an issue. Ignoring HDFS, an election protocol will produce
separate, smaller, fully functional clusters (which is unwanted), but it should be
able to detect multiple running masters and re-run the election to reach another stable state.

7) This will be part of my M.Sc. work, but I produced this document only to propose my ideas to the
community and gather feedback. And, yes, this is an early draft.

8) I will take a look at it.
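
To make points 4 to 6 more concrete, here is a minimal sketch, assuming a statically
configured list of JobTracker replicas, a heartbeat timeout as the failure detector, and
a deterministic "lowest name wins" election rule. None of this is existing Hadoop code:
the class JobTrackerElection, its methods, and the host names are hypothetical and only
illustrate the idea.

```java
import java.util.*;

/**
 * Hypothetical sketch (not a Hadoop API): heartbeat-based failure detection
 * plus a deterministic election among a static list of JobTracker replicas.
 */
final class JobTrackerElection {
    private final List<String> replicas;   // statically defined JobTracker nodes (point 4)
    private final long timeoutMillis;      // silence longer than this marks a replica as failed
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    JobTrackerElection(List<String> replicas, long timeoutMillis) {
        this.replicas = new ArrayList<>(replicas);
        Collections.sort(this.replicas);   // fixed order, so every node reaches the same result
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat from a known replica (point 5); unknown senders are ignored. */
    void recordHeartbeat(String replica, long nowMillis) {
        if (replicas.contains(replica)) {
            lastHeartbeat.put(replica, nowMillis);
        }
    }

    /** A replica is alive if its last heartbeat is no older than the timeout. */
    boolean isAlive(String replica, long nowMillis) {
        Long last = lastHeartbeat.get(replica);
        return last != null && nowMillis - last <= timeoutMillis;
    }

    /** Assumed election rule: the live replica with the lowest name becomes master. */
    Optional<String> electMaster(long nowMillis) {
        for (String replica : replicas) {
            if (isAlive(replica, nowMillis)) {
                return Optional.of(replica);
            }
        }
        return Optional.empty();
    }

    /**
     * Point 6: keep a single live master as it is; if several replicas claim
     * to be master, or the claimed master is dead, re-run the election.
     */
    Optional<String> resolveMasters(Set<String> claimedMasters, long nowMillis) {
        if (claimedMasters.size() == 1) {
            String current = claimedMasters.iterator().next();
            if (isAlive(current, nowMillis)) {
                return Optional.of(current);
            }
        }
        return electMaster(nowMillis);
    }
}
```

With replicas jt1, jt2 and jt3 and a one-second timeout, electMaster returns jt1 while jt1
keeps sending heartbeats and falls back to jt2 once jt1 has been silent for longer than
the timeout. This sketch does not prevent each side of a network partition from electing
its own master. It only shows how the replicas could converge on one master again once
they can see each other's heartbeats.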

So, thank you again for the interest you have shown in my work. I hope to hear from you and other
community members soon.


> Fault tolerant Hadoop Job Tracker
> ---------------------------------
>                 Key: HADOOP-4586
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4586
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.18.0
>         Environment: High availability enterprise system
>            Reporter: Francesco Salbaroli
>         Attachments: FaultTolerantHadoop.pdf
>   Original Estimate: 2016h
>  Remaining Estimate: 2016h
> The Hadoop framework has been designed, in an effort to enhance performance,
> with a single JobTracker (master node). Its responsibilities range
> from managing the job submission process and computing the input splits to scheduling
> the tasks on the slave nodes (TaskTrackers) and monitoring their health.
> In some environments, like the IBM and Google Internet-scale computing
> initiative, there is a need for high availability, and performance
> becomes a secondary issue. In these environments, having a system with
> a Single Point of Failure (such as Hadoop's single JobTracker) is a major
> concern.
> My proposal is to provide a redundant version of Hadoop by adding
> support for multiple replicated JobTrackers. This design can be approached
> in many different ways.
> In the document at: http://sites.google.com/site/hadoopthesis/Home/FaultTolerantHadoop.pdf?attredirects=0
> I wrote an overview of the problem and some approaches to solve it.
> I post this to the community to gather feedback on the best way to proceed in my work.
> Thank you!

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
