hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "JobTracker" by SteveLoughran
Date Tue, 05 Aug 2008 09:04:54 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:

The comment on the change is:
creating a page

New page:
The JobTracker is the service within Hadoop that farms out MapReduce tasks to specific nodes
in the cluster, ideally the nodes that have the data, or at least nodes in the same rack as the data.
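To make the locality preference concrete, here is a minimal Python sketch of choosing a node for one task: first a node holding the data, then a node in the same rack, then any node with a free slot. The `TaskTracker` record, the `pick_tracker` function and the three locality labels are hypothetical illustrations, not Hadoop's scheduler code, and the sketch assumes the replica hosts and their racks have already been obtained from the NameNode.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    # Hypothetical record: a worker node, its rack, and its free task slots.
    host: str
    rack: str
    free_slots: int


def pick_tracker(trackers, block_hosts, block_racks):
    """Choose a TaskTracker for one task, preferring data locality.

    block_hosts: hosts holding a replica of the task's input (assumed to come
    from the NameNode); block_racks: the racks of those hosts.
    Returns (tracker, level), or (None, None) if no tracker has a free slot.
    """
    candidates = [t for t in trackers if t.free_slots > 0]
    for t in candidates:          # 1. a node that has the data
        if t.host in block_hosts:
            return t, "data-local"
    for t in candidates:          # 2. a node in the same rack as the data
        if t.rack in block_racks:
            return t, "rack-local"
    if candidates:                # 3. anywhere else in the cluster
        return candidates[0], "off-rack"
    return None, None


trackers = [
    TaskTracker("node1", "/rack1", 0),   # holds the data but has no free slot
    TaskTracker("node2", "/rack1", 2),
    TaskTracker("node3", "/rack2", 1),
]
tracker, level = pick_tracker(trackers, {"node1"}, {"/rack1"})
print(tracker.host, level)   # node2 rack-local
```

Because node1 holds the data but has no free slot, the task falls back to node2 in the same rack. The sketch shows only the locality ordering; it ignores job priority and other scheduling concerns.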

 1. Client applications submit jobs to the JobTracker.

 1. The JobTracker talks to the NameNode to determine the location of the data.

 1. The JobTracker locates TaskTracker nodes with available slots at or near the data.

 1. The JobTracker submits the work to the chosen TaskTracker nodes.

 1. The TaskTracker nodes are monitored. If they do not submit heartbeat signals often enough,
they are deemed to have failed and the work is scheduled on a different TaskTracker.

 1. The TaskTrackers notify the JobTracker when a task fails. The JobTracker then decides what
to do: it may resubmit the task elsewhere, it may mark that specific record as something
to avoid, and it may even blacklist the TaskTracker as unreliable.

 1. When the work is completed, the JobTracker updates its status.

 1. Client applications can poll the JobTracker for information.
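The monitoring and failure-handling steps above come down to a small amount of bookkeeping. The following Python sketch is a toy model under stated assumptions: `JobTrackerSketch` and all of its methods are hypothetical names rather than Hadoop's API, the clock is injectable so the heartbeat timeout can be exercised in a test, and the timeout and blacklist threshold are illustrative parameters, not Hadoop's configured defaults. It does not model the per-record skipping mentioned in the list.

```python
import time


class JobTrackerSketch:
    """Toy model of JobTracker bookkeeping (hypothetical names, not Hadoop's API)."""

    def __init__(self, heartbeat_timeout=600.0, blacklist_after=4, clock=time.monotonic):
        self.heartbeat_timeout = heartbeat_timeout  # silence (seconds) before a tracker is declared dead
        self.blacklist_after = blacklist_after      # reported task failures before blacklisting
        self.clock = clock
        self.last_heartbeat = {}   # tracker -> time of its last heartbeat
        self.failures = {}         # tracker -> number of failed tasks it reported
        self.blacklisted = set()
        self.assignments = {}      # task -> tracker currently running it
        self.pending = []          # tasks waiting to be (re)scheduled
        self.completed = set()

    def heartbeat(self, tracker):
        self.last_heartbeat[tracker] = self.clock()

    def assign(self, task, tracker):
        # Only live, non-blacklisted trackers receive work.
        if tracker not in self.last_heartbeat or tracker in self.blacklisted:
            raise ValueError(f"{tracker} cannot accept tasks")
        self.assignments[task] = tracker

    def check_trackers(self):
        """Declare trackers dead if silent too long and requeue their tasks."""
        now = self.clock()
        dead = [t for t, seen in self.last_heartbeat.items()
                if now - seen > self.heartbeat_timeout]
        for tracker in dead:
            del self.last_heartbeat[tracker]
            for task, owner in list(self.assignments.items()):
                if owner == tracker:
                    del self.assignments[task]
                    self.pending.append(task)
        return dead

    def report_failure(self, task, tracker):
        """Requeue a failed task and blacklist trackers that fail too often."""
        self.assignments.pop(task, None)
        self.pending.append(task)
        self.failures[tracker] = self.failures.get(tracker, 0) + 1
        if self.failures[tracker] >= self.blacklist_after:
            self.blacklisted.add(tracker)

    def report_success(self, task):
        self.assignments.pop(task, None)
        self.completed.add(task)

    def status(self):
        """What a polling client would see."""
        return {"running": len(self.assignments),
                "pending": len(self.pending),
                "completed": len(self.completed)}


# Usage with a fake clock so the timeout is easy to follow.
now = [0.0]
jt = JobTrackerSketch(heartbeat_timeout=10.0, blacklist_after=2, clock=lambda: now[0])
jt.heartbeat("tt1")
jt.heartbeat("tt2")
jt.assign("map_0", "tt1")
jt.assign("map_1", "tt2")
now[0] = 5.0
jt.heartbeat("tt2")           # tt1 stays silent
now[0] = 12.0
print(jt.check_trackers())    # ['tt1']
print(jt.pending)             # ['map_0']
jt.report_failure("map_1", "tt2")
jt.assign("map_2", "tt2")
jt.report_failure("map_2", "tt2")
print(jt.blacklisted)         # {'tt2'}
print(jt.status())            # {'running': 0, 'pending': 3, 'completed': 0}
```

In this model all job state lives in memory inside one object, so losing that object loses every assignment and queued task.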

The JobTracker is a single point of failure for the Map/Reduce infrastructure. If it goes down, all
running jobs are lost. The filesystem remains live.
