hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: What are uses of taskTracker and JobTracker services?
Date Wed, 30 Jun 2010 10:02:51 GMT
Hemanth Yamijala wrote:
> Hi,
>> I think that he was trying to explain that in HDFS, you have a name node
>> and then your data nodes.
>> So you have the name node service on the name node and each data node has
>> a data node service.
>> When you run a map reduce job, you have a Job tracker that resides on the
>> name node and controls the overall job.
> May or may not be true. In general, for moderately complex cases, it
> is best to run the name node and jobtracker on different nodes so that
> a single node failure takes down only one of the masters, not both.

More for scale than availability, was my belief; if the NN goes offline,
your JT locks up until it comes back anyway.

>> On each data node, where the jobs run in parallel, there exists a task tracker.
> This is almost always true, of course - it helps Hadoop to achieve
> data locality by colocating where the task runs with where it has to
> read data from.

If you have machines in the room which can come and go without warning
(doing Hadoop work with spare cycles), then you can make them
task-tracker only, so you don't store persistent data there, just temp
files, and when the machines get switched to other work you don't lose
HDFS data. But you do increase network traffic...
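
To make the network-traffic trade-off concrete, here is a toy model in Python. It is not Hadoop code and not how the JobTracker actually schedules; the function name, the node names and the slot counts are all made up for illustration. It assumes one map task per block, a fixed number of task slots per tasktracker, and a scheduler that prefers a tracker holding a replica of the block and otherwise falls back to any free tracker, which then has to read the block over the network.

```python
def schedule(block_replicas, tracker_slots):
    """Toy locality-aware scheduler (hypothetical, not the Hadoop API).

    block_replicas: dict mapping block id -> set of nodes holding a replica
    tracker_slots:  dict mapping tasktracker node -> number of free task slots
    Returns (local, remote): how many tasks read from local disk versus
    over the network.
    """
    free = dict(tracker_slots)  # copy so the caller's dict is not modified
    local = remote = 0
    for block in sorted(block_replicas):
        # Prefer a tracker that both holds a replica and has a free slot.
        holders = [n for n in sorted(block_replicas[block]) if free.get(n, 0) > 0]
        if holders:
            free[holders[0]] -= 1
            local += 1
            continue
        # Otherwise use any tracker with a free slot; the read is remote.
        spare = [n for n in sorted(free) if free[n] > 0]
        if not spare:
            raise RuntimeError("no free task slots")
        free[spare[0]] -= 1
        remote += 1
    return local, remote


# Four blocks, each replicated on the two datanodes dn1 and dn2.
blocks = {b: {"dn1", "dn2"} for b in ("b1", "b2", "b3", "b4")}

# Every tasktracker is colocated with a datanode.
colocated = schedule(blocks, {"dn1": 2, "dn2": 2})

# Same total slots, but half of them sit on a tasktracker-only machine.
with_spare = schedule(blocks, {"dn1": 1, "dn2": 1, "spare1": 2})
```

In this made-up setup `colocated` comes out as all local reads, while `with_spare` pushes half of the tasks onto `spare1`, where every read crosses the network. The real numbers depend on replication, slot counts and the scheduler in use.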
