hadoop-common-dev mailing list archives

From Mike Cardosa <card...@gmail.com>
Subject Where MR meets block locations
Date Tue, 30 Mar 2010 02:02:30 GMT
I have a few questions, all pertaining to how HDFS block locations
are used when running map tasks. I have spent several days reading
the Hadoop source code and am still stuck on a couple of points.

1) When the JobTracker assigns a task to a TaskTracker, it determines
whether the task is data-local or rack-local from the splits (which
were generated during job initialization). Where in the code could I
"refresh" the split locations in case they have changed, or in case
blocks have since been replicated to additional datanodes?
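
To make question 1 concrete, here is a minimal sketch of how I understand the locality decision and why stale split locations matter. It is an illustration only, not the actual JobTracker code: the class LocalitySketch, the classify method, the host names, and the host-to-rack map are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from the Hadoop source.
public class LocalitySketch {
    enum Locality { DATA_LOCAL, RACK_LOCAL, OFF_RACK }

    // splitHosts: hosts recorded for a split (assumed captured at job init).
    // trackerHost: host of the TaskTracker asking for work.
    // hostToRack: assumed host-to-rack mapping.
    static Locality classify(List<String> splitHosts, String trackerHost,
                             Map<String, String> hostToRack) {
        if (splitHosts.contains(trackerHost)) {
            return Locality.DATA_LOCAL;
        }
        String trackerRack = hostToRack.get(trackerHost);
        if (trackerRack != null) {
            for (String host : splitHosts) {
                if (trackerRack.equals(hostToRack.get(host))) {
                    return Locality.RACK_LOCAL;
                }
            }
        }
        return Locality.OFF_RACK;
    }

    public static void main(String[] args) {
        Map<String, String> racks = new HashMap<>();
        racks.put("dn1", "/rack1");
        racks.put("dn2", "/rack2");
        racks.put("dn3", "/rack1");

        // Locations as recorded when the splits were generated.
        List<String> staleHosts = Arrays.asList("dn1", "dn2");
        // Locations after the block was also replicated to dn3.
        List<String> refreshedHosts = Arrays.asList("dn1", "dn2", "dn3");

        System.out.println(classify(staleHosts, "dn3", racks));     // RACK_LOCAL
        System.out.println(classify(refreshedHosts, "dn3", racks)); // DATA_LOCAL
    }
}
```

With the stale list, a tracker on dn3 is treated as only rack-local even though it now holds a replica, which is the case a "refresh" of the split locations would fix.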

2) When a TaskTracker is assigned a map task, is it informed whether
the task is data-local or rack-local? If so, where in the code does
this take place? And is it possible to patch the code so that the
TaskTracker first checks whether it has a data-local copy of the block
before downloading the block from another datanode over the network?

Thanks in advance for your time.
Mike Cardosa
