hadoop-mapreduce-dev mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: Where MR meets block locations
Date Wed, 31 Mar 2010 22:37:02 GMT
Moving to mapreduce-dev@ (bcc common-dev@).

Responses inline:

On Mar 29, 2010, at 7:02 PM, Mike Cardosa wrote:
>
> 1) When the jobtracker assigns a task to a tasktracker, it determines
> if the task is data-local or rack-local from the splits (which were
> generated during the job init process). Where in the code could I
> "refresh" the split locations in case they have changed or blocks have
> been replicated to additional new datanodes?
>

No easy way to do that. But in practice, I don't think it matters much.

> 2) When a tasktracker is assigned a map task, is it informed if it's a
> data-local or rack-local map task? If so, where in the code does this
> take place, and is it possible to patch the code to have it check to
> see if it has a data-local copy of the block first before going to the
> network to download the block from another datanode?
>

No, the TT doesn't know or care. The DFSClient in the map task has the
smarts to do the I/O from the 'nearest' datanode.
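
As a rough illustration of what 'nearest' means here, the sketch below ranks a block's replicas by a simple topology distance: same host first, then same rack, then off-rack. It is a toy model, not the DFSClient code; the class NearestReplica, the Node type, the host and rack names, and the 0/2/4 distance values are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of picking the 'nearest' replica; not the actual DFSClient code. */
class NearestReplica {

    /** A location in the cluster: hypothetical host name plus rack path, e.g. "/rack1". */
    static final class Node {
        final String host;
        final String rack;

        Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /** Assumed distance: 0 = same host, 2 = same rack, 4 = different rack. */
    static int distance(Node reader, Node replica) {
        if (reader.host.equals(replica.host)) {
            return 0;
        }
        if (reader.rack.equals(replica.rack)) {
            return 2;
        }
        return 4;
    }

    /** Returns the replica with the smallest distance to the reader. */
    static Node pick(Node reader, List<Node> replicas) {
        return replicas.stream()
                .min(Comparator.comparingInt((Node r) -> distance(reader, r)))
                .orElseThrow(() -> new IllegalArgumentException("no replicas"));
    }

    public static void main(String[] args) {
        // The map task runs on host "tt3" in "/rack1" (made-up names).
        Node reader = new Node("tt3", "/rack1");
        List<Node> replicas = Arrays.asList(
                new Node("dn7", "/rack2"),   // off-rack copy
                new Node("dn4", "/rack1"),   // rack-local copy
                new Node("tt3", "/rack1"));  // data-local copy
        System.out.println(pick(reader, replicas).host);
    }
}
```

In this model the choice is made at read time on the client side, so a replica that appeared on the local node after the splits were computed would still be preferred, which is in line with the point that the tasktracker itself need not know whether the task is data-local.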

Arun