hadoop-common-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: MapReduce - FileInputFormat and Locality
Date Wed, 08 May 2013 22:00:13 GMT
I think you misread it.

If a given split has only one block, it uses all the locations of that block.

If a given split spans multiple blocks, it uses all the locations of the first block only.
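
To make the rule concrete, here is a minimal, self-contained sketch of it. This is not the Hadoop source: the class SplitLocalitySketch, the Block type, and both method names are hypothetical stand-ins for illustration, and the byte sizes are invented. It only models "find the block that contains the split's start offset and report all of that block's hosts".

```java
import java.util.Arrays;
import java.util.List;

final class SplitLocalitySketch {

    /** Hypothetical stand-in for a block's location info: byte range plus replica hosts. */
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Returns the index of the block that contains the given file offset. */
    static int blockIndexFor(List<Block> blocks, long offset) {
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            if (offset >= b.offset && offset < b.offset + b.length) {
                return i;
            }
        }
        throw new IllegalArgumentException("Offset " + offset + " is outside the file");
    }

    /**
     * Models the rule described above: the split's locations are all replica
     * hosts of the block containing the split's first byte. The split length
     * is deliberately ignored, because later blocks do not contribute hosts.
     */
    static List<String> hostsForSplit(List<Block> blocks, long splitStart, long splitLength) {
        return blocks.get(blockIndexFor(blocks, splitStart)).hosts;
    }
}
```

With two 128-byte blocks replicated on node1..node3 and node4..node6, a split covering bytes 0 to 255 reports only node1, node2 and node3, while a split that starts at byte 128 reports node4, node5 and node6.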

+Vinod Kumar Vavilapalli
Hortonworks Inc.

On May 8, 2013, at 7:21 AM, Brian C. Huffman wrote:

> All,
> I'm trying to understand how the current FileInputFormat implements locality. As far
> as I can tell, it calculates splits using getSplits and each split will contain the node that
> hosts the first block of data in that split. Is my understanding correct?
> Looking at the FileInputFormat for the old API (mapred), it appears that it does more
> to implement locality, using getSplitHosts to "return the hosts that contribute most for a
> given split".
> If I understand correctly, why was this changed?
> Thanks,
> Brian
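
For contrast, the "hosts that contribute most" idea quoted from the old API can be sketched roughly as below. This is a simplified illustration, not the actual getSplitHosts code: it ignores rack awareness entirely, and ContributingHostsSketch, Block, and topHosts are hypothetical names. It ranks hosts by how many bytes of the split they hold locally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ContributingHostsSketch {

    /** Hypothetical stand-in for a block's location info: byte range plus replica hosts. */
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Returns up to maxHosts hosts, ordered by bytes of the split stored on each. */
    static List<String> topHosts(List<Block> blocks, long splitStart, long splitLength, int maxHosts) {
        long splitEnd = splitStart + splitLength;
        Map<String, Long> bytesPerHost = new LinkedHashMap<>();
        for (Block b : blocks) {
            // Bytes of this block that fall inside the split's byte range.
            long overlap = Math.min(splitEnd, b.offset + b.length) - Math.max(splitStart, b.offset);
            if (overlap <= 0) {
                continue;
            }
            for (String host : b.hosts) {
                bytesPerHost.merge(host, overlap, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(bytesPerHost.entrySet());
        // Descending by contributed bytes; the sort is stable, so ties keep first-seen order.
        ranked.sort((x, y) -> Long.compare(y.getValue(), x.getValue()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : ranked) {
            if (result.size() == maxHosts) {
                break;
            }
            result.add(e.getKey());
        }
        return result;
    }
}
```

For a split covering bytes 50 to 199 of a file whose first 100-byte block is on hosts a and b and whose second is on b and c, this ranking prefers b (150 bytes) and then c (100 bytes), whereas the first-block rule would report a and b.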
