hadoop-hdfs-user mailing list archives

From "Brian C. Huffman" <bhuff...@etinternational.com>
Subject MapReduce - FileInputFormat and Locality
Date Wed, 08 May 2013 14:21:33 GMT

I'm trying to understand how the current FileInputFormat (new API, mapreduce)
implements locality. As far as I can tell, it calculates splits in getSplits,
and each split records only the node(s) hosting the first block of data in
that split. Is my understanding correct?
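To make the behaviour I'm describing concrete, here is a minimal plain-Java sketch with no Hadoop dependency. The class, the Block type, the 128-byte blocks and the hosts h1..h4 are all hypothetical. It only models the rule as I read getSplits: the split gets the replica hosts of whichever block contains its starting offset.

```java
import java.util.List;

// Hypothetical sketch of the "first block decides the hosts" rule (not Hadoop code).
public class FirstBlockLocality {

    // Hypothetical stand-in for one HDFS block: start offset in the file,
    // length in bytes, and the hosts holding a replica (not Hadoop's BlockLocation).
    public static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        public Block(long offset, long length, List<String> hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Index of the block containing the given file offset, or -1 if none does.
    public static int blockIndex(List<Block> blocks, long offset) {
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            if (offset >= b.offset && offset < b.offset + b.length) {
                return i;
            }
        }
        return -1;
    }

    // Hosts for a split starting at splitStart: only the replicas of the block
    // holding the split's first byte, however many other blocks the split covers.
    public static List<String> splitHosts(List<Block> blocks, long splitStart) {
        int idx = blockIndex(blocks, splitStart);
        return idx < 0 ? List.of() : blocks.get(idx).hosts;
    }

    public static void main(String[] args) {
        // Three 128-byte blocks; assume the minimum split size is large enough
        // that one split covers the whole 384-byte file.
        List<Block> blocks = List.of(
                new Block(0, 128, List.of("h1", "h2")),
                new Block(128, 128, List.of("h3", "h4")),
                new Block(256, 128, List.of("h3", "h4")));
        // Prints [h1, h2], although h3 and h4 hold 256 of the split's 384 bytes.
        System.out.println(splitHosts(blocks, 0));
    }
}
```

Under this rule, a split that spans several blocks is scheduled purely on the first block's replicas, which is what prompts my question.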

Looking at the FileInputFormat for the old API (mapred), it appears to do
more to implement locality: it uses getSplitHosts to "return the hosts that
contribute most for a given split".
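For comparison, here is a simplified sketch of the "contribute most" idea: for each host, sum the bytes of the split stored on it, then keep the top few. This is my own rack-unaware approximation with hypothetical names, not the actual getSplitHosts code, which as far as I can see also groups hosts by rack before ranking them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, rack-unaware sketch of ranking hosts by bytes contributed.
public class MostBytesLocality {

    // Hypothetical stand-in for one HDFS block (not Hadoop's BlockLocation).
    public static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        public Block(long offset, long length, List<String> hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Rank hosts by how many bytes of the split [start, start + len) they store
    // and return at most `limit` of them. Ties are broken by host name so the
    // result is deterministic.
    public static List<String> splitHosts(List<Block> blocks, long start, long len, int limit) {
        long end = start + len;
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (Block b : blocks) {
            // Bytes of this block inside the split; <= 0 means no overlap.
            long overlap = Math.min(end, b.offset + b.length) - Math.max(start, b.offset);
            if (overlap <= 0) {
                continue;
            }
            for (String host : b.hosts) {
                bytesPerHost.merge(host, overlap, Long::sum);
            }
        }
        List<String> hosts = new ArrayList<>(bytesPerHost.keySet());
        hosts.sort((x, y) -> {
            int byBytes = Long.compare(bytesPerHost.get(y), bytesPerHost.get(x)); // descending
            return byBytes != 0 ? byBytes : x.compareTo(y);
        });
        return new ArrayList<>(hosts.subList(0, Math.min(limit, hosts.size())));
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block(0, 128, List.of("h1", "h2")),
                new Block(128, 128, List.of("h3", "h4")),
                new Block(256, 128, List.of("h3", "h4")));
        // One 384-byte split; h3 and h4 each hold 256 bytes of it. Prints [h3, h4].
        System.out.println(splitHosts(blocks, 0, 384, 2));
    }
}
```

With three 128-byte blocks where only the first sits on h1/h2, this ranking places a 384-byte split on h3/h4, the hosts that actually store most of its data.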

If my understanding is correct, why was this changed?

