hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: use of inputSplit#getLocations()
Date Thu, 21 Apr 2011 07:37:57 GMT
Hey Johannes,

On Wed, Apr 20, 2011 at 3:37 PM, Johannes Zillmann
<jzillmann@googlemail.com> wrote:
> Should it contain all hosts which contain a replica of any of the blocks, sorted in
> a way that the hosts which contribute the most data come first?
> Or should it contain only those hosts which were determined as most optimal regarding
> data locality during the splitting process?
>
> E.g. in case (a): should the location array only contain this one host, or should it
> contain all hosts, with the one host that has all the blocks simply in the first position?

It's better to send all locations for maximal locality, but the order
is not considered AFAIK. It's the order of TT (TaskTracker) heartbeats at
the JT (JobTracker) that matters instead.
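
A rough sketch of what "send all locations" could look like, in plain Java
with no Hadoop dependencies. The class SplitLocations, the Block type and
the method allLocations are hypothetical names made up for illustration;
a real split would return the resulting array from getLocations(). The
sort by contributed bytes is optional and, per the above, probably not
something the scheduler acts on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitLocations {

    /** Hypothetical stand-in for one block: its length and the hosts holding a replica. */
    public static final class Block {
        final long length;
        final String[] hosts;

        public Block(long length, String... hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    /**
     * Returns every host that holds a replica of any block in the split.
     * Hosts are ordered by the number of bytes they hold, largest first;
     * ties keep the order in which the hosts were first seen.
     */
    public static String[] allLocations(List<Block> blocks) {
        // LinkedHashMap preserves first-seen order, which the stable sort below relies on for ties.
        Map<String, Long> bytesPerHost = new LinkedHashMap<>();
        for (Block block : blocks) {
            for (String host : block.hosts) {
                bytesPerHost.merge(host, block.length, Long::sum);
            }
        }
        List<String> hosts = new ArrayList<>(bytesPerHost.keySet());
        // Descending by bytes held; List.sort is stable.
        hosts.sort((a, b) -> Long.compare(bytesPerHost.get(b), bytesPerHost.get(a)));
        return hosts.toArray(new String[0]);
    }
}
```

With two 64-byte blocks on (h1, h2) and (h1, h3) plus a 32-byte block on
(h2, h3), h1 holds the most data and comes first, and h2 and h3 are still
both reported.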

-- 
Harsh J
