hadoop-mapreduce-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: split locations
Date Fri, 14 Jan 2011 17:33:50 GMT
On Fri, Jan 14, 2011 at 3:09 AM, Pedro Costa <psdc1978@gmail.com> wrote:

> Hi,
> If a split contains more than one location, does it mean that
> this split file is replicated across all of those locations, or that
> the split is divided into several blocks, with each block stored in one
> location?

It requests that the map task run on one of those machines or on the same rack
as one of those machines. Currently there is no way to indicate that one machine
in the list is "better" than another. If an input split covers multiple
blocks, the InputFormat is best served by picking the top N machines that
are close to a copy of most of the data, where N is roughly 3 to 5.
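One way to read that last suggestion, as a minimal Java sketch under stated assumptions: for each host, add up how many bytes of the split it stores locally, then keep the N hosts with the largest totals. The `SplitHosts` class, its `Block` type (a byte range plus the hosts holding a replica) and `topHosts` are hypothetical names for illustration, not Hadoop APIs, and this is not how Hadoop's own FileInputFormat is implemented. "Close to the data" is simplified here to "stores a local replica", so rack proximity is ignored; ties are broken by host name only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: pick the N hosts that store the most bytes of a split. */
final class SplitHosts {

    /** Hypothetical block description: its byte range in the file and the hosts holding a replica. */
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, List<String> hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    private SplitHosts() {}

    /**
     * Returns up to n hosts, ordered by how many bytes of the range
     * [splitStart, splitStart + splitLength) they store locally (most first).
     * Rack locality is deliberately ignored in this sketch.
     */
    static List<String> topHosts(List<Block> blocks, long splitStart, long splitLength, int n) {
        long splitEnd = splitStart + splitLength;
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (Block b : blocks) {
            // Number of bytes of this block that fall inside the split's range.
            long overlap = Math.min(splitEnd, b.offset + b.length) - Math.max(splitStart, b.offset);
            if (overlap <= 0) {
                continue; // block lies entirely outside the split
            }
            // Every replica holder gets credit for the overlapping bytes.
            for (String host : b.hosts) {
                bytesPerHost.merge(host, overlap, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesPerHost.entrySet());
        // Most local bytes first; equal totals ordered by host name for determinism.
        entries.sort((a, c) -> {
            int byBytes = Long.compare(c.getValue(), a.getValue());
            return byBytes != 0 ? byBytes : a.getKey().compareTo(c.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

For a 300-byte file in three 100-byte blocks replicated on {h1, h2, h3}, {h2, h4, h5} and {h2, h3, h6}, `topHosts(blocks, 0, 300, 3)` returns `[h2, h3, h1]`: h2 holds all 300 bytes, h3 holds 200, and h1 wins the four-way 100-byte tie only because of the name-based tie-break.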

-- Owen
