hadoop-hdfs-user mailing list archives

From ricky l <rickylee0...@gmail.com>
Subject Re: question about preserving data locality in MapReduce with Yarn
Date Tue, 29 Oct 2013 03:10:28 GMT
Hi Sandy, thank you very much for the information. It is good to know that
the MapReduce AM considers the block location information. BTW, I am not very
familiar with the concept of splits. Is it specific to MR jobs? If
possible, a pointer to the relevant code would be very helpful, as I am trying
to implement an application master that needs to consider HDFS
data locality. Thanks.


On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> Hi Ricky,
> The input splits contain the locations of the blocks they cover.  The AM
> gets the information from the input splits and submits requests for those
> locations.  Each container request lists all the nodes that hold a replica
> of the block.  Are you interested in something more specific?
> -Sandy
> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <rickylee0815@gmail.com> wrote:
>> Well, I thought an application master could ask a namenode where the data
>> resides. Isn't that true? If it does not know where the data resides,
>> does a MapReduce application master specify the resource name as "*",
>> which means data locality might not be preserved at all? Thanks,
>> r
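
For reference, the idea in Sandy's reply can be sketched roughly as follows. This is a self-contained toy model, not the MapReduce AM's actual code: the class LocalitySketch, the helper rackOf, and the "rackN-hostM" naming scheme are all made up for illustration. In Hadoop itself, the host names would come from calls such as InputSplit.getLocations() or FileSystem.getFileBlockLocations(), and the rack would come from the cluster's topology mapping. The sketch assumes a request is expressed at three levels: the replica hosts, their racks, and finally "*".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, NOT the Hadoop API: shows how the replica hosts of one
// split could be expanded into node-level, rack-level, and "*" resource names.
class LocalitySketch {
    // The "any node" resource name mentioned in the thread.
    static final String ANY = "*";

    // Assumed toy topology: hosts are named "rackN-hostM", so the rack is the
    // part before the first dash. A real cluster resolves racks differently.
    static String rackOf(String host) {
        int dash = host.indexOf('-');
        return dash < 0 ? "/default-rack" : "/" + host.substring(0, dash);
    }

    // Returns the resource names to request for one split, most specific
    // first: replica hosts, then their racks (deduplicated), then "*".
    static List<String> resourceNamesFor(List<String> replicaHosts) {
        Set<String> names = new LinkedHashSet<>(replicaHosts);
        for (String host : replicaHosts) {
            names.add(rackOf(host));
        }
        names.add(ANY);
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        // One block with three replicas, two of them on the same rack.
        List<String> hosts = Arrays.asList("rack1-h1", "rack1-h2", "rack2-h7");
        System.out.println(resourceNamesFor(hosts));
    }
}
```

If no replica hosts are known (an empty list), the sketch falls back to "*" alone, which is the case ricky asks about: the request can be satisfied anywhere and locality is not preserved.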
