hadoop-hdfs-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: question about preserving data locality in MapReduce with Yarn
Date Thu, 31 Oct 2013 22:59:51 GMT
Splits are a MapReduce concept. Check out FileInputFormat for an example of
how to get block locations. You can then pass these locations into an
AMRMClient.ContainerRequest.
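Below is a minimal, dependency-free Java sketch of the FileInputFormat logic mentioned above, based on our reading of the Hadoop 2.x new-API source, which may differ between versions. It cuts a file into splits of max(minSize, min(maxSize, blockSize)) bytes, lets the last split absorb a tail of up to 10% of that size, and tags each split with the hosts of the block that contains the split's start offset. `SplitLocality`, `Block` and `Split` are hypothetical stand-ins. In a real AM the block list would come from `FileSystem.getFileBlockLocations(FileStatus, long, long)`, and each `BlockLocation` provides `getOffset()`, `getLength()` and `getHosts()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of FileInputFormat-style split computation using hypothetical helper types. */
public class SplitLocality {

    /** Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation. */
    static final class Block {
        final long offset;
        final long length;
        final String[] hosts; // nodes holding a replica of this block
        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Hypothetical stand-in for FileSplit: a byte range plus its preferred hosts. */
    static final class Split {
        final long start;
        final long length;
        final String[] hosts;
        Split(long start, long length, String[] hosts) {
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }
        @Override public String toString() {
            return "Split[" + start + "+" + length + " @ " + Arrays.toString(hosts) + "]";
        }
    }

    // Keep cutting full splits only while more than 1.1 split sizes remain,
    // so the final split may be up to 10% larger instead of leaving a tiny tail.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Index of the block whose byte range contains the given offset. */
    static int blockIndex(Block[] blocks, long offset) {
        for (int i = 0; i < blocks.length; i++) {
            if (offset >= blocks[i].offset && offset < blocks[i].offset + blocks[i].length) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset " + offset + " is outside the file");
    }

    static List<Split> computeSplits(long fileLength, long blockSize,
                                     long minSize, long maxSize, Block[] blocks) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            long start = fileLength - remaining;
            splits.add(new Split(start, splitSize, blocks[blockIndex(blocks, start)].hosts));
            remaining -= splitSize;
        }
        if (remaining != 0) {
            long start = fileLength - remaining;
            splits.add(new Split(start, remaining, blocks[blockIndex(blocks, start)].hosts));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical 300-byte file, 128-byte blocks, three replicas per block.
        Block[] blocks = {
            new Block(0, 128, "h1", "h2", "h3"),
            new Block(128, 128, "h2", "h4", "h5"),
            new Block(256, 44, "h1", "h4", "h6"),
        };
        // 1 and Long.MAX_VALUE approximate the default min/max split sizes.
        for (Split split : computeSplits(300, 128, 1, Long.MAX_VALUE, blocks)) {
            System.out.println(split);
        }
    }
}
```

Each split's host list is what you would hand to a ContainerRequest as its nodes.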

-Sandy


On Mon, Oct 28, 2013 at 8:10 PM, ricky l <rickylee0815@gmail.com> wrote:

> Hi Sandy, thank you very much for the information. It is good to know that
> the MapReduce AM takes block locations into account. BTW, I am not very
> familiar with the concept of splits. Is it specific to MR jobs? If
> possible, a pointer to the relevant code would be very helpful, as I am
> trying to implement an application master that needs to respect HDFS
> data locality. thx.
>
> r.
>
>
> On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>
>> Hi Ricky,
>>
>> The input splits contain the locations of the blocks they cover. The AM
>> reads these locations from the input splits and submits container requests
>> for them. Each container request lists all the nodes that hold a replica
>> of the block. Are you interested in something more specific?
>>
>> -Sandy
>>
>>
>> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <rickylee0815@gmail.com> wrote:
>>
>>> Well, I thought an application master could ask the namenode where the
>>> data exist... isn't that true? If it does not know where the data reside,
>>> does a MapReduce application master specify the resource name as "*",
>>> which would mean data locality might not be preserved at all? thx,
>>>
>>> r
>>>
>>
>>
>
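On the "*" question above: as far as we can tell from the Hadoop 2.x AMRMClient code, a single `AMRMClient.ContainerRequest(capability, nodes, racks, priority)` passed to `addContainerRequest` is expanded into resource requests at three levels. There is one request per named node, one per rack, and always one for "*" (`ResourceRequest.ANY`). The MapReduce AM does ask for "*", but only alongside the node- and rack-level asks. The sketch below models just that bookkeeping with plain Java collections. `LocalityLedger`, the host-to-rack map and the simplified "add one at every level" rule are assumptions for illustration. Settings such as `relaxLocality` and the real rack resolution are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical model of per-block container requests tallied at node, rack and "*" level. */
public class LocalityLedger {

    static final String ANY = "*";                       // same string as YARN's ResourceRequest.ANY
    static final String DEFAULT_RACK = "/default-rack";  // assumed fallback for unknown hosts

    private final Map<String, String> rackOf;                    // host -> rack (assumed topology)
    private final Map<String, Integer> counts = new TreeMap<>(); // location -> containers asked

    LocalityLedger(Map<String, String> rackOf) {
        this.rackOf = rackOf;
    }

    /** Record one container request for a block whose replicas live on the given hosts. */
    void addRequest(String... replicaHosts) {
        Set<String> hosts = new LinkedHashSet<>(Arrays.asList(replicaHosts));
        Set<String> racks = new LinkedHashSet<>();
        for (String host : hosts) {
            increment(host);                                      // node-local ask
            racks.add(rackOf.getOrDefault(host, DEFAULT_RACK));
        }
        for (String rack : racks) {
            increment(rack);                                      // rack-local ask, once per distinct rack
        }
        increment(ANY);                                           // off-switch ask, always added
    }

    private void increment(String location) {
        counts.merge(location, 1, Integer::sum);
    }

    /** Number of containers currently asked for at this node, rack or "*". */
    int outstanding(String location) {
        return counts.getOrDefault(location, 0);
    }

    public static void main(String[] args) {
        Map<String, String> topology = Map.of("h1", "/r1", "h2", "/r1", "h3", "/r2", "h4", "/r2");
        LocalityLedger ledger = new LocalityLedger(topology);
        ledger.addRequest("h1", "h2", "h3"); // replicas of block 0
        ledger.addRequest("h2", "h3", "h4"); // replicas of block 1
        System.out.println(ledger.counts);
    }
}
```

Because "*" is incremented for every request, its presence alone does not discard locality. Whether a container actually lands on one of the named hosts is up to the scheduler and current cluster load.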
