hadoop-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: question about preserving data locality in MapReduce with Yarn
Date Fri, 01 Nov 2013 02:14:29 GMT
The code is slightly hard to follow because it is split between the client and the ApplicationMaster.

The client invokes InputFormat.getSplits to compute the split locations and writes them to a file in HDFS.
The ApplicationMaster then reads that file and creates resource requests based on the locations
of each input split (typically 3 replicas per block). See TaskAttemptImpl.dataLocalHosts and
TaskAttemptImpl.dataLocalRacks, and follow those variables around in the code base.
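
For anyone writing their own ApplicationMaster, here is a rough, self-contained sketch of the idea rather than the actual MapReduce code. It assumes each split is just a list of replica hosts and that a host-to-rack map is already available; the class name LocalityRequestSketch, the method buildRequests, and the host and rack names are all made up for illustration. For each split it asks for one container at every replica host, at every distinct rack those hosts sit on, and at "*" (any node), which is the general shape of a locality-aware YARN request.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only: plain collections stand in for Hadoop's split and request types.
class LocalityRequestSketch {
    // Wildcard resource name meaning "any node in the cluster".
    static final String ANY = "*";
    // Rack assumed for hosts missing from the topology map.
    static final String DEFAULT_RACK = "/default-rack";

    /**
     * Returns, per resource name (host, rack, or "*"), how many containers to ask for.
     * splitHosts holds one list of replica hosts per input split.
     */
    static Map<String, Integer> buildRequests(List<List<String>> splitHosts,
                                              Map<String, String> rackOf) {
        Map<String, Integer> asks = new LinkedHashMap<>();
        for (List<String> hosts : splitHosts) {
            Set<String> racks = new LinkedHashSet<>();
            for (String host : hosts) {
                asks.merge(host, 1, Integer::sum);              // node-local ask
                racks.add(rackOf.getOrDefault(host, DEFAULT_RACK));
            }
            for (String rack : racks) {
                asks.merge(rack, 1, Integer::sum);              // rack-local ask, once per rack
            }
            asks.merge(ANY, 1, Integer::sum);                   // off-switch fallback
        }
        return asks;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("h1", "/rack1");
        rackOf.put("h2", "/rack1");
        rackOf.put("h3", "/rack2");
        rackOf.put("h4", "/rack2");

        List<List<String>> splits = Arrays.asList(
                Arrays.asList("h1", "h2", "h3"),
                Arrays.asList("h2", "h3", "h4"));

        System.out.println(buildRequests(splits, rackOf));
    }
}
```

In a real ApplicationMaster you would most likely hand the same host and rack lists to the YARN client library (AMRMClient.ContainerRequest takes arrays of nodes and racks) rather than count asks by hand.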


On Oct 28, 2013, at 11:10 PM, ricky l <rickylee0815@gmail.com> wrote:

> Hi Sandy, thank you very much for the information. It is good to know that the MapReduce
> AM considers the block location information. BTW, I am not very familiar with the concept
> of splits. Is it specific to MR jobs? If possible, a pointer to the relevant code would be very
> helpful, as I am trying to implement an application master that needs to consider HDFS data locality.
> r.
> On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
> Hi Ricky,
> The input splits contain the locations of the blocks they cover.  The AM gets the information
> from the input splits and submits requests for those locations.  Each container request spans
> all the nodes that hold a replica of the block.  Are you interested in something more specific?
> -Sandy
> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <rickylee0815@gmail.com> wrote:
> Well, I thought an application master could somehow ask a namenode where the data resides...
> isn't that true? If it does not know where the data resides, does a MapReduce application master
> specify the resource name as "*", which means data locality might not be preserved at all?
> r

Arun C. Murthy
Hortonworks Inc.
