hadoop-mapreduce-user mailing list archives

From Michael Segel <msegel_had...@hotmail.com>
Subject Re: question about preserving data locality in MapReduce with Yarn
Date Tue, 29 Oct 2013 02:03:58 GMT
How do you know where the data exists when you begin?

Sent from a remote device. Please excuse any typos...

Mike Segel

> On Oct 28, 2013, at 8:57 PM, "ricky lee" <rickylee0815@gmail.com> wrote:
> Hi,
> I have a question about maintaining data locality in a MapReduce job launched through
> YARN. Based on the YARN tutorial, it seems that an application master can specify a resource
> name, memory, and CPU when requesting containers. By carefully choosing resource names, I
> think data locality can be achieved. I am curious how the current MapReduce application
> master does this. Does it check all the blocks needed for a job and choose the subset of nodes
> holding the most needed blocks? If someone can point me to the source code that makes this
> decision, it would be very much appreciated. Thanks.
> -r
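
The heuristic described in the quoted question (count the needed blocks per node, then prefer the nodes holding the most) can be sketched in plain Java as below. This is only an illustration of that idea under stated assumptions, not the MapReduce application master's actual logic. The class name LocalitySketch, its methods, and the block and host names are all hypothetical; the block-to-host map is assumed to have been obtained elsewhere (in MapReduce, the hosts for a split are exposed by InputSplit.getLocations()).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rank hosts by how many of a job's blocks they store.
public class LocalitySketch {

    /** Counts how many of the needed blocks each host stores. */
    public static Map<String, Integer> countBlocksPerHost(Map<String, List<String>> blockToHosts) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> hosts : blockToHosts.values()) {
            for (String host : hosts) {
                counts.merge(host, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns hosts ordered by descending block count; ties are broken by host name. */
    public static List<String> rankHosts(Map<String, List<String>> blockToHosts) {
        Map<String, Integer> counts = countBlocksPerHost(blockToHosts);
        List<String> hosts = new ArrayList<>(counts.keySet());
        hosts.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return hosts;
    }

    public static void main(String[] args) {
        // Made-up replica locations for three blocks (replication factor 3).
        Map<String, List<String>> blockToHosts = new HashMap<>();
        blockToHosts.put("blk_1", Arrays.asList("node1", "node2", "node3"));
        blockToHosts.put("blk_2", Arrays.asList("node2", "node3", "node4"));
        blockToHosts.put("blk_3", Arrays.asList("node2", "node5", "node6"));

        // Hosts holding the most needed blocks come first.
        System.out.println(rankHosts(blockToHosts));
    }
}
```

The top-ranked host names could then be passed as the preferred nodes of a container request (in the YARN client library, AMRMClient.ContainerRequest accepts arrays of node and rack names). Whether a ranking like this is preferable to simply requesting each split's own hosts is a design choice this sketch does not settle.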
