hadoop-common-user mailing list archives

From ricky lee <rickylee0...@gmail.com>
Subject question about preserving data locality in MapReduce with Yarn
Date Tue, 29 Oct 2013 01:56:37 GMT

I have a question about maintaining data locality in a MapReduce job
launched through Yarn. Based on the Yarn tutorial, it seems like an
application master can specify a resource name (a host or rack), memory,
and CPU when requesting containers. By carefully choosing resource names,
I think data locality can be achieved. I am curious how the current
MapReduce application master does this. Does it check all the blocks
needed by a job and choose a subset of nodes holding the most of them?
If someone can point me to the source code snippets that make this
decision, it would be very much appreciated. thx.
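To make the question concrete, here is a toy sketch (plain Java, not actual Hadoop code; the class, method, and node names are all made up) of the kind of per-host block counting I am imagining:

```java
import java.util.*;

// Hypothetical sketch, NOT the MapReduce AM's real logic: given, for each
// input split, the list of hosts holding a replica of its block, rank the
// hosts by how many of the job's blocks they hold locally.
public class LocalityRanker {

    // blockLocations: one entry per block/split, listing its replica hosts.
    public static List<String> rankHosts(List<List<String>> blockLocations) {
        // Count how many blocks each host holds a replica of.
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> hosts : blockLocations) {
            for (String h : hosts) {
                counts.merge(h, 1, Integer::sum);
            }
        }
        // Sort hosts by descending block count.
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> counts.get(b) - counts.get(a));
        return ranked;
    }

    public static void main(String[] args) {
        // Three splits; node2 holds a replica of all three.
        List<List<String>> blocks = Arrays.asList(
                Arrays.asList("node1", "node2"),
                Arrays.asList("node2", "node3"),
                Arrays.asList("node2", "node1"));
        System.out.println(rankHosts(blocks)); // [node2, node1, node3]
    }
}
```

My current understanding (which I would like confirmed) is that the real MapReduce AM does not rank hosts globally like this, but instead attaches each split's replica hosts and racks to that individual task's container request and lets the scheduler relax locality from node to rack to any; corrections welcome.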

