hadoop-mapreduce-user mailing list archives

From Virajith Jalaparti <virajit...@gmail.com>
Subject Re: Lack of data locality in Hadoop-0.20.2
Date Tue, 12 Jul 2011 18:35:00 GMT
On 7/12/2011 7:20 PM, Allen Wittenauer wrote:
> On Jul 12, 2011, at 10:27 AM, Virajith Jalaparti wrote:
>> I agree that the scheduler has less leeway when the replication factor is
>> 1. However, I would still expect more than 10% of the tasks to be data-local
>> even when the replication factor is 1.
> 	How did you load your data?
> 	Did you load it from outside the grid or from one of the datanodes? If you loaded
> 	from one of the datanodes, you'll basically have no real locality, especially with
> 	a rep factor of 1.
I created the data using randomwriter from the Hadoop examples. Essentially, I ran
the example at http://wiki.apache.org/hadoop/Sort with the necessary parameters:

    % bin/hadoop jar hadoop-*-examples.jar randomwriter rand
    % bin/hadoop jar hadoop-*-examples.jar sort rand rand-sort
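
To make the rep-factor-1 point concrete, here is a toy Python simulation. It is only a sketch under stated assumptions: a simplified model of HDFS placement (with replication 1, the single replica goes on the writer's node when the writer is a datanode, otherwise on a uniformly random node) and a hypothetical greedy scheduler in which each free slot prefers a task whose block sits on its own node and otherwise steals from the busiest node. The names `place_blocks`, `local_fraction`, `dn0`, and `client` are made up for illustration, and racks and heartbeat timing are ignored.

```python
import random
from collections import defaultdict


def place_blocks(num_blocks, nodes, writer=None, rng=None):
    """Toy model of HDFS block placement with replication factor 1.

    Assumption: if the writer is itself a datanode, the single replica lands
    on the writer's node; otherwise a node is chosen uniformly at random.
    """
    if rng is None:
        rng = random.Random(0)
    return [writer if writer in nodes else rng.choice(nodes)
            for _ in range(num_blocks)]


def local_fraction(block_nodes, nodes, slots_per_node=2):
    """Fraction of map tasks that read their block locally.

    Hypothetical greedy scheduler: in each wave, every slot on every node
    takes a pending task whose block lives on that node if there is one,
    otherwise it takes a task from the node with the most pending work
    (a remote read). Racks and heartbeat timing are ignored.
    """
    if not block_nodes:
        return 0.0
    pending = defaultdict(list)  # node -> tasks whose only replica is there
    for task, node in enumerate(block_nodes):
        pending[node].append(task)

    remaining = len(block_nodes)
    local = 0
    while remaining:
        for node in nodes:
            for _ in range(slots_per_node):
                if not remaining:
                    break
                if pending[node]:
                    pending[node].pop()  # data-local task
                    local += 1
                else:
                    # No local work left: steal from the busiest node.
                    donor = max(pending, key=lambda n: len(pending[n]))
                    pending[donor].pop()  # remote read
                remaining -= 1
    return local / len(block_nodes)


nodes = [f"dn{i}" for i in range(10)]
loaded_from_dn0 = place_blocks(100, nodes, writer="dn0")
loaded_from_outside = place_blocks(100, nodes, writer="client")
print(local_fraction(loaded_from_dn0, nodes))
print(local_fraction(loaded_from_outside, nodes))
```

In this toy setup (10 nodes with equal slots, all 100 blocks written from dn0), only dn0's own slots ever read locally, so the fraction works out to 0.1; spreading the blocks evenly, e.g. `[nodes[i % 10] for i in range(100)]`, makes every task local. It illustrates the mechanism only, not how the 0.20.2 scheduler behaves on a real cluster.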

