hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matei Zaharia <ma...@eecs.berkeley.edu>
Subject Re: Lack of data locality in Hadoop-0.20.2
Date Tue, 12 Jul 2011 18:23:41 GMT
Hi Virajith,

The default FIFO scheduler just isn't optimized for locality for small jobs. You should be
able to get substantially more locality even with 1 replica if you use the fair scheduler,
although the version of the scheduler in 0.20 doesn't contain the locality optimization. Try
the Cloudera distribution to get a 0.20-compatible Hadoop that does contain it.

I also think your value of 10% inferred on completion time might be a little off, because
you have quite a few more data blocks than nodes so it should be easy to make the first few
waves of tasks data-local. Try a version of Hadoop that correctly measures this counter.


On Jul 12, 2011, at 1:27 PM, Virajith Jalaparti wrote:

> I agree that the scheduler has lesser leeway when the replication factor is 1. However,
I would still expect the number of data-local tasks to be more than 10% even when the replication
factor is 1. Presumably, the scheduler would have greater number of opportunities to schedule
data-local tasks as compared to just 10%. (Please note that I am inferring that a map was
non-local based on the observed completion time. I don't know why but the logs of my jobs
don't show the DATA_LOCAL_MAPS counter information.)
> I will try using higher replication factors and see how much improvement I can get.
> Thanks,
> Virajith
> On Tue, Jul 12, 2011 at 6:15 PM, Arun C Murthy <acm@hortonworks.com> wrote:
> As Aaron mentioned the scheduler has very little leeway when you have a single replica.
> OTOH, schedulers equate rack-locality to node-locality - this makes sense sense for a
large-scale system since intra-rack b/w is good enough for most installs of Hadoop.
> Arun
> On Jul 12, 2011, at 7:36 AM, Virajith Jalaparti wrote:
>> I am using a replication factor of 1 since I dont to incur the overhead of replication
and I am not much worried about reliability. 
>> I am just using the default Hadoop scheduler (FIFO, I think!). In case of a single
rack, rack-locality doesn't really have any meaning. Obviously everything will run in the
same rack. I am concerned about data-local maps. I assumed that Hadoop would do a much better
job at ensuring data-local maps but it doesnt seem to be the case here.
>> -Virajith
>> On Tue, Jul 12, 2011 at 3:30 PM, Arun C Murthy <acm@hortonworks.com> wrote:
>> Why are you running with replication factor of 1?
>> Also, it depends on the scheduler you are using. The CapacityScheduler in 0.20.203
(not 0.20.2) has much better locality for jobs, similarly with FairScheduler.
>> IAC, running on a single rack with replication of 1 implies rack-locality for all
tasks which, in most cases, is good enough.
>> Arun
>> On Jul 12, 2011, at 5:45 AM, Virajith Jalaparti wrote:
>> > Hi,
>> >
>> > I was trying to run the Sort example in Hadoop-0.20.2 over 200GB of input data
using a 20 node cluster of nodes. HDFS is configured to use 128MB block size (so 1600maps
are created) and a replication factor of 1 is being used. All the 20 nodes are also hdfs datanodes.
I was using a bandwidth value of 50Mbps between each of the nodes (this was configured using
linux "tc"). I see that around 90% of the map tasks are reading data over the network i.e.
most of the map tasks are not being scheduled at the nodes where the data to be processed
by them is located.
>> > My understanding was that Hadoop tries to schedule as many data-local maps as
possible. But in this situation, this does not seem to happen. Any reason why this is happening?
and is there a way to actually configure hadoop to ensure the maximum possible node locality?
>> > Any help regarding this is very much appreciated.
>> >
>> > Thanks,
>> > Virajith

View raw message