hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From André Hacker <andrephac...@gmail.com>
Subject Re: Non data-local scheduling
Date Thu, 03 Oct 2013 17:36:00 GMT
Thanks, but I can't set this to a fraction, it wants to see an integer.
My documentation is slightly different:
"Number of missed scheduling opportunities after which the
CapacityScheduler attempts to schedule rack-local containers. Typically
this should be set to number of racks in the cluster, this feature is
disabled by default, set to -1."

I set it to 1 and now I had 33 data local and 11 rack local tasks, which is
a better, but still not optimal.

Couldn't find a good description of what this feature means (what is a
scheduling opportunity, how many are there?). It does not seem to be in the
current documentation
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html


2013/10/3 Sandy Ryza <sandy.ryza@cloudera.com>

> Hi Andre,
>
> Try setting yarn.scheduler.capacity.node-locality-delay to a number
> between 0 and 1.  This will turn on delay scheduling - here's the doc on
> how this works:
>
> For applications that request containers on particular nodes, the number
> of scheduling opportunities since the last container assignment to wait
> before accepting a placement on another node. Expressed as a float between
> 0 and 1, which, as a fraction of the cluster size, is the number of
> scheduling opportunities to pass up. The default value of -1.0 means don't
> pass up any scheduling opportunities.
>
> -Sandy
>
>
> On Thu, Oct 3, 2013 at 9:57 AM, André Hacker <andrephacker@gmail.com>wrote:
>
>> Hi,
>>
>> I have a 25 node cluster, running hadoop 2.1.0-beta, with capacity
>> scheduler (default settings for scheduler) and replication factor 3.
>>
>> I have exclusive access to the cluster to run a benchmark job and I
>> wonder why there are so few data-local and so many rack-local maps.
>>
>> The input format calculates 44 input splits and 44 map tasks, however, it
>> seems to be random how many of them are processed data locally. Here the
>> counters of my last tries:
>>
>> data-local / rack-local:
>> Test 1: data-local:15 rack-local: 29
>> Test 2: data-local:18 rack-local: 26
>>
>> I don't understand why there is not always 100% data local. This should
>> not be a problem since the blocks of my input file are distributed over all
>> nodes.
>>
>> Maybe someone can give me a hint.
>>
>> Thanks,
>> André Hacker, TU Berlin
>>
>
>

Mime
View raw message