hadoop-hdfs-user mailing list archives

From Chris Mawata <chris.maw...@gmail.com>
Subject Re: Non data-local scheduling
Date Thu, 03 Oct 2013 17:52:32 GMT
Try playing with the block size vs. the split size. If the blocks are very
large and the splits small, then multiple splits correspond to the same
block, and if there are more splits than replicas you get rack-local maps.
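
The interaction between block size and split size can be sketched as follows. This is only an illustration in Python, not Hadoop code. It assumes the split-size rule max(minSize, min(maxSize, blockSize)) that FileInputFormat applies, ignores the small slop Hadoop allows on the last split, and uses made-up file, block, and split sizes, since the original post does not state them.

```python
MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Split-size rule assumed here: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def splits_per_block(file_len, block_size, split_size):
    """Count how many splits start inside each block of a single file."""
    counts = {}
    offset = 0
    while offset < file_len:
        block_index = offset // block_size
        counts[block_index] = counts.get(block_index, 0) + 1
        offset += split_size
    return counts


# Hypothetical numbers for illustration only.
file_len = 4096 * MB      # 4 GiB input file
block_size = 1024 * MB    # very large HDFS blocks
max_split = 128 * MB      # small maximum split size
replication = 3           # as in the original post

split_size = compute_split_size(block_size, min_size=1, max_size=max_split)
counts = splits_per_block(file_len, block_size, split_size)

for block_index, n_splits in sorted(counts.items()):
    # Only `replication` nodes hold a copy of this block, so maps beyond
    # what those nodes can run at once cannot be scheduled data-locally.
    print(f"block {block_index}: {n_splits} splits, {replication} replica nodes")
```

With these assumed numbers, eight splits fall into each block while only three nodes hold a replica of it, so whether all eight maps run data-local depends on how many free containers those three nodes have. In a real job the sizes would come from `dfs.blocksize` and the `mapreduce.input.fileinputformat.split.minsize` / `mapreduce.input.fileinputformat.split.maxsize` properties.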

On 10/3/2013 12:57 PM, André Hacker wrote:
> Hi,
> I have a 25-node cluster running Hadoop 2.1.0-beta with the Capacity
> Scheduler (default scheduler settings) and a replication factor of 3.
> I have exclusive access to the cluster to run a benchmark job and I 
> wonder why there are so few data-local and so many rack-local maps.
> The input format calculates 44 input splits and 44 map tasks; however,
> the number of them that are processed data-locally seems to be random.
> Here are the counters from my last runs:
> data-local / rack-local:
> Test 1: data-local: 15, rack-local: 29
> Test 2: data-local: 18, rack-local: 26
> I don't understand why it is not always 100% data-local. This
> should not be a problem since the blocks of my input file are 
> distributed over all nodes.
> Maybe someone can give me a hint.
> Thanks,
> André Hacker, TU Berlin
