hadoop-hdfs-user mailing list archives

From André Hacker <andrephac...@gmail.com>
Subject Non data-local scheduling
Date Thu, 03 Oct 2013 16:57:28 GMT

I have a 25-node cluster running Hadoop 2.1.0-beta with the Capacity
Scheduler (default scheduler settings) and a replication factor of 3.

I have exclusive access to the cluster to run a benchmark job and I wonder
why there are so few data-local and so many rack-local maps.

The input format calculates 44 input splits, and hence 44 map tasks.
However, the number of maps that are processed data-locally seems to be
random. Here are the counters from my last two runs:

data-local / rack-local:
Test 1: data-local: 15, rack-local: 29
Test 2: data-local: 18, rack-local: 26
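
To put these counters in proportion, here is a minimal Python sketch that
computes the share of data-local maps per run. The function name
data_local_share is made up for illustration and is not part of Hadoop; the
sketch assumes every map is counted as either data-local or rack-local,
which matches the totals above (both runs sum to 44).

```python
def data_local_share(data_local: int, rack_local: int) -> float:
    """Return the fraction of map tasks that ran data-local.

    Hypothetical helper: assumes each map is either data-local or rack-local.
    """
    total = data_local + rack_local
    if total == 0:
        raise ValueError("no map tasks counted")
    return data_local / total


# Counters reported above, as (data-local, rack-local) per test run.
runs = {"Test 1": (15, 29), "Test 2": (18, 26)}

for name, (data_local, rack_local) in runs.items():
    total = data_local + rack_local
    share = data_local_share(data_local, rack_local)
    print(f"{name}: {data_local}/{total} maps data-local ({share:.1%})")
```

In both runs well under half of the 44 maps ran on a node that holds a
replica of their split.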

I don't understand why the maps are not always 100% data-local. This should
not be a problem, since the blocks of my input file are distributed over all
nodes.
Maybe someone can give me a hint.

André Hacker, TU Berlin
