hadoop-mapreduce-user mailing list archives

From Ian Halperin <lomax0...@gmail.com>
Subject Job scheduling
Date Mon, 06 Jun 2011 23:39:05 GMT

I might be misunderstanding how scheduling is supposed to work, or I might
have something misconfigured, but my Map/Reduce jobs don't seem to run where
my data is located.

I get a bunch of these messages:
INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201106062049_0001_m_000021 has split on node:/rack1/rack1node1.local
... indicating it has correctly found the source data at my node
/rack1/rack1node1 (the only copy of the data - for the purpose of this
experiment I have set dfs.replication = dfs.replication.min =
dfs.replication.max = 1 so I only have 1 replica).
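
For anyone who wants to tabulate these messages rather than read them by eye, here is a minimal sketch in Python. It assumes the JobTracker log lines look exactly like the single one quoted above; the regular expression and the helper name split_locations are illustrative, not part of Hadoop, and other Hadoop versions may format the message differently.

```python
import re

# Pattern derived only from the one log line quoted in this message;
# treat it as an assumption about the format, not a specification.
SPLIT_LINE = re.compile(
    r"tip:(?P<task>task_\d+_\d+_[mr]_\d+) has split on node:(?P<node>\S+)"
)


def split_locations(log_lines):
    """Map each task ID to the set of nodes reported as holding its split."""
    locations = {}
    for line in log_lines:
        match = SPLIT_LINE.search(line)
        if match:
            task = match.group("task")
            locations.setdefault(task, set()).add(match.group("node"))
    return locations


# The sample is the log line from this message, rejoined onto one line.
sample = [
    "INFO org.apache.hadoop.mapred.JobInProgress: "
    "tip:task_201106062049_0001_m_000021 has split on node:/rack1/rack1node1.local",
    "INFO some.other.Logger: an unrelated line that should be ignored",
]

if __name__ == "__main__":
    for task, nodes in split_locations(sample).items():
        print(task, sorted(nodes))
```

The resulting map can then be compared against wherever each task attempt actually ran to see how many map tasks were data-local.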

However, it then goes on to run the JOB_SETUP, MAP, REDUCE, JOB_CLEANUP
tasks on arbitrary tasktrackers, usually not where the data is located, so
the first thing they have to do is pull it over the network from another

Did I miss something - or hopefully configure something wrong? :)

