hadoop-common-dev mailing list archives

From Andrzej Bialecki <...@getopt.org>
Subject Unable to run more than one job concurrently
Date Thu, 18 May 2006 21:53:13 GMT
Hi all,

I'm running Hadoop on a relatively small cluster (5 nodes) with growing 
datasets.

I noticed that if I start a job that is configured to run more map tasks
than the cluster capacity (mapred.tasktracker.tasks.maximum * number of
nodes, 20 in this case), then of course only that many map tasks run at
once, and as they finish the next map tasks from that job are scheduled.
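
For reference, the per-node limit is set in hadoop-site.xml; with 5 nodes
and a total capacity of 20 that works out to 4 tasks per node, i.e.
something like:

  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <!-- 4 tasks per node * 5 nodes = 20 task slots cluster-wide -->
    <value>4</value>
  </property>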

However, when I try to start another job in parallel, only its reduce
tasks get scheduled (uselessly spin-waiting for map output, and merely
reducing the number of available task slots in the cluster...), and none
of its map tasks are scheduled until the first job completes. This feels
wrong - not only am I not making progress on the second job, I'm also
taking slots away from the first job!
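
To make "start another job in parallel" concrete, it amounts to submitting
a second JobConf through the non-blocking JobClient.submitJob() while the
first job is still running - a minimal sketch (job names, mapper/reducer
and path setup are placeholders, not my actual jobs):

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.RunningJob;

  public class TwoJobs {
    public static void main(String[] args) throws Exception {
      JobConf job1 = new JobConf();
      job1.setJobName("job-1");
      // ... mapper/reducer, input and output setup for job1 elided ...

      JobConf job2 = new JobConf();
      job2.setJobName("job-2");
      // ... mapper/reducer, input and output setup for job2 elided ...

      JobClient client = new JobClient(job1);
      RunningJob r1 = client.submitJob(job1);  // returns immediately
      RunningJob r2 = client.submitJob(job2);  // submitted while job1 runs

      // What I observe: job2's reduce tasks grab slots right away and sit
      // waiting for map output, but job2's map tasks are not scheduled
      // until all of job1's map tasks have finished.
      while (!r1.isComplete() || !r2.isComplete()) {
        Thread.sleep(10000);
      }
    }
  }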

I'm somewhat miffed about this - I'd expect the JobTracker to split the
available resources evenly between the two jobs, i.e. schedule some map
tasks from the first job and some from the second. That is not what is
happening, though...

Is this a configuration error, a bug, or a feature? :)

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


