hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Task type priorities during scheduling ?
Date Wed, 26 Jul 2006 06:17:25 GMT
Paul Sutter wrote:
> First, It matters in the case of concurrent jobs. If you submit a 20
> minute job while a 20 hour job is running, it would be nice if the
> reducers for the 20 minute job could get a chance to run before the 20
> hour job's mappers have all finished. So even without a throughput
> improvement, you have an important capability (although it may require
> another minor tweak or two to make possible).

I fear that more than a minor tweak or two is required to make 
concurrent jobs work well.  For example, you would also want to make 
sure that the long-running job does not consume all of the reduce slots, 
or the short job would again get stuck behind it.  Pausing long-running 
tasks might be required.

The best way to do this at present is to run two job trackers, and two 
tasktrackers per node, then submit long-running jobs to one "cluster" 
and short-running jobs to the other.
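[For readers trying this, the split-cluster setup amounts to giving each of the two tasktrackers on a node its own config file. A minimal sketch, assuming the era's property names (mapred.job.tracker and mapred.local.dir); the hostname, port, and paths are illustrative, not from this thread:]

```xml
<!-- hadoop-site.xml for the "long jobs" tasktracker on each node.
     The second tasktracker would use a copy that points at the other
     jobtracker and uses a distinct mapred.local.dir. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- hypothetical host:port of the long-jobs jobtracker -->
    <value>jobtracker-long.example.com:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <!-- must differ between the two tasktrackers sharing a node -->
    <value>/tmp/hadoop/mapred-long</value>
  </property>
</configuration>
```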

> Secondarily, we often have stragglers, where one mapper runs slower
> than the others. When this happens, we end up with a largely idle
> cluster for as long as an hour. In cases like these, good support for
> concurrent jobs _would_ improve throughput.

Can you perhaps increase the number of map tasks, so that even a slow 
task takes only a very small portion of the total execution time?
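[Concretely, the map-task count can be raised per job (JobConf's setNumMapTasks) or as a site-wide default. A sketch of the config route; the value 1000 is illustrative, chosen so any single straggler holds only a small slice of the total input:]

```xml
<!-- hadoop-site.xml: raise the default number of map tasks so one
     slow task accounts for a small fraction of the job's runtime -->
<property>
  <name>mapred.map.tasks</name>
  <value>1000</value> <!-- illustrative; tune to your input size -->
</property>
```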

Good support for concurrent jobs would be great to have, and I'd love to 
see a patch that addresses this issue comprehensively.  I am not 
convinced that it is worth making minor tweaks that may or may not 
actually get us there.

Doug
