hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Reduce Jobs tieing up jobs
Date Wed, 21 Nov 2007 23:09:07 GMT

It is most common to have fewer reduce tasks than map tasks.

Also, the reason that reduce tasks start before the map tasks complete is to
avoid idling resources when possible.

This is especially important where the reduce can be done in successive passes.
Counting jobs benefit from this enormously.
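
To illustrate why counting suits successive passes, here is a minimal sketch in plain Python. It is not Hadoop code, and the names map_task and merge are hypothetical, chosen only for this example. It assumes the reduce operation is an associative, commutative sum, so partial counts can be folded in as each map output arrives instead of waiting for every map to finish.

```python
from collections import Counter
from functools import reduce

def map_task(lines):
    """Count the words in one input split (stands in for one map task)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def merge(partial_a, partial_b):
    """One reduce pass: combine two partial counts.

    Addition is associative and commutative, so the passes can run
    in any order, as soon as a map output becomes available.
    """
    return partial_a + partial_b

# Three small input splits, invented purely for illustration.
splits = [
    ["a b a", "c"],
    ["b b", "a c"],
    ["c c c"],
]

# Successive passes: fold in each map output as it completes.
running = Counter()
for split in splits:
    running = merge(running, map_task(split))

# A single pass after all maps have finished gives the same totals.
all_at_once = reduce(merge, (map_task(s) for s in splits), Counter())

print(running == all_at_once)
```

Both routes produce the same totals, which is why a counting reduce loses nothing by starting work before the last map completes.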

On 11/21/07 2:50 PM, "Billy" <sales@pearsonwholesale.com> wrote:

> Reduce tasks must wait for all maps to be done before doing any work. Why are
> they started before the maps are done?
> Example of the problem:
> If I am running a job and it's taking up all the reduce tasks on all nodes,
> and I launch a second job and set its priority higher than the currently
> running one, it will start running its map tasks, but I have to wait until the
> first job completes to release the reduce tasks. So basically the priority
> option gains nothing, unless the number of reduce tasks per
> job is less than the number of nodes.
> Any way we can set an option or default on reduce tasks to wait until 90% or
> more of the map tasks are done/running before launching?
> Billy
