hadoop-mapreduce-user mailing list archives

From Vasco Visser <vasco.vis...@gmail.com>
Subject Re: Questions with regard to scheduling of map and reduce tasks
Date Thu, 30 Aug 2012 19:03:08 GMT
Vinod, thanks for the reply.

On Thu, Aug 30, 2012 at 8:19 PM, Vinod Kumar Vavilapalli
<vinodkv@hortonworks.com> wrote:
> Since you mentioned containers, I assume you are using hadoop 2.0.*. Replies
> inline.

I am on Hadoop 0.23.1, with Pig 0.10.0 on top.

>> When running a job with more reducers than containers available in the
>> cluster all reducers get scheduled, leaving no containers available
>> for the mappers to be scheduled. The result is starvation and the job
>> never finishes. Is this to be considered a bug or is it expected
>> behavior? The workaround is to limit the number of reducers to less
>> than the number of containers available.
>
> No, you don't need to limit reducers yourselves, MR ApplicationMaster is
> smart enough to figure out available cluster/queue capacity and schedule
> maps/reduces accordingly. If ever it runs into a situation where it has
> outstanding maps but reduces happen to occupy all available resources, it
> will preempt reduces and start running maps.

What I see is starvation. Either it takes a very long time for the
preemption to kick in, or the preemption is broken.
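
To be concrete about what I would expect, here is a toy sketch in
Python. It is not the ApplicationMaster's actual logic: the function
name and the rule it applies are my own assumptions, made up only to
illustrate the behavior you describe.

```python
def reducers_to_preempt(pending_maps, free_containers, running_reducers):
    """Toy rule (hypothetical, not Hadoop code): how many reducers to preempt."""
    # Nothing to do if no map is waiting or a container is already free.
    if pending_maps == 0 or free_containers > 0:
        return 0
    # Assumption: free one container per waiting map, capped by the
    # number of reducers that are actually running.
    return min(pending_maps, running_reducers)


# The situation I observe: maps are waiting and reducers hold every container.
starved = reducers_to_preempt(pending_maps=200, free_containers=0,
                              running_reducers=30)
print(starved > 0)
```

Under any rule of this kind at least one reducer would be preempted
in my situation, which is not what I observe.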

How is the preemption supposed to work? Is a single reducer preempted,
or a batch of reducers? Also, when you say preemption, do you mean
that the current execution of a reducer is actually paused and resumed
later? Or does preemption mean that the reducer's container is
discarded and must be started again from

>> Also, it seems that from the combined pool of pending map and reduce
>> tasks, tasks are picked at random and scheduled. This causes less than
>> optimal behavior. For example, I run a job with 500 mappers and 30
>> reducers (my cluster has only 16 machines, two containers per machine
>> (dual-core machines)). What I observe is that halfway through the job
>> all reduce tasks are scheduled, leaving only one container for 200+
>> map tasks. Again, is this expected behavior? If so, what is the idea
>> behind it? And, are the map and reduce tasks indeed randomly scheduled
>> or does it only look like they are?
>
> No, again MR ApplicationMaster is smart and the scheduling isn't random. It
> runs maps first, and slowly ramps up reduces as maps finish.
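
For reference, the arithmetic behind my example can be written out as
a small Python sketch. The one container for the ApplicationMaster is
my assumption about where the remaining container goes, and 200 is
taken as a lower bound for the "200+" remaining map tasks.

```python
import math

machines = 16
containers_per_machine = 2
reducers = 30
am_containers = 1  # assumption: the MR ApplicationMaster holds one container

total = machines * containers_per_machine          # containers in the cluster
left_for_maps = total - am_containers - reducers   # what the maps can use


def waves(tasks, slots):
    """Number of sequential rounds needed to run `tasks` on `slots` containers."""
    return math.ceil(tasks / slots)


remaining_maps = 200
waves_starved = waves(remaining_maps, left_for_maps)       # reducers hold 30 containers
waves_free = waves(remaining_maps, total - am_containers)  # reducers hold none

print(total, left_for_maps, waves_starved, waves_free)  # prints: 32 1 200 7
```

So with all 30 reducers scheduled, the remaining maps run one at a
time, against roughly 7 rounds if the reducers were not holding
containers.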

Do you know of any doc on the specifics of task scheduling? Would you
say that the example I gave is in line with how scheduling is

