hadoop-common-user mailing list archives

From Mithila Nagendra <mnage...@asu.edu>
Subject Re: Intermediary Data on Fair Scheduler
Date Thu, 13 Aug 2009 18:32:48 GMT
Hi Todd

So suppose two jobs are assigned to a pool: one job has 1 map task and 1
reduce task, and the other has 5 map and 5 reduce tasks. How will the switch
between these jobs take place?

Let's say the scheduler starts with the bigger job and runs 1 map task. When
it switches to the shorter job, what does it do with the intermediate data?
For instance, in Hadoop On Demand, if we run a search query, where would the
search keywords be stored? I assume that if the bigger job is in the middle
of a map task, the smaller job will wait for that task to end before the map
task for the shorter job is launched.

Thanks!
Mithila

On Thu, Aug 13, 2009 at 10:52 AM, Todd Lipcon <todd@cloudera.com> wrote:

> Hi Mithila,
>
> I assume you're referring to fair scheduler preemption. In the preemption
> scenario, tasks are completely killed, not paused. It's not like a
> preemptive scheduler in your OS where things are "context switched". This
> is why the preemption is not enabled by default and has tuning parameters
> that only trigger preemption in certain situations.
>
> Hope that helps,
> -Todd
>
> On Thu, Aug 13, 2009 at 10:44 AM, Mithila Nagendra <mnagendr@asu.edu> wrote:
>
> > Hello All
> >
> > When the fair scheduler switches between two jobs, what does it do with
> > the intermediary data? Does it dump the data/job states onto the disk
> > (DFS)? Or does it do a context switch (i.e. everything is in memory)? I
> > was looking at the scheduler for an application I'm working on, any
> > pointers will be appreciated!
> >
> > Thanks!
> > Mithila Nagendra
> > Arizona State University
> >
>
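
Todd's distinction between killing a task and context switching it can be
illustrated with a toy model. This is only a sketch and is not Hadoop's
scheduler code: the Task class and the launch, run, and kill helpers are
hypothetical names invented for this example, and "work units" are an
assumed stand-in for task progress. It shows that a killed task keeps no
saved state, so its next attempt starts from zero.

```python
from dataclasses import dataclass


@dataclass
class Task:
    job: str
    work_units: int    # total units of work the task needs (assumed measure)
    done: int = 0      # units completed in the current attempt
    attempts: int = 0  # how many times the task has been launched


def launch(task: Task) -> None:
    """Start a fresh attempt; every attempt begins with no progress."""
    task.attempts += 1
    task.done = 0


def run(task: Task, units: int) -> None:
    """Advance the current attempt by up to `units` units of work."""
    task.done = min(task.work_units, task.done + units)


def kill(task: Task) -> int:
    """Kill-style preemption: partial progress is discarded, not saved."""
    wasted = task.done
    task.done = 0
    return wasted


big = Task("big-job", work_units=10)
small = Task("small-job", work_units=2)

launch(big)
run(big, 6)          # the big job's task is partway through
wasted = kill(big)   # preemption frees the slot by killing the task

launch(small)
run(small, 2)        # the small job's task runs to completion

launch(big)          # the big job's task is retried from the beginning
run(big, 10)

print(f"work thrown away by preemption: {wasted} units")
print(f"attempts needed by the big job's task: {big.attempts}")
```

In this model the cost of preemption is the discarded work, which is one
way to read Todd's remark that preemption is off by default and only
triggers in certain situations.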
