hadoop-common-user mailing list archives

From Mithila Nagendra <mnage...@asu.edu>
Subject Re: Intermediary Data on Fair Scheduler
Date Thu, 13 Aug 2009 18:32:48 GMT
Hi Todd

So suppose two jobs are assigned to a pool, where one job has 1 map task
and 1 reduce task and the other has 5 map and 5 reduce tasks. How will the
switch between these jobs take place?

Let's say the scheduler starts with the bigger job and runs 1 map task. When
it switches to the shorter job, what does it do with the intermediate data?
For instance, in Hadoop On Demand, if we run a search query, where would the
search keywords be stored? I assume that if the bigger job is in the middle
of a map task, the smaller job will wait for that task to end before the map
task for the shorter job is launched.


On Thu, Aug 13, 2009 at 10:52 AM, Todd Lipcon <todd@cloudera.com> wrote:

> Hi Mithila,
> I assume you're referring to fair scheduler preemption. In the preemption
> scenario, tasks are completely killed, not paused. It's not like a
> preemptive scheduler in your OS where things are "context switched". This is
> why the preemption is not enabled by default and has tuning parameters that
> only trigger preemption in certain situations.
> Hope that helps,
> -Todd
> On Thu, Aug 13, 2009 at 10:44 AM, Mithila Nagendra <mnagendr@asu.edu> wrote:
> > Hello All
> >
> > When the fair scheduler switches between two jobs, what does it do with the
> > intermediary data? Does it dump the data/job states onto the disk (DFS)? Or
> > does it do a context switch (i.e. everything is in memory)? I was looking at
> > the scheduler for an application I'm working on, any pointers will be
> > appreciated!
> >
> > Thanks!
> > Mithila Nagendra
> > Arizona State University
> >
