hadoop-common-dev mailing list archives

From "Paul Sutter" <sut...@gmail.com>
Subject Re: Job scheduling (Re: Unable to run more than one job concurrently)
Date Fri, 19 May 2006 21:09:27 GMT
Right now, I would prefer that we have a really, really simple scheme that
can be done quickly.

Being able to run a huge days-long job in the background, such that a
smaller hours-long job basically takes over and runs to completion, would
be a big win.

Having multiple jobs with time slicing? Cool, but only 20% better and
probably 80% more work.

If we want a thorough scheme, I think Eric has the right idea of using one
of the existing scheduler packages.


On 5/19/06, Andrzej Bialecki <ab@getopt.org> wrote:
>
> Doug Cutting wrote:
> > Paul Sutter wrote:
> >> (1) Allow submission times in the future, enabling the creation of
> >> "background" jobs. My understanding is that job submission times are
> >> used to prioritize scheduling. All tasks from a job submitted early run
> >> to completion before those of a job submitted later. If we could submit
> >> any days-long jobs with a submission time in the future, say the year
> >> 2010, and any short hours-long jobs with the current time, that short
> >> job would be able to interrupt the long job. Hack? Yes. Useful? I think
> >> so.
> >
> > I think this is equivalent to adding a job priority, where tasks from
> > the highest-priority job are run first.  If jobs are at the same
> > priority, then the first submitted would run.  Adding priority would
> > add a bit more complexity, but would also be less of a hack.
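
For concreteness, here is a minimal Java sketch of the ordering Doug
describes: highest priority first, ties broken by submission time, so
equal-priority jobs keep today's first-come-first-served behavior. The
JobInfo class and its fields are invented for illustration; they are not
the actual JobTracker types.

    import java.util.Comparator;
    import java.util.PriorityQueue;

    // Hypothetical stand-in for a submitted job, not a real JobTracker class.
    class JobInfo {
        final String id;
        final int priority;      // higher value = more important
        final long submitTime;   // submission timestamp, ms

        JobInfo(String id, int priority, long submitTime) {
            this.id = id;
            this.priority = priority;
            this.submitTime = submitTime;
        }
    }

    public class JobOrdering {
        // Highest priority first; ties broken by earliest submission,
        // which preserves the current FIFO behavior at equal priority.
        static final Comparator<JobInfo> BY_PRIORITY_THEN_TIME = new Comparator<JobInfo>() {
            public int compare(JobInfo a, JobInfo b) {
                if (a.priority != b.priority) {
                    return Integer.compare(b.priority, a.priority); // descending priority
                }
                return Long.compare(a.submitTime, b.submitTime);    // ascending time
            }
        };

        public static void main(String[] args) {
            PriorityQueue<JobInfo> queue = new PriorityQueue<JobInfo>(4, BY_PRIORITY_THEN_TIME);
            queue.add(new JobInfo("days-long-background", 1, 1000L));
            queue.add(new JobInfo("hours-long-urgent",    5, 2000L));
            System.out.println(queue.poll().id); // hours-long-urgent, despite later submission
        }
    }

Submitting the long job at priority 1 gets the same effect as the "year
2010" timestamp trick, without lying about the submission time.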
>
>
> Hmm.. If you compare it to a Unix scheduler, processes at the same
> priority have even chances of being run, regardless of which was started
> first. Not only that, processes undergo a "priority decay": the longer
> they run, the lower their priority gets. This enables new processes to
> start quickly (and maybe quickly finish), and then fairly compete with
> other processes.
>
> In our case, this would mean that jobs with the same priority would
> execute concurrently, sharing available map/reduce slots, and long
> running jobs would be gradually de-prioritized. This also means that the
> first job will slow down when the second one is started, but the second
> job will have a chance to make a good start (and perhaps quickly finish)
> and then, subject to the priority decay, run in parallel with other jobs
> (albeit slower) instead of being stuck in the wait queue.
>
> And if the second job is started with a higher priority, it should
> preempt the first job (i.e. it should get proportionally more slots than
> the first job). If you need all cluster resources for a specific job,
> and don't want any other jobs to run, just set the priority to the
> highest value, thus preempting all other jobs (actually, it would
> suspend other already executing jobs, which would resume when your job
> is done - not a bad feature either!).
>
> I think this is a relatively simple and well understood mechanism.
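
Roughly, the decay-plus-sharing idea could look like the sketch below: an
effective priority that drops the longer a job has been running, with the
cluster's task slots handed out in proportion to it. Every name and
constant here (the half-point-per-hour decay, the floor of 1.0) is invented
purely for illustration and would need tuning.

    import java.util.ArrayList;
    import java.util.List;

    // Rough sketch only: class names, fields, and the decay constant are
    // invented for illustration, not taken from the actual JobTracker.
    public class DecayScheduler {

        static class RunningJob {
            final String id;
            final int basePriority;  // user-assigned priority
            final long startTime;    // when the job started running, ms

            RunningJob(String id, int basePriority, long startTime) {
                this.id = id;
                this.basePriority = basePriority;
                this.startTime = startTime;
            }

            // Priority decays the longer the job has been running, so a new
            // job can start quickly and then compete fairly with older ones.
            double effectivePriority(long now) {
                double hoursRunning = (now - startTime) / 3600000.0;
                return Math.max(1.0, basePriority - 0.5 * hoursRunning);
            }
        }

        // Hand out task slots in proportion to effective priority, so a
        // higher-priority job preempts others by taking a larger share.
        static void printShares(List<RunningJob> jobs, int totalSlots, long now) {
            double sum = 0;
            for (RunningJob j : jobs) sum += j.effectivePriority(now);
            for (RunningJob j : jobs) {
                long share = Math.round(totalSlots * j.effectivePriority(now) / sum);
                System.out.println(j.id + " -> about " + share + " of " + totalSlots + " slots");
            }
        }

        public static void main(String[] args) {
            long now = System.currentTimeMillis();
            List<RunningJob> jobs = new ArrayList<RunningJob>();
            jobs.add(new RunningJob("long-crawl", 5, now - 10L * 3600000L)); // running 10 hours
            jobs.add(new RunningJob("quick-report", 5, now));                // just submitted
            printShares(jobs, 100, now); // the decayed crawl keeps ~17 slots, the new job gets ~83
        }
    }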
>
>
> >> (2) Have a per-job total task count limit. Currently, we establish the
> >> number of tasks each node runs, and how many map or reduce tasks we
> >> have in total for a given job. But it would be great if we could set a
> >> ceiling on the number of tasks that run concurrently for a given job.
> >> This may help with Andrzej's fetcher (since it is bandwidth
> >> constrained, maybe fewer concurrent tasks would be fine?).
> >
> > I like this idea.  So if the highest-priority job is already running
> > at its task limit, then tasks can be run from the next
> > highest-priority job.  Should there be separate limits for maps and
> > reduces?
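
Here is a sketch of what the per-job ceiling could look like on the map
side (a parallel limit for reduces would mirror it). The Job class and the
scheduling loop are invented for this example, not taken from the
JobTracker.

    import java.util.Arrays;
    import java.util.List;

    // Illustration only: a per-job ceiling on concurrently running map tasks.
    public class TaskCapExample {

        static class Job {
            final String id;
            final int maxConcurrentMaps; // per-job ceiling
            int runningMaps;
            int pendingMaps;

            Job(String id, int maxConcurrentMaps, int pendingMaps) {
                this.id = id;
                this.maxConcurrentMaps = maxConcurrentMaps;
                this.pendingMaps = pendingMaps;
            }
        }

        // Called when a tasktracker reports a free map slot: walk the jobs in
        // priority order and skip any job already at its ceiling.
        static Job pickJobForFreeMapSlot(List<Job> jobsInPriorityOrder) {
            for (Job job : jobsInPriorityOrder) {
                if (job.pendingMaps > 0 && job.runningMaps < job.maxConcurrentMaps) {
                    job.runningMaps++;
                    job.pendingMaps--;
                    return job;
                }
            }
            return null; // nothing eligible, the slot stays idle
        }

        public static void main(String[] args) {
            Job fetcher = new Job("bandwidth-bound-fetch", 20, 500);  // capped at 20 maps
            Job indexer = new Job("index-build", 1000, 500);
            List<Job> byPriority = Arrays.asList(fetcher, indexer);
            for (int i = 0; i < 25; i++) pickJobForFreeMapSlot(byPriority);
            System.out.println("fetcher running: " + fetcher.runningMaps); // 20, its ceiling
            System.out.println("indexer running: " + indexer.runningMaps); // 5, the overflow
        }
    }

With the fetch job capped at 20 concurrent maps, the slots it declines
spill over to the next job in priority order, which is the behavior Doug
describes.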
>
> I like this idea too. I think a similar setting for the minimum number
> of tasks would be needed as well? That would solve my problem. In fact,
> it would probably be better than the scheme I described above, because
> it would guarantee a certain minimum number of tasks running at any time.
>
> This reminds me of the "idle time" and "real time" policies in the BSD
> scheduler ... see man 1 rtprio. The "real time" policy prevents the
> priority decay that normally occurs, and the "idle time" policy allows
> processes to run only when the CPU is idle.
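
And a small sketch of the minimum-tasks side, treating the guaranteed
minimum like the "real time" floor and leftover capacity like "idle time".
Names and numbers are made up for illustration.

    import java.util.Arrays;
    import java.util.List;

    // Sketch of a guaranteed per-job minimum; not an actual implementation.
    public class MinTaskGuarantee {

        static class Job {
            final String id;
            final int minTasks; // guaranteed concurrent tasks, a "real time"-style floor
            int assigned;

            Job(String id, int minTasks) {
                this.id = id;
                this.minTasks = minTasks;
            }
        }

        static void allocate(List<Job> jobsInPriorityOrder, int totalSlots) {
            int remaining = totalSlots;
            // Pass 1: satisfy every job's guaranteed minimum first.
            for (Job j : jobsInPriorityOrder) {
                int grant = Math.min(j.minTasks, remaining);
                j.assigned = grant;
                remaining -= grant;
            }
            // Pass 2: leftover capacity behaves like "idle time" and flows to
            // the highest-priority job (a real scheduler would keep iterating).
            if (remaining > 0 && !jobsInPriorityOrder.isEmpty()) {
                jobsInPriorityOrder.get(0).assigned += remaining;
            }
        }

        public static void main(String[] args) {
            Job bigSort = new Job("big-sort", 0);      // higher priority, no floor
            Job crawl   = new Job("crawl-fetch", 10);  // always keeps 10 fetch tasks alive
            allocate(Arrays.asList(bigSort, crawl), 100);
            System.out.println("big-sort: " + bigSort.assigned);    // 90
            System.out.println("crawl-fetch: " + crawl.assigned);   // 10
        }
    }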
>
> --
> Best regards,
> Andrzej Bialecki     <><
> ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
