hadoop-mapreduce-issues mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2205) FairScheduler should only preempt tasks for pools/jobs that are up next for scheduling
Date Tue, 30 Nov 2010 06:57:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965123#action_12965123 ]

Joydeep Sen Sarma commented on MAPREDUCE-2205:

I am not proposing preemption for jobs that don't meet the preemption criteria. Let me try to
rephrase more precisely, assuming there is a single total ordering - L - of jobs at any time
(for scheduling purposes, after taking min/fair shares into account).

every preemption_interval secs:
* assume we can schedule N tasks from one preemption interval to the next
* walk L from beginning to end, stopping once >= N schedulable tasks have been counted
** for each job J encountered, if J.needsPreemption() - bump up J's preemption count accordingly.
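The walk above can be sketched as follows. This is only an illustration of the proposal, not the actual FairScheduler code: names like SchedulableJob, runnableTasks and needsPreemption are hypothetical stand-ins for whatever the real scheduler tracks.

```java
import java.util.Arrays;
import java.util.List;

public class PreemptionWalk {
    // Hypothetical per-job view; fields are illustrative, not the real API.
    static class SchedulableJob {
        final String name;
        final int runnableTasks;       // tasks this job could schedule right now
        final boolean needsPreemption; // below min/fair share past its timeout
        SchedulableJob(String name, int runnableTasks, boolean needsPreemption) {
            this.name = name;
            this.runnableTasks = runnableTasks;
            this.needsPreemption = needsPreemption;
        }
    }

    // Walk the totally ordered job list L until >= n schedulable tasks have
    // been counted; count preemptions only for jobs seen during the walk.
    static int tasksToPreempt(List<SchedulableJob> orderedJobs, int n) {
        int seen = 0;       // schedulable tasks counted so far
        int toPreempt = 0;  // tasks we would preempt on behalf of starved jobs
        for (SchedulableJob job : orderedJobs) {
            if (seen >= n) {
                break;      // the next n slots are already accounted for
            }
            int take = Math.min(job.runnableTasks, n - seen);
            if (job.needsPreemption) {
                toPreempt += take;  // preempt only for jobs that are up next
            }
            seen += take;
        }
        return toPreempt;
    }

    public static void main(String[] args) {
        List<SchedulableJob> l = Arrays.asList(
            new SchedulableJob("starved", 4, true),
            new SchedulableJob("healthy", 10, false),
            new SchedulableJob("alsoStarved", 6, true)); // falls beyond the first N slots
        System.out.println(tasksToPreempt(l, 8)); // prints 4: only "starved" is up next
    }
}
```

Note that "alsoStarved" gets nothing preempted on its behalf here even though it meets the preemption criteria - it is not within the next N schedulable tasks, which is exactly the behavior being proposed.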

N is a parameter of this protocol. If N is too aggressive, we may kill tasks unnecessarily.
I think we can make a fairly good guess at N based on past behavior, and we can afford to be
a bit pessimistic - at worst this will delay preemption a bit.
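One possible way to guess N pessimistically - purely illustrative, and not something the patch specifies - is a smoothed average of slots actually scheduled per interval, scaled down so over-estimation (and hence unnecessary kills) is avoided:

```java
public class SlotRateEstimator {
    private double avg = 0.0;       // smoothed slots-per-interval
    private final double alpha;     // smoothing factor, e.g. 0.5
    private final double pessimism; // scale-down factor < 1, e.g. 0.8

    SlotRateEstimator(double alpha, double pessimism) {
        this.alpha = alpha;
        this.pessimism = pessimism;
    }

    // Record how many slots were actually scheduled in the last interval.
    void observe(int slotsScheduled) {
        avg = alpha * slotsScheduled + (1 - alpha) * avg;
    }

    // Pessimistic estimate of N; under-estimating only delays preemption.
    int estimateN() {
        return (int) Math.floor(avg * pessimism);
    }

    public static void main(String[] args) {
        SlotRateEstimator e = new SlotRateEstimator(0.5, 0.8);
        e.observe(10);  // avg = 5.0
        e.observe(10);  // avg = 7.5
        System.out.println(e.estimateN()); // prints 6 (floor of 7.5 * 0.8)
    }
}
```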


As far as I understand, the FairComparator _should_ already be placing the jobs with the largest
deficit in terms of fair/min share at the head of the sorted job list. So I don't understand
why we would go about changing it. The main issue I see is the duplication of job-ordering
logic in two different places (FairComparator and tasksToPreempt()), and I am hoping that by
centralizing the ordering logic in one place we will avoid inconsistency (and the code will
be easier to understand and maintain).
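The centralization argument can be made concrete with a minimal sketch: one comparator defines the ordering, and both the scheduling path and the preemption path consume the same sorted list, so the two code paths cannot disagree. The Job fields and method names below are illustrative, not the real FairScheduler types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CentralOrdering {
    // Hypothetical job summary; "deficit" stands in for whatever the real
    // comparator computes from fair share, min share and running tasks.
    static class Job {
        final String name;
        final double deficit; // share owed minus tasks currently running
        Job(String name, double deficit) {
            this.name = name;
            this.deficit = deficit;
        }
    }

    // Single source of truth for job ordering: largest deficit first.
    static final Comparator<Job> FAIR_COMPARATOR =
        Comparator.comparingDouble((Job j) -> j.deficit).reversed();

    // Both assignTasks-style scheduling and tasksToPreempt-style preemption
    // would walk this one list instead of re-deriving an ordering.
    static List<Job> sortedJobs(List<Job> jobs) {
        List<Job> copy = new ArrayList<>(jobs);
        copy.sort(FAIR_COMPARATOR);
        return copy;
    }

    public static void main(String[] args) {
        List<Job> sorted = sortedJobs(Arrays.asList(
            new Job("a", 1.0), new Job("b", 5.0), new Job("c", 3.0)));
        System.out.println(sorted.get(0).name); // prints b: the largest deficit
    }
}
```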

> FairScheduler should only preempt tasks for pools/jobs that are up next for scheduling
> --------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-2205
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2205
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Joydeep Sen Sarma
> We have hit a problem with the preemption implementation in the FairScheduler where the following happens:
> # job X runs short of fair share or min share and requests/causes N tasks to be preempted
> # when slots are then scheduled, tasks from some other job are actually scheduled
> # after preemption_interval has passed, job X finds it is still under-scheduled and requests preemption again. Goto 1.
> This has caused widespread preemption of tasks, taking the cluster from high utilization to low utilization within a few minutes.
> Some of the problems are specific to our internal version of Hadoop (still 0.20, without the hierarchical FairScheduler) - but I think the issue here is generic (I just took a look at the trunk assignTasks and tasksToPreempt routines). The basic problem seems to be that the logic of assignTasks+FairShareComparator is not consistent with the logic in tasksToPreempt(). The latter can choose to preempt tasks on behalf of jobs that may not be first up for scheduling according to the FairComparator. Verifying that these two separate pieces of logic are consistent, and keeping them that way, is difficult.
> It seems that a much safer preemption implementation is to walk the jobs in the order they would be scheduled on the next heartbeat, and only preempt for jobs at the head of this sorted queue. In MAPREDUCE-2048 we have already introduced a pre-sorted list of jobs ordered by current scheduling priority. It seems much easier to preempt only for jobs at the head of this sorted list.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
