hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4471) Capacity Scheduler should maintain the right ordering of jobs in its running queue
Date Mon, 10 Nov 2008 14:59:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646274#action_12646274 ]

Hemanth Yamijala commented on HADOOP-4471:

I am documenting a few more discussions that Vivek, Owen and I had.

It is worth noting that keeping running jobs sorted by priority has another problem:
temporary disk space usage.

For example, consider a low priority job that has started running. The map tasks run for this
job use disk space to store their intermediate outputs. If a higher priority job is then
submitted and starts running, the space used by the low priority job is held up until that
job completes.

This situation is not new, and exists even with the default scheduler. However, because the
capacity scheduler runs multiple jobs concurrently (from multiple queues, or from different
users), the problem is slightly more serious in this case.

That said, it is still not clear what the right way to fix this problem is. At the same time,
not sorting running jobs makes it extremely difficult for users to run high priority jobs
in preference to lower priority ones when the need arises. Hence, while the problems with
sorting running jobs are acknowledged, we may still want to do this and address the issues
in related JIRAs such as HADOOP-4557.

> Capacity Scheduler should maintain the right ordering of jobs in its running queue
> ----------------------------------------------------------------------------------
>                 Key: HADOOP-4471
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4471
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Vivek Ratan
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Blocker
>             Fix For: 0.19.1
>         Attachments: HADOOP-4471-v1.patch
> Currently, the Capacity Scheduler maintains a simple linked list of jobs which are running.
> This implies that running jobs are sorted by when they started running (i.e., when they were
> added to the queue). The Scheduler should maintain the same ordering among running jobs that
> it does for waiting jobs. Jobs should be sorted by priority (if the queue supports priorities)
> and by their submit time.
> This sorting would be more fair in deciding which running jobs get access to a free TT.
> It also does not penalize jobs that have a longer setup task, which affects when they enter
> the run queue.
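
The ordering requested in the description above can be sketched roughly as follows.
This is only an illustration and is not taken from the attached patch: the class
RunningJobOrder, the Job type, the Priority enum and the queueSupportsPriorities
flag are all hypothetical names chosen for the example, not Capacity Scheduler APIs.
It assumes that a job is described by a priority and a submit time, and that a queue
without priority support falls back to submit time alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names come from the Hadoop code base.
class RunningJobOrder {

    // Illustrative priority levels, declared from most to least urgent so that
    // the natural enum order places the most urgent jobs first.
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    // Minimal stand-in for a job: a name, a priority and a submit time (ms).
    static final class Job {
        final String name;
        final Priority priority;
        final long submitTime;

        Job(String name, Priority priority, long submitTime) {
            this.name = name;
            this.priority = priority;
            this.submitTime = submitTime;
        }
    }

    // Sort by priority first (only if the queue supports priorities),
    // then by submit time, earliest first.
    static Comparator<Job> comparator(boolean queueSupportsPriorities) {
        Comparator<Job> bySubmitTime = Comparator.comparingLong((Job j) -> j.submitTime);
        if (!queueSupportsPriorities) {
            return bySubmitTime;
        }
        return Comparator.comparing((Job j) -> j.priority).thenComparing(bySubmitTime);
    }

    // Returns the job names in the order the comparator produces,
    // leaving the input list untouched.
    static List<String> order(List<Job> jobs, boolean queueSupportsPriorities) {
        List<Job> sorted = new ArrayList<>(jobs);
        sorted.sort(comparator(queueSupportsPriorities));
        List<String> names = new ArrayList<>();
        for (Job j : sorted) {
            names.add(j.name);
        }
        return names;
    }
}
```

Because the sort key uses submit time rather than the time a job entered the run
queue, a job with a long setup task keeps its place relative to jobs submitted after it.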

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
