hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5187) Provide an option to turn off priorities in jobs
Date Wed, 11 Feb 2009 04:01:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672497#action_12672497
] 

Vivek Ratan commented on HADOOP-5187:
-------------------------------------

Good discussion. 

My comments are more relevant in the context of HADOOP-5199, but I'm adding them here. 

bq. Pools may be marked as either FIFO or fair sharing.
There is another kind of pool which is priority based, where jobs are sorted by priorities.
Jobs with the same priority are sorted by submission time (rather, start time). This is the
kind of pool that the Capacity Scheduler was meant to handle. So maybe you have two pools:
fairshare and priority (priority pools have an option to ignore priorities, in which case
jobs are sorted only by submission order), or maybe you conceptually have three: fairshare,
FIFO (only sorted by submission order), and priority (sorted by priority, then by submission
order). 

bq. Each pool has a minimum share (guaranteed share) defined in the config. Any excess capacity
is divided between pools according to fair sharing, as in the current scheduler.
I'm wondering if there's a way to simplify this a little more. Suppose pools only have a minimum
(guaranteed) share, which you can also think of as a pool's fair share. Any excess capacity
is simply claimed by pools that are 'running behind'. So, a scheduler would first pick a pool
based on how much the pool is being utilized (which depends on how many tasks the pool is
running compared with its minimum/fair share). If pools running below capacity do not have
a task to run, we'll eventually pick a pool that's running at/above capacity and give the
slot to it. That way, you don't need a notion of two numbers for a pool - minimum and fairshare
- and you don't need to explicitly compute excess capacity and redistribute it among pools.
Does that make sense? Or do you still feel you need the notion of both a min share and a fair
share? 

bq. Now for the purpose of scheduling, we can have one big list of runnable jobs, each of
which has a min share and a fair share.
Rather than having a global list of jobs, how about each pool just maintains its own list
of jobs in the sorted order you mention? The scheduler first picks a pool as I detailed in
my previous point, then you can pick a job based on the ordering you've mentioned. Sorting
all jobs in the cluster is expensive. Sorting jobs in a single pool will be cheaper because
there are fewer jobs. 

> Provide an option to turn off priorities in jobs
> ------------------------------------------------
>
>                 Key: HADOOP-5187
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5187
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/fair-share
>            Reporter: Hemanth Yamijala
>            Priority: Minor
>
> The fairshare scheduler can define pools mapping to queues (as defined in the capacity
scheduler - HADOOP-3445). When used in this manner, one can imagine queues set up to be used
by users who come from disparate teams or organizations (say a default queue). For such a
queue, it makes sense to ignore job priorities and consider the queue as strict FIFO, as it
is difficult to compare priorities of jobs from different users. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message