hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4665) Add preemption to the fair scheduler
Date Sun, 12 Apr 2009 08:13:14 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698187#action_12698187 ]

Hemanth Yamijala commented on HADOOP-4665:
------------------------------------------

Matei, I've not been looking at this code much, but based on the discussion, I have only one
comment, regarding turning off pre-emption.

Your use case of organizations wanting to try it out with pre-emption disabled, while still
seeing when pre-emption would happen, seems to me like the dry-run mode you find in utilities
such as an RPM update. As you've explained, it looks like there are use cases for this.

From our work with the capacity scheduler, we've found there are environments where pre-emption
is indeed not necessary. Even when it exists, it has proved to be a complex feature to reason
about. From this perspective, it seems like it may make sense to provide an option to completely
turn it off and have reasonable confidence that nothing related to pre-emption would be in
effect, including any additional computation that it requires.

Hence, my suggestion is the following: have a flag to truly turn off pre-emption, and a
variable that allows a dry run of pre-emption when it is enabled. I believe this may not
be a very difficult change? (Indeed, I've been thinking of cases where a dry run of the entire
scheduling logic makes sense, e.g. to get a 'scheduling log' that can be replayed.)


The flip side of my proposal is an additional configuration option. But depending on what
we think the right defaults are, we can still make the configuration easy for end users, no?
To that extent, your arguments about the proposed default values are fine with me.

> Add preemption to the fair scheduler
> ------------------------------------
>
>                 Key: HADOOP-4665
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4665
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>             Fix For: 0.21.0
>
>         Attachments: fs-preemption-v0.patch, hadoop-4665-v1.patch, hadoop-4665-v1b.patch,
>                      hadoop-4665-v2.patch, hadoop-4665-v3.patch, hadoop-4665-v4.patch
>
>
> Task preemption is necessary in a multi-user Hadoop cluster for two reasons: users might
> submit long-running tasks by mistake (e.g. an infinite loop in a map program), or tasks may
> run long because they have to process large amounts of data. The Fair Scheduler (HADOOP-3746)
> has a concept of guaranteed capacity for certain queues, as well as a goal of providing good
> performance for interactive jobs on average through fair sharing. Therefore, it will support
> preemption under two conditions:
> 1) A job isn't getting its _guaranteed_ share of the cluster for at least T1 seconds.
> 2) A job is getting significantly less than its _fair_ share (e.g. less than half of it)
>    for T2 seconds.
> T1 will be chosen smaller than T2 (and will be configurable per queue) to meet guarantees
> quickly. T2 is meant as a last resort in case non-critical jobs in queues with no guaranteed
> capacity are being starved.
> When deciding which tasks to kill to make room for the job, we will use the following
> heuristics:
> - Look for tasks to kill only in jobs that have more than their fair share, ordering these
>   by deficit (most over-scheduled jobs first).
> - For maps: kill tasks that have run for the least amount of time (limiting wasted time).
> - For reduces: similar to maps, but give extra preference to reduces in the copy phase where
>   there is not much map output per task (at Facebook, we have observed this to be the main
>   case where we need preemption: a job has a long map phase and its reducers are mostly
>   sitting idle and filling up slots).
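The two triggers and the kill-ordering heuristics in the description can be sketched in Java as below. This is a simplified, hypothetical model, not the patch's implementation: PreemptionSketch, JobShare, RunningTask, shouldPreempt and chooseVictims are illustrative names; times are in milliseconds; "significantly less than fair share" is assumed to be the "less than half" example; the per-task map-output check for reduces is collapsed into a plain copy-phase flag; and capping each job's losses at its whole-slot excess over fair share is an added assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the preemption rules in HADOOP-4665's description.
class PreemptionSketch {

    // Trigger 1: below guaranteed share for >= t1. Trigger 2: below half of
    // fair share for >= t2. Callers pass the last time each share was met.
    static boolean shouldPreempt(long now, long lastAtGuaranteedShare,
                                 long lastAtHalfFairShare, long t1, long t2) {
        return now - lastAtGuaranteedShare >= t1
            || now - lastAtHalfFairShare >= t2;
    }

    static class JobShare {
        final String name;
        final int running;      // slots the job currently occupies
        final double fairShare; // slots the job should occupy
        JobShare(String name, int running, double fairShare) {
            this.name = name; this.running = running; this.fairShare = fairShare;
        }
        double overShare() { return running - fairShare; }
    }

    static class RunningTask {
        final String id;
        final String job;
        final long runtimeMs;
        final boolean inCopyPhase; // only meaningful for reduce tasks
        RunningTask(String id, String job, long runtimeMs, boolean inCopyPhase) {
            this.id = id; this.job = job;
            this.runtimeMs = runtimeMs; this.inCopyPhase = inCopyPhase;
        }
    }

    // Choose up to `needed` tasks of one type (all maps or all reduces).
    static List<RunningTask> chooseVictims(List<JobShare> jobs,
                                           List<RunningTask> tasks, int needed) {
        // Only jobs above their fair share are eligible, most over-scheduled first.
        List<JobShare> overScheduled = new ArrayList<>();
        for (JobShare j : jobs) {
            if (j.overShare() > 0) overScheduled.add(j);
        }
        overScheduled.sort(Comparator.comparingDouble(JobShare::overShare).reversed());

        // Copy-phase tasks first (false sorts before true), then least time run.
        Comparator<RunningTask> cheapestFirst =
            Comparator.comparing((RunningTask t) -> !t.inCopyPhase)
                      .thenComparingLong(t -> t.runtimeMs);

        List<RunningTask> victims = new ArrayList<>();
        for (JobShare j : overScheduled) {
            List<RunningTask> own = new ArrayList<>();
            for (RunningTask t : tasks) {
                if (t.job.equals(j.name)) own.add(t);
            }
            own.sort(cheapestFirst);
            // Assumption: never take a job below its fair share.
            int excess = (int) Math.floor(j.overShare());
            for (int i = 0; i < own.size() && i < excess && victims.size() < needed; i++) {
                victims.add(own.get(i));
            }
            if (victims.size() >= needed) break;
        }
        return victims;
    }
}
```

Passing in the last time a job was at its guaranteed share (or at half its fair share) lets a single subtraction per timeout express "starved for at least T1 (or T2)", and the ordering of jobs and tasks matches the heuristics listed above.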

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

