hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity
Date Fri, 14 Jul 2017 15:44:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087500#comment-16087500 ]

Arun Suresh commented on YARN-6808:

[~leftnoteasy], good questions.

bq. Use opportunistic container to do lazy preemption in NM. (Is there any umbrella JIRA for
Technically, this is the default behavior for opportunistic containers as it stands today. Opp
containers are killed in the NM when a Guaranteed container is started by an AM, if the NM
at that point does not have the resources to start the guaranteed container. We are also working
on YARN-5972, which adds some amount of work preservation: instead of killing the
Opp container, we PAUSE it. PAUSE will be supported using the cgroups [freezer|https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt]
module on Linux, and JobObjects on Windows (we are actually using this in production).
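To make the NM-side behavior concrete, here is a toy sketch (illustrative only, not actual NM code; the class and method names are made up) of the policy described above: when a Guaranteed container starts and the node lacks free resources, Opportunistic containers are reclaimed, either killed, or, with YARN-5972-style pausing, frozen so their work is preserved.

```python
class Node:
    """Toy node model: reclaim OPPORTUNISTIC containers for GUARANTEED ones."""

    def __init__(self, capacity, pause_enabled=False):
        self.capacity = capacity
        self.pause_enabled = pause_enabled   # YARN-5972-style pausing
        self.running = []                    # list of (name, exec_type, size)
        self.paused = []

    def used(self):
        return sum(size for _, _, size in self.running)

    def start_guaranteed(self, name, size):
        # Reclaim resources from OPPORTUNISTIC containers until we fit.
        while self.used() + size > self.capacity:
            victim = next((c for c in self.running
                           if c[1] == "OPPORTUNISTIC"), None)
            if victim is None:
                raise RuntimeError("no OPPORTUNISTIC container to preempt")
            self.running.remove(victim)
            if self.pause_enabled:
                self.paused.append(victim)   # "freezer": work is preserved
        self.running.append((name, "GUARANTEED", size))

node = Node(capacity=8, pause_enabled=True)
node.running.append(("opp1", "OPPORTUNISTIC", 4))
node.start_guaranteed("g1", 6)
print([c[0] for c in node.running])  # ['g1']
print([c[0] for c in node.paused])   # ['opp1']
```

With `pause_enabled=False` the victim would simply be dropped, which is the kill-based default behavior described above.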

bq. Let's say app1 in an underutilized queue, which want to preempt containers from an over-utilized
queue. Will preemption happens if app1 asks opportunistic container?
I am assuming by under-utilized, you mean starved. So currently, if app1 SPECIFICALLY asks
for Opp containers, it will get them irrespective of whether the queue is underutilized or not.
Opp container ALLOCATION today is not limited by queue/cluster capacity; it is only
limited by the length of the container queues on the Nodes (YARN-1011 will, in time, place stricter
capacity limits by allocating only if the already-allocated resources are not being used). Opp container
EXECUTION is obviously bound by the available resources on the NM, and like I mentioned earlier,
running Opp containers will be killed to make room for any Guaranteed container.
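A minimal sketch of the allocation point above, with hypothetical names (`MAX_QUEUE_LEN` stands in for the NM's configured queue limit; this is not the actual scheduler code): Opp allocation is gated only by per-node queue length, not by queue or cluster capacity.

```python
MAX_QUEUE_LEN = 2  # hypothetical per-node queue limit

def try_allocate_opportunistic(node_queues, container):
    """Place the container on the first node whose Opp queue has room.

    Note what is absent: no queue-capacity or cluster-capacity check,
    only the length of each node's queue matters.
    """
    for node, queue in node_queues.items():
        if len(queue) < MAX_QUEUE_LEN:
            queue.append(container)
            return node
    return None  # every node's queue is full: the request is rejected

queues = {"n1": ["c1", "c2"], "n2": ["c3"]}
print(try_allocate_opportunistic(queues, "c4"))  # n2 (n1's queue is full)
print(try_allocate_opportunistic(queues, "c5"))  # None (all queues full)
```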

bq. For target #1, who make the decision of moving guaranteed containers to opportunistic
containers. If it is still decided by central RM, does that mean preemption logics in RM are
same as today except kill operation is decided by NM side? 
Yes, it is the RM. Currently, in both Schedulers, after a container is allocated, candidates
for preemption are chosen from containers of apps in queues which are above capacity; the RM
then asks the NM to preempt those containers. What the latest patch (002) here does is: allocation
of containers happens in the same code path, but right before handing the container to the
AM, it checks if the queue capacity is exceeded; if so, it downgrades the container to Opp. Thus,
technically, the same apps/containers that were a target for normal preemption will become
candidates for preemption at the NM. There are obviously improvements, like the one I mentioned
in phase 2 of the JIRA description: in addition to downgrading over-capacity
containers to Opp, we can upgrade running Opp containers to Guaranteed for apps when some
of their Guaranteed containers complete.
Like I mentioned, we are still prototyping; we are running tests now to collect data and will
keep you guys posted on the results.
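The downgrade-at-allocation step described above can be sketched as follows (assumed names throughout, a simplification of what patch 002 is described as doing, not the actual scheduler code): allocation proceeds normally, and just before the container is handed to the AM its execution type is set based on whether the queue would exceed its configured capacity.

```python
def finalize_allocation(container, queue):
    """Downgrade to OPPORTUNISTIC iff the queue goes over capacity."""
    if queue["used"] + container["size"] > queue["capacity"]:
        # Over configured capacity: hand out an Opp container, which
        # becomes the NM-side preemption candidate instead of being
        # preempted by the RM later.
        container["exec_type"] = "OPPORTUNISTIC"
    else:
        container["exec_type"] = "GUARANTEED"
    queue["used"] += container["size"]
    return container

q = {"capacity": 10, "used": 9}
c1 = finalize_allocation({"size": 2, "exec_type": None}, q)
print(c1["exec_type"])  # OPPORTUNISTIC (queue would go over capacity)
```

The key property is that the same allocation code path runs either way; only the execution type handed back to the AM changes.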

bq. For overall opportunistic container execution: If OC launch request will be queued by
NM, it may wait a long time before get executed. In this case, do we need to modify AM code
to: a. expect longer delay before think the launch fails. b. asks more resource on different
hosts since there's no guaranteed launch time for OC?
So, with YARN-4597, we introduced a container state called SCHEDULED. A container is in
the SCHEDULED state while it is localizing or while it is sitting in the queue. Essentially, the extra
delay will look just like localization delay to the AM. We have verified this is fine for
MapReduce and Spark.
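A rough sketch of the lifecycle described above (simplified, the real NM state machine has more states; the transition table here is illustrative): because both localization and queueing happen inside SCHEDULED, the AM cannot distinguish queueing delay from localization delay.

```python
# Simplified container lifecycle: queueing and localization both occur
# in SCHEDULED, so the AM sees one opaque pre-RUNNING wait.
TRANSITIONS = {
    "NEW": {"SCHEDULED"},
    "SCHEDULED": {"RUNNING"},      # leaves the queue / finishes localizing
    "RUNNING": {"PAUSED", "DONE"},
    "PAUSED": {"RUNNING", "DONE"}, # YARN-5972-style pause/resume
    "DONE": set(),
}

def advance(state, target):
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

s = "NEW"
for nxt in ("SCHEDULED", "RUNNING", "DONE"):
    s = advance(s, nxt)
print(s)  # DONE
```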

bq. What happens if an app doesn't want to ask opportunistic container when go beyond headroom?
(Such as online services). I think this should be a per-app config (give me OC when I'm go
beyond headroom).
A per-app config makes sense. That said, the ResourceRequest currently has a field called
{{ExecutionTypeRequest}} which, in addition to the {{ExecutionType}}, also has an {{enforceExecutionType}}
flag. By default this is false, but if it is set to true, my latest patch ensures that only Guaranteed
containers are returned. I have added a test case to verify that as well.
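The semantics of that flag can be sketched like this (a toy decision function with made-up names, not the actual {{ExecutionTypeRequest}} API): when enforcement is on, the scheduler must honor the requested type even if the queue is over capacity.

```python
def decide_exec_type(requested, enforce, queue_over_capacity):
    """Return the execution type handed back to the AM."""
    if queue_over_capacity and not enforce:
        return "OPPORTUNISTIC"   # downgrade is allowed
    return requested             # enforced: honor the request as-is

# An online service sets enforce=True and never receives Opp containers:
print(decide_exec_type("GUARANTEED", enforce=False, queue_over_capacity=True))  # OPPORTUNISTIC
print(decide_exec_type("GUARANTEED", enforce=True,  queue_over_capacity=True))  # GUARANTEED
```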

bq. Existing patch makes static decision, which happens when new resource request added by
AM. Should this be reconsidered when app's headroom changed over time?
So, my latest patch (002) partly addresses this: the decision is now made after
container allocation. Also, I am now ignoring the headroom; at the time
of container allocation, I downgrade only if the queue capacity is exceeded. The existing code paths ensure
that the max-capacity of queues is never exceeded anyway.

> Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity
> -------------------------------------------------------------------------------------------
>                 Key: YARN-6808
>                 URL: https://issues.apache.org/jira/browse/YARN-6808
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-6808.001.patch
> This is based on discussions with [~kasha] and [~kkaranasos].
> Currently, when a Queue goes over capacity, apps on starved queues must wait either
for containers to complete or for them to be preempted by the scheduler to get resources.
> This JIRA proposes to allow Schedulers to:
> # Allocate all containers over the configured queue capacity/weight as OPPORTUNISTIC.
> # Auto-promote running OPPORTUNISTIC containers of apps as and when their GUARANTEED
containers complete.

This message was sent by Atlassian JIRA
