hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3784) Indicate preemption timeout along with the list of containers to AM (preemption message)
Date Thu, 16 Jul 2015 18:52:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630167#comment-14630167 ]

Wangda Tan commented on YARN-3784:
----------------------------------

Beyond the timeout, another thing we may need to consider: after a container is removed from
the to-be-preempted list, should we notify the scheduler/AM about that? This could happen if
other applications release containers, or other queues/applications cancel resource requests.

Currently proportionalCPP can notify the scheduler many times for the same container. If we
have to-preempt/remove-from-to-preempt events, we can also reduce the number of messages sent
to the scheduler (which could cause YARN-3508 to happen).
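
A rough sketch of the to-preempt/remove-from-to-preempt idea, assuming the policy keeps track of which containers it has already reported and only emits the difference on each run. All class, enum, and method names here are hypothetical and are not existing YARN classes:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only; none of these names are YARN APIs. */
class ToPreemptTracker {
    /** The two kinds of change reported to the scheduler. */
    enum Kind { TO_PREEMPT, REMOVE_FROM_TO_PREEMPT }

    /** One event for one container ID. */
    static final class Event {
        final Kind kind;
        final String containerId;

        Event(Kind kind, String containerId) {
            this.kind = kind;
            this.containerId = containerId;
        }

        @Override
        public String toString() {
            return kind + ":" + containerId;
        }
    }

    // Containers the scheduler has already been told are to be preempted.
    private final Set<String> notified = new HashSet<>();

    /**
     * Compare the latest selection with what was already reported and
     * return only the changes, so a container that stays selected across
     * many policy runs produces a single TO_PREEMPT event.
     */
    List<Event> update(Set<String> selected) {
        List<Event> events = new ArrayList<>();

        // Newly selected containers: report once.
        for (String id : selected) {
            if (notified.add(id)) {
                events.add(new Event(Kind.TO_PREEMPT, id));
            }
        }

        // Containers that dropped out of the selection: report the removal.
        List<String> dropped = new ArrayList<>();
        for (String id : notified) {
            if (!selected.contains(id)) {
                dropped.add(id);
            }
        }
        for (String id : dropped) {
            notified.remove(id);
            events.add(new Event(Kind.REMOVE_FROM_TO_PREEMPT, id));
        }
        return events;
    }
}
```

With this shape, calling update() repeatedly with an unchanged selection yields no events at all, which is the message reduction mentioned above.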

> Indicate preemption timeout along with the list of containers to AM (preemption message)
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-3784
>                 URL: https://issues.apache.org/jira/browse/YARN-3784
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch
>
>
> Currently, during preemption, the AM is notified with a list of containers that are marked
> for preemption. Introduce a timeout duration along with this container list so that the
> AM knows how much time it has to gracefully shut down its containers (assuming one of the
> preemption policies is loaded in the AM).
> This will also help in NM decommissioning scenarios, where the NM is decommissioned after
> a timeout (also killing the containers on it). The timeout indicates to the AM that
> those containers can be forcefully killed by the RM once it expires.
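
To illustrate the proposal in the description, here is a minimal sketch of a notice that carries a timeout next to the container list, and of how an AM might use it. The class, fields, and methods are assumptions made up for illustration; this is not the YARN preemption message API and not what the attached patches do:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch only; not a YARN class. */
class PreemptionNoticeSketch {
    // Containers marked for preemption (IDs as plain strings here).
    final List<String> containerIds;
    // Assumed field: how long the AM has before the RM may kill them.
    final long timeoutMillis;

    PreemptionNoticeSketch(List<String> containerIds, long timeoutMillis) {
        this.containerIds = Collections.unmodifiableList(new ArrayList<>(containerIds));
        this.timeoutMillis = timeoutMillis;
    }

    /** Time after which the containers may be forcefully killed. */
    long deadlineMillis(long receivedAtMillis) {
        return receivedAtMillis + timeoutMillis;
    }

    /**
     * True if a graceful shutdown that takes shutdownCostMillis,
     * started at nowMillis, would finish before the deadline.
     */
    boolean canFinishGracefully(long receivedAtMillis, long nowMillis, long shutdownCostMillis) {
        return nowMillis + shutdownCostMillis <= deadlineMillis(receivedAtMillis);
    }
}
```

An AM could use a check like canFinishGracefully() to decide between checkpointing a container and simply releasing it.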



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
