hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3784) Indicate preemption timout along with the list of containers to AM (preemption message)
Date Sat, 21 Nov 2015 03:20:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020210#comment-15020210 ]

Sunil G commented on YARN-3784:
-------------------------------

Thanks [~Naganarasimha Garla] for the comments.

As discussed offline, this patch proposes sending the AM a timeout with each to-be-preempted
container. As mentioned above, this timeout is calculated as:
{noformat}
(Timeout associated with each container) - (Total time this container has waited in the
queue for the AM heartbeat)
{noformat}

For example, if the proposed timeout before a container gets a hard kill from the RM is 15 sec
and the AM heartbeat interval is 3 sec, then in the worst case a timeout of 12 sec will be
reported to the AM.
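
The arithmetic above can be sketched in a few lines of Java. This is only an illustration, not code from the patch: the class and method names are hypothetical, and clamping the result at zero is an assumption of the sketch.

```java
// Hypothetical helper; names are illustrative and not taken from the YARN-3784 patch.
final class PreemptionTimeoutSketch {

    // Time left before the RM hard-kills the container, as it would be reported to the AM.
    // Clamping at zero is an assumption of this sketch, not something stated in the thread.
    static long remainingTimeoutMs(long configuredTimeoutMs, long waitedForHeartbeatMs) {
        return Math.max(0L, configuredTimeoutMs - waitedForHeartbeatMs);
    }

    public static void main(String[] args) {
        // 15 sec hard-kill timeout; the container waited one full 3 sec heartbeat interval.
        System.out.println(remainingTimeoutMs(15_000L, 3_000L)); // prints 12000
    }
}
```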

A few points here, based on the comments above:
- A timeout of 12 seconds is a meaningful unit for the AM to understand when the specified
container will be preempted.
- A delta of some milliseconds may arise from the time the heartbeat response takes to travel
from RM to AM. As I see it, this is unlikely to be significant relative to the timeout itself.
I would like to get the community's opinion on this part.

If this delta is not significant, then giving a {{timeout}} to the AM is more meaningful and
easily understandable. Otherwise we could send an absolute value such as {{epoch time + timeout}},
which the AM would need to translate back by taking the difference from the current time on
the AM's end (a LONG value would be passed in the heartbeat). Note the drawback of that
approach: the information passed is not as direct or understandable as "the container will
time out in X milliseconds".
As mentioned by Naga, this latter option would avoid some extra logic on the RM side
(calculating the timeout difference), but at the cost of added complexity and less clarity in
the information passed. [~leftnoteasy] and [~djp], could you also please share your views on
this?
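
To make the trade-off concrete, here is a minimal sketch of the second option, where the RM sends an absolute deadline and the AM converts it back to a remaining time using its own clock. The class and method names are hypothetical and the clock values are made up for illustration. None of this is taken from the patch.

```java
// Hypothetical sketch of the "epoch time + timeout" option; not code from the patch.
final class TimeoutEncodingSketch {

    // RM side: absolute deadline (epoch millis) placed in the heartbeat response.
    static long deadlineEpochMs(long rmNowMs, long timeoutMs) {
        return rmNowMs + timeoutMs;
    }

    // AM side: translate the deadline back into a remaining time using the AM's clock.
    // Clamping at zero is an assumption of this sketch.
    static long remainingFromDeadlineMs(long deadlineMs, long amNowMs) {
        return Math.max(0L, deadlineMs - amNowMs);
    }

    public static void main(String[] args) {
        long rmNow = 1_000_000L;                          // made-up RM clock reading
        long deadline = deadlineEpochMs(rmNow, 12_000L);  // 1012000

        // Clocks in sync, 50 ms in transit: the AM sees the true remaining time.
        System.out.println(remainingFromDeadlineMs(deadline, rmNow + 50L));  // prints 11950

        // AM clock 200 ms ahead of the RM: the skew leaks into the result.
        System.out.println(remainingFromDeadlineMs(deadline, rmNow + 250L)); // prints 11750
    }
}
```

With a plain relative {{timeout}}, the AM would simply read 12000 and be off by the transit delay (50 ms in this example). With the absolute deadline, the transit delay is absorbed, but any clock difference between RM and AM shows up instead.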

[~Naganarasimha Garla], I will look at the other comments once there is feedback on these first
points. Thank you!

> Indicate preemption timout along with the list of containers to AM (preemption message)
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3784
>                 URL: https://issues.apache.org/jira/browse/YARN-3784
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch, 0003-YARN-3784.patch,
0004-YARN-3784.patch
>
>
> Currently during preemption, the AM is notified with a list of containers that are marked
for preemption. This issue proposes also sending a timeout duration along with this container
list, so that the AM knows how much time it has to gracefully shut down its containers (assuming
one of the preemption policies is loaded in the AM).
> This will also help in NM decommissioning scenarios, where an NM is decommissioned after
a timeout (also killing the containers on it). The timeout indicates to the AM that
those containers can be forcefully killed by the RM after the timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
