hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3784) Indicate preemption timout along with the list of containers to AM (preemption message)
Date Sat, 21 Nov 2015 01:02:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019226#comment-15019226 ]

Sunil G commented on YARN-3784:
-------------------------------

As mentioned earlier, the preemption timeout for a container can vary depending on what
triggers it (PCPP or decommissioning).
 

Assume container1 is marked for preemption at time X, container2 at X+1, and container3 at
X+2. If the AM heartbeat interval is 3 sec, the next heartbeat will arrive at X+3. So each of
the 3 containers to be preempted has already spent some time in the scheduler's wait queue:
container1 has waited 3 sec, container2 2 sec, and container3 1 sec. If this elapsed time is
subtracted from each container's proposed timeout, a more realistic and correct timeout will
reach the AM.
This can help the AM make a better decision on whether to do a graceful checkpoint preemption
or an immediate local copy of some output file, etc. I'll raise these AM-side improvements as
needed after this ticket. This is also a good metric for the AM, and the user can get this
information as well.
If any container has crossed its timeout while waiting for the AM heartbeat, as you mentioned,
I'll mark it as -1 or 0 to indicate to the AM that the RM has most likely already taken action,
and that the AM need not attempt graceful preemption for it. This is also an AM-side
improvement that is already planned. I'll be handling it in the MR-side ticket.
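
To make the arithmetic above concrete, here is a minimal sketch of the adjustment. It is not
taken from any of the attached patches: the class name, method names, the EXPIRED sentinel and
the 15 sec proposed timeout are all assumptions made for illustration, and none of them are
YARN APIs.

```java
/** Hypothetical sketch only; these names are illustrative and not part of YARN. */
final class PreemptionTimeoutSketch {

    /** Assumed sentinel for "timeout already crossed" (the comment suggests -1 or 0). */
    static final long EXPIRED = 0L;

    /**
     * Subtracts the time a container has already waited in the scheduler
     * from its proposed timeout. Returns EXPIRED if nothing is left.
     */
    static long remainingTimeoutMs(long proposedTimeoutMs, long markedAtMs, long heartbeatAtMs) {
        long elapsedMs = heartbeatAtMs - markedAtMs;
        long remainingMs = proposedTimeoutMs - elapsedMs;
        return remainingMs > 0 ? remainingMs : EXPIRED;
    }

    /** AM-side check: graceful preemption is only worth attempting if time remains. */
    static boolean needsGracefulPreemption(long remainingMs) {
        return remainingMs > EXPIRED;
    }

    public static void main(String[] args) {
        long proposedTimeoutMs = 15_000L;                 // assumed value, not from the ticket
        long x = 0L;                                      // time X from the example
        long[] markedAtMs = {x, x + 1_000L, x + 2_000L};  // container1..container3
        long heartbeatAtMs = x + 3_000L;                  // next AM heartbeat at X+3

        for (int i = 0; i < markedAtMs.length; i++) {
            long remaining = remainingTimeoutMs(proposedTimeoutMs, markedAtMs[i], heartbeatAtMs);
            System.out.println("container" + (i + 1) + ": " + remaining + " ms left, graceful="
                    + needsGracefulPreemption(remaining));
        }
    }
}
```

With these assumed numbers, container1 would be reported with 12 sec left, container2 with
13 sec and container3 with 14 sec. A container whose proposed timeout is shorter than its wait
would be reported with the sentinel instead.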

Does this make sense? 

> Indicate preemption timout along with the list of containers to AM (preemption message)
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3784
>                 URL: https://issues.apache.org/jira/browse/YARN-3784
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch, 0003-YARN-3784.patch, 0004-YARN-3784.patch
>
>
> Currently, during preemption, the AM is notified with a list of containers that are marked
> for preemption. This issue proposes including a timeout duration along with this container
> list, so that the AM knows how much time it has to gracefully shut down its containers
> (assuming a preemption policy is loaded in the AM).
> This will also help in NM decommissioning scenarios, where an NM is decommissioned after
> a timeout (which also kills the containers on it). The timeout indicates to the AM that the
> RM can forcefully kill those containers once it expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
