Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Tue, 30 Jun 2015 15:42:04 +0000 (UTC)
From: "Sunil G (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12836171.1433774810000.59419.1435678924895@Atlassian.JIRA>
In-Reply-To: <JIRA.12836171.1433774810000@Atlassian.JIRA>
References: <JIRA.12836171.1433774810000@Atlassian.JIRA>
 <JIRA.12836171.1433774810426@arcas>
Subject: [jira] [Commented] (YARN-3784) Indicate preemption timout along
 with the list of containers to AM (preemption message)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608543#comment-14608543 ] 

Sunil G commented on YARN-3784:
-------------------------------

Thank you [~chris.douglas] for the comments.
I will update the patch to correct these problems. Regarding the point below,
bq.If containers are preempted for multiple causes (e.g., over-capacity, NM decommission), then the time to preempt could vary widely
I had the same concern. Currently the preemption message looks like this:
{noformat}
 message PreemptionContractProto {
   repeated PreemptionResourceRequestProto resource = 1;
   repeated PreemptionContainerProto container = 2;
+  optional int64 timeout = 3;
 }
 message PreemptionContainerProto {
   optional ContainerIdProto id = 1;
 }
{noformat}

I have added {{timeout}} at the message level. I can try attaching it at the container level as an optional parameter instead. One potential problem is that different preemption events (ProportionalCPP, decommission, etc.) can reach the application at different times, and the {{allocate}} call to ApplicationMasterService may arrive a few seconds later to fetch the "to be preempted" containers. So part of the timeout may already have elapsed for some containers. We could subtract the elapsed time before sending the timeout to the AM, but will that overload the scheduler if many containers are marked for preemption (it means storing the last update time per container)?
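
To make the bookkeeping question concrete, here is a minimal sketch of the "subtract and then send" idea. It assumes a hypothetical helper class, PreemptionDeadlineTracker, that is not part of YARN or the attached patch; container ids are plain strings for illustration. It stores one absolute deadline per marked container and computes the remaining time when the {{allocate}} call arrives.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical tracker (not an existing YARN class): remembers the deadline of
 * each container marked for preemption so that the timeout reported to the AM
 * accounts for the time that elapsed before the next allocate call.
 */
class PreemptionDeadlineTracker {
  // containerId -> absolute deadline in milliseconds (one long per container)
  private final Map<String, Long> deadlines = new HashMap<>();

  /**
   * Record that a container was marked at markTimeMs with a grace period of
   * timeoutMs. If it is marked again by another cause, keep the earliest deadline.
   */
  void mark(String containerId, long markTimeMs, long timeoutMs) {
    deadlines.merge(containerId, markTimeMs + timeoutMs, Long::min);
  }

  /** Remaining time at nowMs, never negative; -1 if the container is not marked. */
  long remainingMs(String containerId, long nowMs) {
    Long deadline = deadlines.get(containerId);
    if (deadline == null) {
      return -1L;
    }
    return Math.max(0L, deadline - nowMs);
  }

  /** Forget a container once it has been released or killed. */
  void clear(String containerId) {
    deadlines.remove(containerId);
  }
}
```

In this sketch the per-container cost is one map entry and one subtraction per {{allocate}} call. Whether that is acceptable inside the scheduler with many marked containers is exactly the open question above; the sketch does not answer it.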

> Indicate preemption timout along with the list of containers to AM (preemption message)
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3784
>                 URL: https://issues.apache.org/jira/browse/YARN-3784
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3784.patch
>
>
> Currently, during preemption, the AM is notified with a list of containers that are marked for preemption. This issue proposes including a timeout duration along with this container list, so that the AM knows how much time it has to shut down its containers gracefully (assuming a preemption policy is loaded in the AM).
> This will help in NM decommissioning scenarios, where the NM is decommissioned after a timeout (which also kills the containers on it). The timeout tells the AM that the RM can forcefully kill those containers once it expires.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)