Date: Tue, 30 Jun 2015 15:42:04 +0000 (UTC)
From: "Sunil G (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3784) Indicate preemption timout along with the list of containers to AM (preemption message)

[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608543#comment-14608543 ]

Sunil G commented on YARN-3784:
-------------------------------

Thank you [~chris.douglas] for the comments. I will upload an updated patch that corrects these problems.

Regarding the point below:

bq. If containers are preempted for multiple causes (e.g., over-capacity, NM decommission), then the time to preempt could vary widely

I had the same concern. Currently, the preemption message looks like this:

{noformat}
message PreemptionContractProto {
  repeated PreemptionResourceRequestProto resource = 1;
  repeated PreemptionContainerProto container = 2;
+ optional int64 timeout = 3;
}

message PreemptionContainerProto {
  optional ContainerIdProto id = 1;
}
{noformat}

I have added {{timeout}} at the message level. I can try attaching it at the container level instead, as an optional parameter.

One potential issue is that different preemption events (ProportionalCPP, decommission, etc.) can reach the application at different times, and the {{allocate}} call from ApplicationMasterService may arrive some seconds later to fetch the "to be preempted" containers. Part of the timeout may therefore have already elapsed for some containers. We can subtract the elapsed time before sending the value to the AM, but that means storing a last-update time per container. Will this overload the scheduler if many containers are marked for preemption?

> Indicate preemption timout along with the list of containers to AM (preemption message)
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3784
>                 URL: https://issues.apache.org/jira/browse/YARN-3784
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3784.patch
>
>
> Currently, during preemption, the AM is notified with a list of containers that are marked for preemption. This issue proposes sending a timeout duration along with that container list, so that the AM knows how much time it has to shut down its containers gracefully (assuming a preemption policy is loaded in the AM).
> This will help in NM decommissioning scenarios, where the NM is decommissioned after a timeout (which also kills the containers on it). The timeout tells the AM that the RM can forcefully kill those containers once it expires.
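
To make the elapsed-time question from the comment concrete, here is a minimal sketch of one way the subtraction could be done. It is not taken from the attached patch: the class {{PreemptionDeadlines}} and its methods are hypothetical names, container ids are plain strings, and the grace periods are made-up values. The sketch assumes that an absolute deadline is stored when a container is marked, so the per-container state is a single long and the subtraction happens only when {{allocate}} builds the message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of YARN: remembers a kill deadline per marked container. */
final class PreemptionDeadlines {
  // containerId -> absolute deadline in ms (mark time + grace period)
  private final Map<String, Long> deadlineMs = new LinkedHashMap<>();

  /** Record that a container was marked at markTimeMs with the given grace period. */
  void mark(String containerId, long markTimeMs, long graceMs) {
    // If several sources mark the same container, keep the earliest deadline.
    deadlineMs.merge(containerId, markTimeMs + graceMs, Long::min);
  }

  /** Per-container remaining time as seen at nowMs, clamped at zero. */
  Map<String, Long> remainingAt(long nowMs) {
    Map<String, Long> out = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : deadlineMs.entrySet()) {
      out.put(e.getKey(), Math.max(0L, e.getValue() - nowMs));
    }
    return out;
  }

  /** Message-level timeout: the smallest remaining time, or -1 if nothing is marked. */
  long messageTimeoutAt(long nowMs) {
    long min = Long.MAX_VALUE;
    for (long remaining : remainingAt(nowMs).values()) {
      min = Math.min(min, remaining);
    }
    return min == Long.MAX_VALUE ? -1L : min;
  }
}
```

With this layout, a message-level {{timeout}} would have to be the minimum over all listed containers, which is conservative when the causes differ widely. A per-container field could carry each value from {{remainingAt}} directly. Whether this bookkeeping is cheap enough for the scheduler is exactly the open question in the comment.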