Date: Tue, 26 Sep 2017 18:42:00 +0000 (UTC)
From: "Konstantinos Karanasos (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5887) Policies for choosing which opportunistic containers to kill

[ https://issues.apache.org/jira/browse/YARN-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181302#comment-16181302 ]

Konstantinos Karanasos commented on YARN-5887:
----------------------------------------------

Hi [~Hugo Oshiro],

The problem with the {{ContainerLaunchContext}} is that it is created once and is not updated during execution. What you want is for the RM to periodically inform each NM about the progress of the applications whose containers are running on that NM. So I think adding it to the node heartbeat response is the right way. You can start by looking into the {{NodeStatusUpdater}} class.

For calculating the job progress, there are multiple ways. One of the implementations I did internally at some point did exactly what you are suggesting. It is not ideal, but it is definitely a first approximation.
More involved strategies could look into the DAG structure (you might be waiting for a single mapper to finish before the next stage can start) or take into account estimates of task runtimes from previous executions (so if you expect one task to run for 2 hours and another for 10 seconds, you can take that into account when calculating progress).

Hope this helps.

> Policies for choosing which opportunistic containers to kill
> ------------------------------------------------------------
>
>                 Key: YARN-5887
>                 URL: https://issues.apache.org/jira/browse/YARN-5887
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Konstantinos Karanasos
>
> When a guaranteed container arrives at an NM but there are no resources to start its execution, opportunistic containers will be killed to make space for the guaranteed container.
> At the moment, we kill opportunistic containers in reverse order of arrival (first the most recently started ones). This is not always the right decision.
> For example, we might want to minimize the number of containers killed: to start a 6GB container, we could kill one 6GB opportunistic or three 2GB ones.
> Another example would be to refrain from killing containers of jobs that are very close to completion (we have to pass job completion information to the NM in that case).
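
To make the 6GB example from the issue description concrete, here is a minimal Python sketch comparing the current policy (kill in reverse order of arrival) with a policy that tries to minimize the number of containers killed. This is not YARN code: the `Container` class, its fields, and both function names are hypothetical, and the sketch considers memory only, whereas a real NM would also have to account for other resources such as vcores.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Container:
    """Hypothetical stand-in for a running opportunistic container."""
    container_id: str
    memory_gb: int
    start_order: int  # larger value = started more recently


def _take_until_freed(ordered: List[Container], needed_gb: int) -> List[Container]:
    """Walk the candidates in the given order until enough memory is freed.

    Returns an empty list if the candidates cannot free enough memory.
    """
    victims: List[Container] = []
    freed = 0
    for c in ordered:
        if freed >= needed_gb:
            break
        victims.append(c)
        freed += c.memory_gb
    return victims if freed >= needed_gb else []


def pick_most_recent_first(running: List[Container], needed_gb: int) -> List[Container]:
    """Policy described in the issue: kill the most recently started first."""
    ordered = sorted(running, key=lambda c: c.start_order, reverse=True)
    return _take_until_freed(ordered, needed_gb)


def pick_fewest_kills(running: List[Container], needed_gb: int) -> List[Container]:
    """Alternative: kill the largest first, which minimizes the number of
    kills when memory is the only resource considered."""
    ordered = sorted(running, key=lambda c: c.memory_gb, reverse=True)
    return _take_until_freed(ordered, needed_gb)


# One 6GB container that started first, then three 2GB containers.
running = [
    Container("c1", 6, 1),
    Container("c2", 2, 2),
    Container("c3", 2, 3),
    Container("c4", 2, 4),
]

print([c.container_id for c in pick_most_recent_first(running, 6)])
print([c.container_id for c in pick_fewest_kills(running, 6)])
```

With this input, the reverse-arrival policy kills the three 2GB containers, while the largest-first policy kills only the single 6GB one. A progress-aware policy, as discussed in the comment above, could reuse the same structure by sorting on a progress value that the RM passes to the NM.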