Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 26 Sep 2017 18:42:00 +0000 (UTC)
From: "Konstantinos Karanasos (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.13020886.1479235945000.207656.1506451320280@Atlassian.JIRA>
In-Reply-To: <JIRA.13020886.1479235945000@Atlassian.JIRA>
References: <JIRA.13020886.1479235945000@Atlassian.JIRA> <JIRA.13020886.1479235945383@jira-lw-us.apache.org>
Subject: [jira] [Commented] (YARN-5887) Policies for choosing which
 opportunistic containers to kill
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Tue, 26 Sep 2017 18:42:05 -0000


    [ https://issues.apache.org/jira/browse/YARN-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181302#comment-16181302 ] 

Konstantinos Karanasos commented on YARN-5887:
----------------------------------------------

Hi [~Hugo Oshiro],
The problem with the {{ContainerLaunchContext}} is that it is created once and is not updated during execution.
What you want is for the RM to periodically inform each NM about the progress of the applications whose containers are running on that NM.
So I think adding it to the node heartbeat response is the right way. The {{NodeStatusUpdater}} class is a good place to start looking.

There are multiple ways to calculate job progress. One implementation I did internally at some point was doing exactly what you are suggesting. It is not ideal, but it is definitely a first approximation. More involved strategies could look into the DAG structure (for example, the next stage might be waiting on a single mapper to finish before it can start) or take into account estimates of task runtimes from previous executions (if you expect one task to run for 2 hours and another for 10 seconds, you can weight them accordingly when calculating progress).
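
To make the difference concrete, here is a minimal, self-contained sketch that contrasts two estimates. It assumes, purely for illustration, that the first approximation is the fraction of finished tasks (the suggestion itself is not restated in this comment). {{JobProgressSketch}}, {{Task}}, {{byTaskCount}} and {{byExpectedRuntime}} are hypothetical names, not YARN classes or methods.

```java
import java.util.List;

/** Illustrative only: none of these types exist in YARN. */
final class JobProgressSketch {

    /** One task of a job, with a runtime estimate (e.g. from previous executions). */
    static final class Task {
        final double expectedSeconds;
        final boolean finished;

        Task(double expectedSeconds, boolean finished) {
            this.expectedSeconds = expectedSeconds;
            this.finished = finished;
        }
    }

    /** Assumed first approximation: fraction of tasks that have finished. */
    static double byTaskCount(List<Task> tasks) {
        if (tasks.isEmpty()) {
            return 0.0;
        }
        long done = tasks.stream().filter(t -> t.finished).count();
        return (double) done / tasks.size();
    }

    /** Weighted estimate: fraction of the expected total runtime that has finished. */
    static double byExpectedRuntime(List<Task> tasks) {
        double total = 0.0;
        double done = 0.0;
        for (Task t : tasks) {
            total += t.expectedSeconds;
            if (t.finished) {
                done += t.expectedSeconds;
            }
        }
        return total == 0.0 ? 0.0 : done / total;
    }

    public static void main(String[] args) {
        // A 2-hour task still running and a 10-second task already finished.
        List<Task> tasks = List.of(new Task(7200, false), new Task(10, true));
        System.out.println(byTaskCount(tasks));
        System.out.println(byExpectedRuntime(tasks));
    }
}
```

With the 2-hour and 10-second tasks from the example, the count-based estimate reports 0.5 while the runtime-weighted one reports roughly 0.0014, which is much closer to how far along the job really is.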

Hope this helps.

> Policies for choosing which opportunistic containers to kill
> ------------------------------------------------------------
>
>                 Key: YARN-5887
>                 URL: https://issues.apache.org/jira/browse/YARN-5887
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Konstantinos Karanasos
>
> When a guaranteed container arrives at an NM but there are no resources to start its execution, opportunistic containers will be killed to make space for the guaranteed container.
> At the moment, we kill opportunistic containers in reverse order of arrival (the most recently started ones first). This is not always the right decision.
> For example, we might want to minimize the number of containers killed: to start a 6GB container, we could kill one 6GB opportunistic container or three 2GB ones.
> Another example would be to refrain from killing containers of jobs that are very close to completion (we have to pass job completion information to the NM in that case).
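
The two examples in the issue description quoted above can be combined into one simple heuristic. The following is a sketch under stated assumptions, not the policy implemented in YARN: {{OpportunisticKillSketch}}, {{Candidate}}, {{chooseVictims}} and the {{spareAbove}} threshold are all hypothetical, and job progress is assumed to be already available on the NM as a value in [0, 1]. Containers of jobs below the threshold are considered first, largest first, so that as few of them as possible are killed; containers of nearly finished jobs are used only as a last resort.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: none of these types exist in YARN. */
final class OpportunisticKillSketch {

    /** A running opportunistic container that could be killed. */
    static final class Candidate {
        final String id;
        final long memoryMb;
        final double jobProgress; // assumed to be in [0, 1]

        Candidate(String id, long memoryMb, double jobProgress) {
            this.id = id;
            this.memoryMb = memoryMb;
            this.jobProgress = jobProgress;
        }
    }

    /**
     * Returns the ids of containers to kill, in kill order, to free at least neededMb.
     * If even killing every candidate is not enough, all ids are returned.
     */
    static List<String> chooseVictims(List<Candidate> running, long neededMb, double spareAbove) {
        List<Candidate> order = new ArrayList<>(running);
        // Jobs below the threshold come first; within each group, largest container first.
        order.sort((a, b) -> {
            boolean spareA = a.jobProgress >= spareAbove;
            boolean spareB = b.jobProgress >= spareAbove;
            if (spareA != spareB) {
                return Boolean.compare(spareA, spareB);
            }
            return Long.compare(b.memoryMb, a.memoryMb);
        });

        List<String> victims = new ArrayList<>();
        long freedMb = 0;
        for (Candidate c : order) {
            if (freedMb >= neededMb) {
                break;
            }
            victims.add(c.id);
            freedMb += c.memoryMb;
        }
        return victims;
    }
}
```

For a 6GB request against one 6GB and three 2GB opportunistic containers, this picks the single 6GB container. If that container belongs to a job past the threshold, it picks the three 2GB ones instead. Largest-first can free more memory than strictly needed, so a real policy may want to weigh the number of kills against the amount of wasted work.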


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
