Date: Wed, 3 Jan 2018 21:02:02 +0000 (UTC)
From: "Miklos Szegedi (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-7693) ContainersMonitor support configurable

    [ https://issues.apache.org/jira/browse/YARN-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310276#comment-16310276 ]

Miklos Szegedi commented on YARN-7693:
--------------------------------------

Thank you for the reply, [~yangjiandan]. +0 on the approach of adding a separate monitor class for this; I think it is useful to be able to change the monitor (a rough sketch of what that could look like follows below). In terms of the feature you described, I have some suggestions you may want to consider. First of all, please consider using the JIRA sub-task feature for your project and making this a sub-task.
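
As a rough illustration of the configurable-monitor idea, here is a minimal sketch of how ContainerManagerImpl could pick the monitor implementation from configuration. The property name yarn.nodemanager.containers-monitor.class is a hypothetical placeholder, not an existing YARN key, and the sketch assumes the implementation keeps the constructor signature that ContainersMonitorImpl has today.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.event.AsyncDispatcher;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;
import org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor;
import org.apache.hadoop.yarn.server.nodemanager.Context;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitor;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl;

/** Sketch only: how a configurable ContainersMonitor could be instantiated. */
class ConfigurableMonitorFactory {

  // Hypothetical property name, for illustration only -- not an existing YARN key.
  static final String NM_CONTAINERS_MONITOR_CLASS =
      "yarn.nodemanager.containers-monitor.class";

  /** Could replace the hard-coded "new ContainersMonitorImpl(...)" call. */
  static ContainersMonitor createContainersMonitor(Configuration conf,
      ContainerExecutor exec, AsyncDispatcher dispatcher, Context context) {
    Class<? extends ContainersMonitor> clazz = conf.getClass(
        NM_CONTAINERS_MONITOR_CLASS,
        ContainersMonitorImpl.class,   // default keeps today's behavior
        ContainersMonitor.class);
    try {
      // Assumes the implementation exposes the same (ContainerExecutor,
      // AsyncDispatcher, Context) constructor that ContainersMonitorImpl uses.
      return clazz.getConstructor(ContainerExecutor.class, AsyncDispatcher.class,
          Context.class).newInstance(exec, dispatcher, context);
    } catch (ReflectiveOperationException e) {
      throw new YarnRuntimeException("Could not instantiate " + clazz.getName(), e);
    }
  }
}
{code}

Keeping ContainersMonitorImpl as the default means nothing changes for clusters that never set the property.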
How about doing this as part of YARN-1747, or even better, YARN-1011? You may want to leverage the option to simply turn off the current cgroups memory enforcement using the configuration added in YARN-7064. It also handles monitoring resource utilization using cgroups.

bq. 1) Separate containers into two different group Opportunistic_Group and Guaranteed_Group under hadoop-yarn
The reason it is useful to have a single hadoop-yarn cgroup for all containers is that you can apply a single piece of logic and control the OOM killer for all of them. I would be happy to look at the actual code, but adjusting two different cgroups may add too much complexity. It is especially problematic in the case of promotion: when an opportunistic container is promoted to guaranteed, you need to move it to the other cgroup, but this requires heavy lifting from the kernel and takes significant time. See https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt for details.

bq. 2) Monitor system resource utilization and dynamically adjust resource of Opportunistic_Group
The concern here is that dynamic adjustment does not work in the current implementation either, because it is too slow to respond in extreme cases. Please check out YARN-6677, YARN-4599 and YARN-1014. The idea there is to disable the OOM killer on hadoop-yarn, as you also suggested, so that we get notified by the kernel when the system runs low on memory. YARN can then decide which container to preempt, or adjust the soft limit, while the containers are paused; the preemption unblocks the containers. (A rough sketch of the cgroup writes involved appears below the issue description.) Please let us know if you have time and would like to contribute.

bq. 3) Kill container only when adjust resource fail for given times
I absolutely agree with this. A sudden spike in CPU usage should not trigger immediate preemption. In the case of memory, I am not sure how much you can adjust, though. My understanding is that the basic design of opportunistic containers is that they never affect the performance of guaranteed ones, but using I/O for swapping would do exactly that. How would you reduce memory usage if not by preempting?


> ContainersMonitor support configurable
> --------------------------------------
>
>          Key: YARN-7693
>          URL: https://issues.apache.org/jira/browse/YARN-7693
>      Project: Hadoop YARN
>   Issue Type: New Feature
>   Components: nodemanager
>     Reporter: Jiandan Yang
>     Assignee: Jiandan Yang
>     Priority: Minor
>  Attachments: YARN-7693.001.patch, YARN-7693.002.patch
>
>
> Currently ContainersMonitor has only one default implementation, ContainersMonitorImpl.
> After introducing opportunistic containers, ContainersMonitor needs to monitor system metrics and even dynamically adjust Opportunistic and Guaranteed resources in the cgroup, so another ContainersMonitor implementation may be needed.
> The current ContainerManagerImpl instantiates ContainersMonitorImpl directly with new, so ContainersMonitor needs to be configurable.
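
For context on the cgroup knobs referenced in the comment above, here is a minimal, hypothetical sketch of the raw cgroup v1 (memory controller) writes that disabling the OOM killer and adjusting a soft limit boil down to. The mount point /sys/fs/cgroup/memory/hadoop-yarn is an assumption, and real NodeManager code would go through its cgroups handling layer rather than writing the files directly.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch only: the cgroup v1 memory writes discussed in the comment above. */
public class CGroupMemorySketch {

  // Assumed cgroup v1 memory controller mount plus the YARN hierarchy.
  private static final Path YARN_MEMORY_CGROUP =
      Paths.get("/sys/fs/cgroup/memory/hadoop-yarn");

  /** Disable the kernel OOM killer for the whole hadoop-yarn hierarchy,
   *  so tasks pause at the limit instead of being killed. */
  public static void disableOomKiller() throws IOException {
    Files.write(YARN_MEMORY_CGROUP.resolve("memory.oom_control"),
        "1".getBytes(StandardCharsets.UTF_8));
  }

  /** Adjust the soft limit of one container's cgroup; memory above this
   *  value is reclaimed only under global memory pressure. */
  public static void setSoftLimit(String containerId, long bytes)
      throws IOException {
    Files.write(
        YARN_MEMORY_CGROUP.resolve(containerId)
            .resolve("memory.soft_limit_in_bytes"),
        Long.toString(bytes).getBytes(StandardCharsets.UTF_8));
  }
}
{code}

With the OOM killer disabled, tasks in the cgroup pause when memory runs out, which gives YARN the window described above to preempt a container or raise limits before anything is killed.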