Date: Fri, 12 Jan 2018 22:58:00 +0000 (UTC)
From: "Wangda Tan (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-7739) Revisit scheduler resource normalization behavior for max allocation

    [ https://issues.apache.org/jira/browse/YARN-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324699#comment-16324699 ]

Wangda Tan commented on YARN-7739:
----------------------------------

Thanks [~jlowe], to me it is also a bug :). I think we should get rid of this behavior, since it could badly impact users when multiple resources are enabled.
Will talk to Vinod and keep this thread updated.

> Revisit scheduler resource normalization behavior for max allocation
> --------------------------------------------------------------------
>
>                 Key: YARN-7739
>                 URL: https://issues.apache.org/jira/browse/YARN-7739
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Priority: Critical
>
> Currently, the YARN scheduler normalizes each requested resource against a maximum allocation derived from the configured maximum allocation and the maximum registered node resources. In effect, the scheduler silently caps the requested resource at the maximum allocation.
> This can cause problems for applications. For example, a Spark job may need 12 GB of memory to run, but the registered NMs in the cluster have at most 8 GB of memory per node, so the scheduler allocates an 8 GB container to the application.
> Once the app receives containers from the RM, if it doesn't double-check the allocated resources, it will hit OOMs that are hard to debug, because the scheduler capped the allocation silently.
> When non-mandatory resources are introduced, this becomes worse. For resources like GPUs, we typically set the minimum allocation to 0 since not all nodes have GPU devices. So it is possible that an application asks for 4 GPUs but gets 0 GPUs, which is a big problem.
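To make the failure mode concrete, below is a minimal, self-contained Java sketch of the silent-cap behavior described above. It is not YARN's actual normalization code (in the RM that logic lives around SchedulerUtils.normalizeRequest and the ResourceCalculator); the Resource record and its field names here are hypothetical, chosen only to mirror the 12 GB / 8 GB and 4-GPU / 0-GPU examples from the description:

    // Sketch only -- NOT YARN's implementation. Illustrates the silent cap:
    // each requested resource component is clamped to the effective maximum
    // allocation, and the caller is never told the ask was reduced.
    public final class NormalizationSketch {

        // Hypothetical resource vector: memory in MB, vcores, GPUs.
        record Resource(long memoryMb, int vcores, int gpus) {}

        static Resource normalize(Resource ask, Resource max) {
            // The cap is silent: no exception, no warning, just a
            // smaller resource than the application asked for.
            return new Resource(
                Math.min(ask.memoryMb(), max.memoryMb()),
                Math.min(ask.vcores(), max.vcores()),
                Math.min(ask.gpus(), max.gpus()));
        }

        public static void main(String[] args) {
            // From the issue: app asks for 12 GB and 4 GPUs, but registered
            // nodes offer at most 8 GB and no node has GPUs.
            Resource ask = new Resource(12 * 1024, 4, 4);
            Resource max = new Resource(8 * 1024, 8, 0);
            System.out.println(normalize(ask, max));
            // Prints: Resource[memoryMb=8192, vcores=4, gpus=0]
        }
    }

The point of the sketch is the output: the application asked for 12 GB and 4 GPUs, silently received 8 GB and 0 GPUs, and nothing in the returned container signals that the request was reduced, which is exactly why the app later OOMs or finds no GPU at runtime.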