Date: Fri, 25 Aug 2017 20:48:00 +0000 (UTC)
From: "Paul Rogers (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-5741) Drillbit during startup should not exceed the available memory on a node

    [ https://issues.apache.org/jira/browse/DRILL-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142160#comment-16142160 ]

Paul Rogers commented on DRILL-5741:
------------------------------------

Not sure this entirely makes sense. It is again asking the user to add a new variable to check the user's other settings.

In general, Drill should not use the entire memory on a node. When running under YARN, YARN will assign memory. When running under other managers (MapR Warden, Mesos, etc.) then those systems take care of the total memory allocations across tasks.

Perhaps we could, on Drillbit start, sum the memory allocations and check against total OS memory. But, how much should we reserve for the OS? For file system caching? For ZK? For other apps? Pretty soon we are trying to do node-level resource management "blind" inside the Drillbit.

In Drill-on-YARN, we considered the percentage-based allocation suggested above. But, this is not as simple as it seems.
Certain memory units are fixed (such as code cache); others can be adjusted. But should the ratio between heap and direct be the same at small levels (2 GB and 4 GB, say) as at large levels (50 GB and 100 GB)? Instead, we worked the other way. We summed the memory allocations for code cache, heap and direct to get the total memory requested from YARN.

I think we can consider memory oversubscription as a user error; a bit like configuring storage plugins wrong, or running too many processes for a node, or configuring the OS wrong, etc.

> Drillbit during startup should not exceed the available memory on a node
> ------------------------------------------------------------------------
>
>                 Key: DRILL-5741
>                 URL: https://issues.apache.org/jira/browse/DRILL-5741
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Server
>    Affects Versions: 1.11.0
>            Reporter: Kunal Khatua
>             Fix For: 1.12.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently, during startup, a Drillbit can be assigned large values for the following:
> * Xmx (Heap)
> * XX:MaxDirectMemorySize
> * XX:ReservedCodeCacheSize
> * XX:MaxPermSize
> All of these together can potentially exceed the available memory on a system when a Drillbit is under heavy load. It would be good to have the Drillbit ensure, during startup itself, that the cumulative value of these parameters does not exceed a pre-defined upper limit for the Drill process.
> The proposal is to have the [runbit|https://github.com/apache/drill/blob/master/distribution/src/resources/runbit] script look for an additional environment variable:
> {{DRILLBIT_MAX_PROC_MEM}}
> The parameter can specify the maximum in GB/MB (similar in syntax to how Java's maximum heap is defined), or as a percentage of available memory (not to exceed 95%).
> The [runbit|https://github.com/apache/drill/blob/master/distribution/src/resources/runbit] script will calculate the sum of the memory required by the memory spaces (heap, direct, etc.) and ensure that it is within the limit defined by the {{DRILLBIT_MAX_PROC_MEM}} env variable.
> In the absence of this parameter, there will be no restriction. A node admin can then define this variable in the default shell environment files (e.g. {{/root/.bashrc}}).
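
For illustration only, the startup check described in the quoted issue could be sketched roughly as follows. This is not the runbit script (which is a shell script) and not anything that exists in Drill. The function names `parse_size`, `resolve_limit` and `check_startup_memory` are hypothetical, the sizes are assumed to use binary units with a K/M/G suffix in the style of JVM options, and the percentage form is assumed to be relative to a total-memory figure that the caller supplies. The 95% ceiling and the "no variable, no restriction" behavior come from the issue text.

```python
import re

# Assumption: binary units, as the JVM uses for -Xmx style values.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Parse a JVM-style size such as '8G' or '512m' into bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text.strip())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    # A bare number with no suffix is treated as bytes.
    return int(number) * _UNITS.get(unit.upper(), 1)


def resolve_limit(limit, total_bytes):
    """Turn a DRILLBIT_MAX_PROC_MEM value ('14G' or '80%') into bytes.

    Returns None when the variable is unset, meaning no restriction.
    """
    if not limit:
        return None
    limit = limit.strip()
    if limit.endswith("%"):
        percent = float(limit[:-1])
        # The issue text caps the percentage form at 95%.
        if not 0 < percent <= 95:
            raise ValueError("percentage must be greater than 0 and at most 95")
        return int(total_bytes * percent / 100)
    return parse_size(limit)


def check_startup_memory(settings, limit, total_bytes):
    """Sum the requested memory spaces and compare the total with the limit.

    Returns (requested_bytes, cap_bytes_or_None, within_limit).
    """
    requested = sum(parse_size(value) for value in settings.values())
    cap = resolve_limit(limit, total_bytes)
    return requested, cap, cap is None or requested <= cap
```

With heap 4G, direct 8G, code cache 1G and perm 512M on a 16 GB node, the request adds up to 13.5 GB, so a limit of `80%` (12.8 GB) would fail the check while `14G` would pass. As the comment above points out, what such a check cannot decide is how much to leave for the OS, file system cache, ZK and other applications.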