From: paul-rogers
To: dev@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [GitHub] drill pull request #958: DRILL-5808: Reduce memory allocator strictness for ...
Date: Sun, 24 Sep 2017 20:49:56 +0000 (UTC)

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/958

DRILL-5808: Reduce memory allocator strictness for "managed" operators

The "managed" external sort and the hash agg operators now actively attempt to stay within a memory "budget." Our goals are to:

1. Stay within the budget, and
2. Make full use of the available memory.

Unfortunately, at present, Drill has a number of limitations that work at cross-purposes to the above goals.

* Upstream operators create record batches potentially larger than the memory budget.
* Memory allocations are "lumpy" - power of two rounded.
* Vectors double in size automatically when needed.

The combination of the above means that memory planning must be aware of the size of each and every vector to the byte level in order to predict size doubling and power-of-two rounding. But, of course, Drill is schema-on-read, meaning that Drill cannot know ahead of time the "shape" of the data it will process. Without that information, memory estimates are, at best, averages, and actual allocations have a wide variance around those averages.

Add to this Drill's memory allocation scheme: each operator is given a strict budget enforced by the memory allocator. Go above the budget by a single byte and the query dies.

How do we resolve this conflict? On the one hand, Drill's internals are rough-and-ready; it is impossible to predict actual memory usage. On the other hand, the allocator requires perfect prediction, otherwise the user suffers failed queries.

Much work is needed in Drill internals to provide for better memory management. (Relational databases solved these issues long ago, so solutions are available.) Until then, this commit introduces a work-around. Memory-managed operators can ask for "leniency" from the allocator. In this mode, the allocator:

* Allows actual memory use to spike above the limit by up to 100% of the limit, or 100 MB, whichever is less,
* Logs each such "excess allocation" as a warning, so we can identify and fix issues, and
* Allows leniency only in production environments, but not during development or test.

That is, we give users a margin for error so that their queries succeed even if Drill's memory calculations don't come out exactly right. This should be fine because, of course, Drill still has several operators that observe no memory limits at all.
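
To make the leniency rule concrete, here is a rough sketch of the check described above. It is not the code in this PR: the class, method, and constant names are all hypothetical, and it assumes the rule means "the limit plus a grace margin of 100% of the limit or 100 MB, whichever is less." It also shows how power-of-two rounding alone can push a reasonable estimate over a 30 MB limit.

```java
/**
 * Illustrative sketch only; NOT Drill's actual allocator code.
 * Every name here (LenientLimitSketch, MAX_GRACE_BYTES, check, ...) is hypothetical.
 */
final class LenientLimitSketch {

  static final long ONE_MB = 1024L * 1024L;

  // Assumption taken from the description: the grace margin is capped at 100 MB.
  static final long MAX_GRACE_BYTES = 100 * ONE_MB;

  enum Outcome { WITHIN_LIMIT, EXCESS_ALLOWED, REJECTED }

  /** Rounds a request up to the next power of two, mimicking "lumpy" allocation. */
  static long roundUpToPowerOfTwo(long bytes) {
    if (bytes <= 1) {
      return 1;
    }
    return Long.highestOneBit(bytes - 1) << 1;
  }

  /** Ceiling under leniency: the limit plus min(100% of the limit, 100 MB). */
  static long lenientCeiling(long limit) {
    return limit + Math.min(limit, MAX_GRACE_BYTES);
  }

  /**
   * Decides whether an operator's total allocation is acceptable.
   *
   * @param allocatedAfter   total bytes the operator would hold after this allocation
   * @param limit            the operator's memory budget in bytes
   * @param lenientRequested true if the operator asked for leniency
   * @param production       false for development or test runs, where leniency is denied
   */
  static Outcome check(long allocatedAfter, long limit,
                       boolean lenientRequested, boolean production) {
    if (allocatedAfter <= limit) {
      return Outcome.WITHIN_LIMIT;
    }
    if (lenientRequested && production && allocatedAfter <= lenientCeiling(limit)) {
      // Stand-in for a logger warning, so the excess can be found and fixed later.
      System.err.println("WARN: excess allocation of "
          + (allocatedAfter - limit) + " bytes over limit " + limit);
      return Outcome.EXCESS_ALLOWED;
    }
    return Outcome.REJECTED;
  }

  public static void main(String[] args) {
    long limit = 30 * ONE_MB;                        // the "typical" limit mentioned above
    long estimate = 17 * ONE_MB;                     // what memory planning predicted
    long actual = roundUpToPowerOfTwo(estimate);     // what actually gets allocated

    System.out.println("estimate=" + estimate + " actual=" + actual);
    System.out.println("production, lenient: " + check(actual, limit, true, true));
    System.out.println("test run, lenient:   " + check(actual, limit, true, false));
  }
}
```

In this sketch a 17 MB estimate fits the budget, but the rounded allocation does not, so the strict path rejects it while the lenient production path lets it through with a warning. Keeping the test path strict means such excesses still fail loudly during development.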

It seems silly to have one operator using GBs of memory while enforcing a typical 30 MB limit on others. Until all operators are memory managed, and Drill provides better memory management tools, this PR allows queries to succeed even if we get things slightly wrong internally.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5808

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #958

----
commit a9c5083b8743efa2b5c74fee77e12d8f69258601
Author: Paul Rogers
Date:   2017-09-24T19:51:43Z

    DRILL-5808: Reduce memory allocator strictness for "managed" operators

----

---