drill-dev mailing list archives

From paul-rogers <...@git.apache.org>
Subject [GitHub] drill pull request #958: DRILL-5808: Reduce memory allocator strictness for ...
Date Sun, 24 Sep 2017 20:49:56 GMT
GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/958

    DRILL-5808: Reduce memory allocator strictness for "managed" operators

    The "managed" external sort and the hash agg operators now actively attempt to stay within
a memory "budget." 
    
    Our goals are to:
    
    1. Stay within the budget, and
    2. Make full use of the available memory.
    
    Unfortunately, at present, Drill has a number of limitations that work at cross-purposes
to the above goals.
    
    * Upstream operators create record batches potentially larger than the memory budget.
    * Memory allocations are "lumpy" - power of two rounded.
    * Vectors double in size automatically when needed.
    
    The combination of the above means that memory planning must be aware of the size of each
and every vector to the byte level in order to predict size doubling and power-of-two rounding.
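
    To make the "lumpy" behavior concrete, here is a minimal sketch of how power-of-two
rounding and automatic doubling inflate an allocation. The class and method names are hypothetical,
chosen for illustration only; they are not Drill's allocator or vector API.

```java
// Hypothetical illustration only; not part of Drill's code base.
final class VectorSizing {

    /** Rounds a requested byte count up to the next power of two. */
    static long roundUpToPowerOfTwo(long bytes) {
        if (bytes <= 1) {
            return 1;
        }
        // highestOneBit(bytes - 1) is the largest power of two strictly below
        // 'bytes' (or equal to bytes / 2 when 'bytes' is itself a power of two).
        return Long.highestOneBit(bytes - 1) << 1;
    }

    /** Capacity of a buffer that starts at 'initial' and doubles until it holds 'needed'. */
    static long capacityAfterDoubling(long initial, long needed) {
        long capacity = roundUpToPowerOfTwo(initial);
        while (capacity < needed) {
            capacity <<= 1; // each growth step doubles the allocation
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(1025));
        System.out.println(capacityAfterDoubling(4096, 4097));
    }
}
```

    In this sketch a request for 1,025 bytes occupies 2,048 bytes, and a 4,096-byte buffer that
needs one more byte grows to 8,192 bytes. An estimate based on average sizes misses both effects.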
    
    But, of course, Drill is schema-on-read, meaning that Drill cannot know ahead of time
the "shape" of the data it will process. Without that information, memory estimates are, at
best, averages, but actual allocations have a wide variance around those averages.
    
    Add to this Drill's memory allocation scheme: each operator is given a strict budget enforced
by the memory allocator. Go above the budget by a single byte and the query dies.
    
    How do we resolve this conflict? On the one hand, Drill's internals are rough-and-ready;
it is impossible to predict actual memory usage. On the other hand, the allocator requires
perfect prediction, or else the user suffers failed queries.
    
    Much work is needed in Drill internals to provide for better memory management. (Relational
databases solved these issues long ago, so solutions are available.) Until then, this commit
introduces a work-around.
    
    Memory-managed operators can ask for "leniency" from the allocator. In this mode, the
allocator:
    
    * Allows actual memory use to spike up to 100% of the limit, or 100 MB, whichever is less,
    * Logs each such "excess allocation" as a warning, so we can identify and fix issues,
and
    * Allows leniency only in production environments, but not during development or test.
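
    The rules above can be sketched roughly as follows. This is an illustration under stated
assumptions, not the code in this PR: the class, method, and field names are hypothetical, the
"100% of the limit, or 100 MB, whichever is less" rule is read as a grace margin added on top of
the limit, and the production-versus-test switch is reduced to a constructor flag.

```java
// Hypothetical illustration only; names and structure are not Drill's allocator API.
final class LenientBudget {

    /** Assumed cap on the grace margin: 100 MB. */
    static final long MAX_GRACE_BYTES = 100L * 1024 * 1024;

    private final long limit;      // the operator's memory budget, in bytes
    private final boolean lenient; // assumed: true in production, false in dev/test
    private long allocated;
    private int excessWarnings;

    LenientBudget(long limit, boolean lenient) {
        this.limit = limit;
        this.lenient = lenient;
    }

    /** Grace margin: 100% of the limit or 100 MB, whichever is less. */
    static long graceFor(long limit) {
        return Math.min(limit, MAX_GRACE_BYTES);
    }

    /** Returns true if the allocation is accepted, false if it must fail. */
    boolean tryAllocate(long bytes) {
        long newTotal = allocated + bytes;
        if (newTotal <= limit) {
            allocated = newTotal;
            return true;
        }
        if (lenient && newTotal <= limit + graceFor(limit)) {
            allocated = newTotal;
            excessWarnings++;
            // Each excess allocation is logged so the underlying issue can be found.
            System.err.println("WARN: excess allocation of " + (newTotal - limit)
                + " bytes over limit " + limit);
            return true;
        }
        return false; // strict mode, or beyond the grace margin
    }

    long allocated() { return allocated; }

    int excessWarnings() { return excessWarnings; }
}
```

    With a 30 MB limit, the lenient mode in this sketch accepts a one-byte overshoot and records
a warning, while the strict mode rejects the same request.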
    
    That is, we give users a margin for error so that their queries succeed even if Drill's
memory calculations don't come out exactly right.
    
    This should be fine because, of course, Drill still has several operators that observe
no memory limits at all. Seems silly to have one operator using GBs of memory, while enforcing
a typical 30 MB limit on others.
    
    Until all operators are memory managed, and Drill provides better memory management tools,
this PR allows queries to succeed even if we get things slightly wrong internally.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5808

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #958
    
----
commit a9c5083b8743efa2b5c74fee77e12d8f69258601
Author: Paul Rogers <progers@maprtech.com>
Date:   2017-09-24T19:51:43Z

    DRILL-5808: Reduce memory allocator strictness for "managed" operators

----

