drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5657) Implement size-aware result set loader
Date Tue, 14 Nov 2017 19:19:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252001#comment-16252001 ]

ASF GitHub Bot commented on DRILL-5657:

Github user paul-rogers commented on the issue:

    Finally, a note on the fragmentation issue. As you noted, this is a subtle issue. It is
true that Netty maintains a memory pool, based on binary allocations, that minimizes the normal
kind of fragmentation that results from random sized allocations from a common pool.
    The cost of the binary structure is _internal_ fragmentation. Today, Drill vectors have,
on average, 25% internal fragmentation. This PR does not address this issue per se, but sets
us on the road toward a solution.
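    As a rough illustration of where internal fragmentation in a binary allocator comes from, here is a minimal sketch. It assumes every request is rounded up to the next power of two; the class and method names are hypothetical and are not part of Drill or Netty. Under that assumption, requests spread evenly between two adjacent powers of two waste about a quarter of each allocation on average, which is one way to arrive at a figure near 25%.

```java
// Hypothetical helper, not Drill or Netty code: models a "binary" allocator
// that rounds every request up to the next power of two.
final class InternalFragmentation {

    // Smallest power of two that is >= n (returns 1 for n <= 1).
    static long roundUpToPowerOfTwo(long n) {
        if (n <= 1) {
            return 1;
        }
        return Long.highestOneBit(n - 1) << 1;
    }

    // Fraction of the allocated block that the request leaves unused.
    static double wastedFraction(long requested) {
        long allocated = roundUpToPowerOfTwo(requested);
        return (double) (allocated - requested) / allocated;
    }

    // Mean wasted fraction over request sizes in (lowExclusive, highInclusive].
    static double averageWastedFraction(long lowExclusive, long highInclusive) {
        double sum = 0;
        for (long r = lowExclusive + 1; r <= highInclusive; r++) {
            sum += wastedFraction(r);
        }
        return sum / (highInclusive - lowExclusive);
    }
}
```

    For example, a 12-byte request lands in a 16-byte block and wastes a quarter of it, and `averageWastedFraction(512, 1024)` comes out just under 0.25.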
    The key fragmentation issue that this PR _does_ deal with is that which occurs when allocations
exceed the 16 MB (default) Netty block size. In that case, Netty does, in fact, go to the
OS. The OS does a fine job of coalescing large blocks to prevent fragmentation. The problem,
however, is that, over time, more and more memory resides in the Netty free list. Eventually,
there simply is not enough memory left outside of Netty to service a jumbo (> 16 MB) block.
Drill gets an OOM error even though Netty has many GB of memory free; just none available in the
32+ MB size we want.
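    The failure mode described above can be reproduced in a toy model. This is a sketch under simplifying assumptions, not Netty's actual behavior or API: `ToyPool` and its methods are hypothetical, every pooled allocation is charged a whole 16 MB chunk, and a released chunk stays on the pool's free list forever.

```java
// Toy model only (hypothetical names): a fixed direct-memory limit shared by
// a chunk pool that never returns memory and by "jumbo" allocations that must
// be served from whatever memory remains outside the pool.
final class ToyPool {
    static final long CHUNK = 16L << 20;   // 16 MB chunk size

    private final long limit;              // total direct memory available
    private long pooledChunks;             // chunks owned by the pool (used or free)
    private long freeChunks;               // chunks sitting on the pool's free list
    private long jumboBytes;               // bytes held by jumbo allocations

    ToyPool(long limit) {
        this.limit = limit;
    }

    // Memory not yet claimed by the pool or by jumbo blocks.
    long outsidePool() {
        return limit - pooledChunks * CHUNK - jumboBytes;
    }

    // Memory that is free, but only reusable in chunk-sized pieces.
    long freeInPool() {
        return freeChunks * CHUNK;
    }

    // Returns true if the request could be satisfied.
    boolean allocate(long bytes) {
        if (bytes <= CHUNK) {
            if (freeChunks > 0) {          // reuse a pooled chunk
                freeChunks--;
                return true;
            }
            if (outsidePool() < CHUNK) {   // cannot grow the pool
                return false;
            }
            pooledChunks++;                // pool claims a new chunk
            return true;
        }
        if (outsidePool() < bytes) {       // jumbo: free pool chunks do not help
            return false;
        }
        jumboBytes += bytes;
        return true;
    }

    // A released chunk goes back to the free list, never to the system.
    void releaseChunk() {
        if (freeChunks < pooledChunks) {
            freeChunks++;
        }
    }
}
```

    With a 64 MB limit, four small allocations followed by four releases leave 64 MB free inside the pool and nothing outside it, so a subsequent 32 MB request fails while a small request still succeeds.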
    We could force Netty to release unused memory. In fact, the original [JE-Malloc paper](https://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf)
(that you provided way back when, thanks) points out that the allocator should monitor its
pools and release memory back to the system when a pool's usage drops to zero. It does not appear
that `PooledByteBufAllocatorL` implements this feature, so the allocator never releases memory
once it lands in the allocator's free list. We could certainly fix this; the JE-Malloc paper
provides suggestions.
    Still, however, we could end up with usage patterns in which some slice of memory is used
from each chunk, blocking any chunk from being released to the OS, and thereby blocking a
"jumbo" block allocation, again even though much memory is free on the free list. This is yet another
form of fragmentation.
    Finally, as you point out, all of this assumes that we want to continue to allocate "jumbo"
blocks. But, as we discovered in the managed sort work, and the hash agg spill work, Drill
has two conflicting tendencies. On the one hand, "managed" operators wish to operate within
a constrained memory footprint (which often seems to end up being on the order of 30 MB for
the sort, for various reasons). On the other hand, if the scan operator, say, decides to allocate
a batch that contains 32 MB vectors, then the sort can't accept even one of those batches and an OOM ensues.
    So, rather than solve our memory fragmentation issues by mucking with Netty (force free
of unused chunks, increase chunk size, etc.), the preferred solution is to live within a budget:
both the constraints of the Netty chunk size *and* the constraints placed on Drill operator
memory usage.
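    To make the two-part budget concrete, here is a minimal sketch of how a row limit per batch could be derived. It is not the PR's implementation: `BatchBudget` and `maxRows` are hypothetical names, columns are assumed to be fixed width, and the 16 MB per-vector cap is taken from the default chunk size mentioned above.

```java
// Hypothetical sketch, not the PR's code: the largest row count for which
// no single vector exceeds the chunk size and the whole batch stays within
// the operator's memory budget. Assumes fixed-width columns.
final class BatchBudget {
    static final long MAX_VECTOR_BYTES = 16L << 20;   // 16 MB per vector

    static long maxRows(int[] columnWidths, long batchBudgetBytes) {
        if (columnWidths.length == 0) {
            return 0;                                  // nothing to store
        }
        long rowWidth = 0;
        long rows = Long.MAX_VALUE;
        for (int width : columnWidths) {
            if (width <= 0) {
                throw new IllegalArgumentException("column width must be positive");
            }
            rowWidth += width;
            // Constraint 1: each vector must fit in one chunk.
            rows = Math.min(rows, MAX_VECTOR_BYTES / width);
        }
        // Constraint 2: all vectors together must fit the operator budget.
        return Math.min(rows, batchBudgetBytes / rowWidth);
    }
}
```

    With a 4-byte and an 8-byte column and a 30 MB budget, the per-vector cap is the binding constraint; shrink the budget to 8 MB and the batch budget becomes the limit instead.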
    In short, we started by wanting to solve the fragmentation issue, but we realized that
the best solution is to also solve the unlimited-batch-size issue, hence this PR.

> Implement size-aware result set loader
> --------------------------------------
>                 Key: DRILL-5657
>                 URL: https://issues.apache.org/jira/browse/DRILL-5657
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: Future
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: Future
> A recent extension to Drill's set of test tools created a "row set" abstraction to allow
us to create, and verify, record batches with very few lines of code. Part of this work involved
creating a set of "column accessors" in the vector subsystem. Column readers provide a uniform
API to obtain data from columns (vectors), while column writers provide a uniform writing
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size (to avoid
memory fragmentation due to Drill's two memory allocators.) The column accessors have proven
to be so useful that they will be the basis for the new, size-aware writers used by Drill's
record readers.
> A step in that direction is to retrofit the column writers to use the size-aware {{setScalar()}}
and {{setArray()}} methods introduced in DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer of the accessors,
those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware vector writing,
including the case in which a vector fills in the middle of a row.

This message was sent by Atlassian JIRA
