drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben-Zvi <...@git.apache.org>
Subject [GitHub] drill pull request #860: DRILL-5601: Rollup of external sort fixes and impro...
Date Thu, 13 Jul 2017 22:21:17 GMT
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/860#discussion_r124428100
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortMemoryManager.java
---
    @@ -19,7 +19,125 @@
     
     import com.google.common.annotations.VisibleForTesting;
     
    +/**
    + * Computes the memory needs for input batches, spill batches and merge
    + * batches. The key challenges that this code tries to overcome are:
    + * <ul>
    + * <li>Drill is not designed for the small memory allocations,
    + * but the planner may provide such allocations because the memory per
    + * query is divided among slices (minor fragments) and among buffering
    + * operators, leaving very little per operator.</li>
    + * <li>Drill does not provide the detailed memory information needed to
    + * carefully manage memory in tight constraints.</li>
    + * <li>But, Drill has a death penalty for going over the memory limit.</li>
    + * </ul>
    + * As a result, this class is a bit of a hack: it attempt to consider a
    + * number of ill-defined factors in order to divide up memory use in a
    + * way that prevents OOM errors.
    + * <p>
    + * First, it is necessary to differentiate two concepts:
    + * <ul>
    + * <li>The <i>data size</i> of a batch: the amount of memory needed
to hold
    + * the data itself. The data size is constant for any given batch.</li>
    + * <li>The <i>buffer size</i> of the buffers that hold the data. The
buffer
    + * size varies wildly depending on how the batch was produced.</li>
    + * </ul>
    + * The three kinds of buffer layouts seen to date include:
    + * <ul>
    + * <li>One buffer per vector component (data, offsets, null flags, etc.)
    + * &ndash; create by readers, project and other operators.</li>
    + * <li>One buffer for the entire batch, with each vector component using
    + * a slice of the overall buffer. &ndash; case for batches deserialized from
    + * exchanges.</li>
    + * <li>One buffer for each top-level vector, with component vectors
    + * using slices of the overall vector buffer &ndash; the result of reading
    + * spilled batches from disk.</li>
    + * </ul>
    + * In each case, buffer sizes are power-of-two rounded from the data size.
    + * But since the data is grouped differently in each case, the resulting buffer
    + * sizes vary considerably.
    + * <p>
    + * As a result, we can never be sure of the amount of memory needed for a
    + * batch. So, we have to estimate based on a number of factors:
    + * <ul>
    + * <li>Uses the {@link RecordBatchSizer} to estimate the data size and
    + * buffer size of each incoming batch.</li>
    + * <li>Estimates the internal fragmentation due to power-of-two rounding.</li>
    + * <li>Configured preferences for spill and output batches.</li>
    + * </ul>
    + * The code handles "normal" and "low" memory conditions.
    + * <ul>
    + * <li>In normal memory, we simply work out the number of preferred-size
    + * batches fit in memory (based on the predicted buffer size.)</li>
    + * <li>In low memory, we divide up the available memory to produce the
    + * spill and merge batch sizes. The sizes will be less than the configured
    --- End diff --
    
    Nice explanation up to here; but need more reasoning/explanation about the "low memory"
- what's different here ? What's the purpose of the configured limits ? Why sizes are smaller
(won't that yield a higher number of batches, hence may OOM ?)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message