drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5601) Rollup of External Sort memory management fixes
Date Wed, 19 Jul 2017 21:34:01 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093824#comment-16093824
] 

ASF GitHub Bot commented on DRILL-5601:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/860#discussion_r128322098
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java
---
    @@ -19,117 +19,162 @@
     
     import java.util.ArrayList;
     import java.util.List;
    +import java.util.Set;
     
    +import org.apache.drill.common.types.TypeProtos.DataMode;
     import org.apache.drill.common.types.TypeProtos.MinorType;
     import org.apache.drill.exec.expr.TypeHelper;
    +import org.apache.drill.exec.memory.AllocationManager.BufferLedger;
     import org.apache.drill.exec.memory.BaseAllocator;
     import org.apache.drill.exec.record.BatchSchema;
     import org.apache.drill.exec.record.MaterializedField;
     import org.apache.drill.exec.record.RecordBatch;
    +import org.apache.drill.exec.record.SmartAllocationHelper;
     import org.apache.drill.exec.record.VectorAccessible;
     import org.apache.drill.exec.record.VectorWrapper;
     import org.apache.drill.exec.record.selection.SelectionVector2;
    +import org.apache.drill.exec.vector.UInt4Vector;
     import org.apache.drill.exec.vector.ValueVector;
     import org.apache.drill.exec.vector.complex.AbstractMapVector;
    +import org.apache.drill.exec.vector.complex.RepeatedValueVector;
     import org.apache.drill.exec.vector.VariableWidthVector;
     
    +import com.google.common.collect.Sets;
    +
     /**
      * Given a record batch or vector container, determines the actual memory
      * consumed by each column, the average row, and the entire record batch.
      */
     
     public class RecordBatchSizer {
    -//  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RecordBatchSizer.class);
     
       /**
        * Column size information.
        */
       public static class ColumnSize {
    +    public final String prefix;
         public final MaterializedField metadata;
     
         /**
    -     * Assumed size from Drill metadata.
    +     * Assumed size from Drill metadata. Note that this information is
    +     * 100% bogus. Do not use it.
          */
     
    +    @Deprecated
         public int stdSize;
     
         /**
    -     * Actual memory consumed by all the vectors associated with this column.
    -     */
    -
    -    public int totalSize;
    -
    -    /**
          * Actual average column width as determined from actual memory use. This
          * size is larger than the actual data size since this size includes per-
          * column overhead such as any unused vector space, etc.
          */
     
    -    public int estSize;
    -    public int capacity;
    -    public int density;
    -    public int dataSize;
    -    public boolean variableWidth;
    -
    -    public ColumnSize(ValueVector vv) {
    -      metadata = vv.getField();
    -      stdSize = TypeHelper.getSize(metadata.getType());
    -
    -      // Can't get size estimates if this is an empty batch.
    -      int rowCount = vv.getAccessor().getValueCount();
    -      if (rowCount == 0) {
    -        estSize = stdSize;
    -        return;
    -      }
    +    public final int estSize;
    +    public final int valueCount;
    +    public final int entryCount;
    +    public final int dataSize;
    +    public final int estElementCount;
    +    public final boolean isVariableWidth;
     
    -      // Total size taken by all vectors (and underlying buffers)
    -      // associated with this vector.
    +    public ColumnSize(ValueVector v, String prefix, int valueCount) {
    --- End diff --
    
    This appears to be a Drill standard: it appears all over the code. But, renamed the variable
anyway.


> Rollup of External Sort memory management fixes
> -----------------------------------------------
>
>                 Key: DRILL-5601
>                 URL: https://issues.apache.org/jira/browse/DRILL-5601
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> Rollup of a set of specific JIRA entries that all relate to the very difficult problem
of managing memory within Drill in order for the external sort to stay within a memory budget.
In general, the fixes relate to better estimating memory used by the three ways that Drill
allocates vector memory (see DRILL-5522) and to predicting the size of vectors that the sort
will create, to avoid repeated realloc-copy cycles (see DRILL-5594).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message