drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5758) Rollup of external sort fixes to issues found by QA
Date Tue, 26 Sep 2017 21:36:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181626#comment-16181626 ]

ASF GitHub Bot commented on DRILL-5758:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/932#discussion_r141191139
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java
---
    @@ -74,53 +74,52 @@
         public final int estSize;
     
         /**
    -     * Number of times the value here (possibly repeated) appears in
    -     * the record batch.
    +     * Number of occurrences of the value in the batch. This is trivial
    +     * for top-level scalars: it is the record count. For a top-level
    +     * repeated vector, this is the number of arrays, also the record
    +     * count. For a value nested inside a repeated map, it is the
    +     * total number of values across all maps, and may be less than,
    +     * greater than, or (unlikely) the same as the row count.
          */
     
         public final int valueCount;
     
         /**
    -     * The number of elements in the value vector. Consider two cases.
    -     * A required or nullable vector has one element per row, so the
    -     * <tt>entryCount</tt> is the same as the <tt>valueCount</tt> (which,
    -     * in turn, is the same as the row count.) But, if this vector is an
    -     * array, then the <tt>valueCount</tt> is the number of columns, while
    -     * <tt>entryCount</tt> is the total number of elements in all the arrays
    -     * that make up the columns, so <tt>entryCount</tt> will be different than
    -     * the <tt>valueCount</tt> (normally larger, but possibly smaller if most
    -     * arrays are empty.
    -     * <p>
    -     * Finally, the column may be part of another list. In this case, the above
    -     * logic still applies, but the <tt>valueCount</tt> is the number of entries
    -     * in the outer array, not the row count.
    +     * Total number of elements for a repeated type, or 1 if this is
    +     * a non-repeated type. That is, a batch of 100 rows may have an
    +     * array with 10 elements per row. In this case, the element count
    +     * is 1000.
          */
     
    -    public int entryCount;
    +    public final int elementCount;
    --- End diff --
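For concreteness, the counts described in the revised Javadoc can be sketched for a single column. This is a minimal, hypothetical sketch, not Drill's RecordBatchSizer: the class ColumnCountsSketch and its methods are invented for illustration, and per-row array (or map) lengths are assumed to be available as plain int arrays rather than read from value vectors. It reproduces the Javadoc's example of 100 rows with 10 array elements each, plus a small nested case where valueCount differs from the row count.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Drill's RecordBatchSizer) of the counts the
 * revised Javadoc describes. Per-row lengths are passed as plain int
 * arrays instead of being read from value vectors.
 */
public class ColumnCountsSketch {

    /** valueCount for a top-level scalar or top-level repeated column: the row count. */
    public static int topLevelValueCount(int rowCount) {
        return rowCount;
    }

    /**
     * valueCount for a value nested inside a repeated map: the total number
     * of map entries across all rows, which need not equal the row count.
     */
    public static int nestedValueCount(int[] mapsPerRow) {
        return Arrays.stream(mapsPerRow).sum();
    }

    /** elementCount: total array elements for a repeated column, else 1. */
    public static int elementCount(boolean repeated, int[] arrayLengths) {
        return repeated ? Arrays.stream(arrayLengths).sum() : 1;
    }

    public static void main(String[] args) {
        // The Javadoc's example: 100 rows, 10 array elements per row.
        int rows = 100;
        int[] lengths = new int[rows];
        Arrays.fill(lengths, 10);

        System.out.println(topLevelValueCount(rows));     // 100
        System.out.println(elementCount(true, lengths));  // 1000
        System.out.println(elementCount(false, lengths)); // 1

        // Three rows holding 2, 0 and 3 maps: 5 nested values vs. 3 rows.
        System.out.println(nestedValueCount(new int[] {2, 0, 3})); // 5
    }
}
```

The sketch only restates the documented definitions; how Drill actually derives these counts from its vectors is not shown in the diff.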
    
    Not related to elementCount per se, but I see that netBatchSize and accountedMemorySize are integers. These could overflow depending on the number of columns. Should they be longs?
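The overflow concern can be illustrated with a hypothetical batch. This sketch assumes the batch size is accumulated as a sum of per-column byte counts; that is an assumption, since the diff does not show how netBatchSize or accountedMemorySize are computed. The names BatchSizeOverflowSketch, intTotal, longTotal, and checkedTotal are illustrative only. With 5,000 columns of 524,288 bytes each (8 bytes x 65,536 rows), the true total of 2,621,440,000 bytes exceeds Integer.MAX_VALUE (2,147,483,647), so an int accumulator wraps to a negative value while a long holds it. Math.addExact is one way to fail loudly instead of wrapping.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Drill code): summing per-column byte sizes
 * into an int can silently wrap, while a long accumulator does not.
 */
public class BatchSizeOverflowSketch {

    /** Sums into an int, as an int-typed size field would; wraps past Integer.MAX_VALUE. */
    public static int intTotal(int[] columnBytes) {
        int total = 0;
        for (int b : columnBytes) {
            total += b;
        }
        return total;
    }

    /** Same sum, but each int is widened to long before adding. */
    public static long longTotal(int[] columnBytes) {
        long total = 0;
        for (int b : columnBytes) {
            total += b;
        }
        return total;
    }

    /** Int sum that throws ArithmeticException instead of wrapping. */
    public static int checkedTotal(int[] columnBytes) {
        int total = 0;
        for (int b : columnBytes) {
            total = Math.addExact(total, b);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical batch: 5,000 columns, each 8 bytes x 65,536 rows.
        int[] columnBytes = new int[5_000];
        Arrays.fill(columnBytes, 8 * 65_536);

        System.out.println(longTotal(columnBytes)); // 2621440000
        System.out.println(intTotal(columnBytes));  // -1673527296 (wrapped)
        try {
            checkedTotal(columnBytes);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

Widening to long costs little for a handful of size fields and removes the wraparound for any realistic batch; checked arithmetic is an alternative when an int is kept deliberately.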


> Rollup of external sort fixes to issues found by QA
> ---------------------------------------------------
>
>                 Key: DRILL-5758
>                 URL: https://issues.apache.org/jira/browse/DRILL-5758
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> Tracking JIRA to be used for the PR that combines fixes for various JIRA entries. Bugs fixed in this task are given by the linked issues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
