drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5758) Rollup of external sort fixes to issues found by QA
Date Sun, 01 Oct 2017 00:07:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187231#comment-16187231 ]

ASF GitHub Bot commented on DRILL-5758:

Github user paul-rogers commented on a diff in the pull request:

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java
    @@ -74,53 +74,52 @@
         public final int estSize;
    -     * Number of times the value here (possibly repeated) appears in
    -     * the record batch.
    +     * Number of occurrences of the value in the batch. This is trivial
    +     * for top-level scalars: it is the record count. For a top-level
    +     * repeated vector, this is the number of arrays, also the record
    +     * count. For a value nested inside a repeated map, it is the
    +     * total number of values across all maps, and may be less than,
    +     * greater than (but unlikely) same as the row count.
         public final int valueCount;
    -     * The number of elements in the value vector. Consider two cases.
    -     * A required or nullable vector has one element per row, so the
    -     * <tt>entryCount</tt> is the same as the <tt>valueCount</tt>
    -     * in turn, is the same as the row count.) But, if this vector is an
    -     * array, then the <tt>valueCount</tt> is the number of columns, while
    -     * <tt>entryCount</tt> is the total number of elements in all the arrays
    -     * that make up the columns, so <tt>entryCount</tt> will be different
    -     * the <tt>valueCount</tt> (normally larger, but possibly smaller if
    -     * arrays are empty.
    -     * <p>
    -     * Finally, the column may be part of another list. In this case, the above
    -     * logic still applies, but the <tt>valueCount</tt> is the number of
    -     * in the outer array, not the row count.
    +     * Total number of elements for a repeated type, or 1 if this is
    +     * a non-repeated type. That is, a batch of 100 rows may have an
    +     * array with 10 elements per row. In this case, the element count
    +     * is 1000.
    -    public int entryCount;
    +    public final int elementCount;
    --- End diff --
    Good point. However, a single batch larger than 2 GB is far more than the sort can handle, so we would never get this far if the batch were that large.
    Still, the point is valid, so a new commit changes the batch size variables from int to long.
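
    For illustration only, here is a minimal sketch of the overflow that the int-to-long change guards against. The class and method names (BatchSizeOverflow, sizeAsInt, sizeAsLong) and the row counts are hypothetical and are not taken from RecordBatchSizer; the sketch assumes a batch size computed as row count times row width.

```java
public class BatchSizeOverflow {

    // Hypothetical helper: multiplies in 32-bit int arithmetic, so any
    // product above Integer.MAX_VALUE (about 2 GB) silently wraps around.
    static int sizeAsInt(int rowCount, int rowWidth) {
        return rowCount * rowWidth;
    }

    // Hypothetical helper: widens one operand to long before multiplying,
    // so the product is computed in 64-bit arithmetic and does not wrap.
    static long sizeAsLong(int rowCount, int rowWidth) {
        return (long) rowCount * rowWidth;
    }

    public static void main(String[] args) {
        int rowCount = 65_536;   // assumed row count for the example
        int rowWidth = 40_000;   // assumed bytes per row for the example

        // The true size is 2,621,440,000 bytes, which exceeds Integer.MAX_VALUE.
        System.out.println("int:  " + sizeAsInt(rowCount, rowWidth));   // wraps to a negative value
        System.out.println("long: " + sizeAsLong(rowCount, rowWidth));  // 2621440000
    }
}
```

    Note that the cast must be applied to an operand, not to the result: (long) (rowCount * rowWidth) would still overflow before widening.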

> Rollup of external sort fixes to issues found by QA
> ---------------------------------------------------
>                 Key: DRILL-5758
>                 URL: https://issues.apache.org/jira/browse/DRILL-5758
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>              Labels: ready-to-commit
>             Fix For: 1.12.0
> Tracking JIRA to be used for the PR that combines fixes for various JIRA entries. Bugs fixed in this task are given by the linked issues.

This message was sent by Atlassian JIRA