drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
Date Sat, 13 Jan 2018 23:56:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325385#comment-16325385 ]

ASF GitHub Bot commented on DRILL-6080:

Github user vrozov commented on a diff in the pull request:

    --- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java ---
    @@ -851,48 +851,52 @@ public void print(StringBuilder sb, int indent, Verbosity verbosity)
        * Write an integer to the buffer at the given byte index, without
        * bounds checks.
    -   * @param offset byte (not int) offset of the location to write
    +   * @param bufOffset byte (not int) offset of the location to write
        * @param value the value to write
    -  public void unsafePutInt(int offset, int value) {
    -    PlatformDependent.putInt(addr + offset, value);
    +  public void unsafePutInt(int bufOffset, int value) {
    +    assert unsafeCheckIndex(bufOffset, 4);
    --- End diff --
    I don't think that `unsafePutXXX()` provides any performance benefit over the existing
    `setXXX()` variants after DRILL-6004 was merged. The only difference is that the unsafe
    variants use `assert`, while the existing ones use a `final static boolean` flag that is
    functionally and performance-wise equivalent to a Java `assert`. We need to agree on a
    single mechanism for enabling bounds checking and use it in all methods.
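
For illustration, the two bounds-checking styles being compared might look roughly like the sketch below. This is not Drill's code: the class `BoundsCheckSketch`, the flag `BOUNDS_CHECKING_ENABLED`, the system property `sketch.boundsCheck`, and the helper `checkIndex` are all hypothetical names, and a heap `ByteBuffer` stands in for the direct-memory writes that `DrillBuf` performs.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not DrillBuf: contrasts an assert-guarded write
// with a write guarded by a static final boolean flag.
final class BoundsCheckSketch {
    // Assumed flag name and property; read once at class initialization.
    static final boolean BOUNDS_CHECKING_ENABLED = Boolean.getBoolean("sketch.boundsCheck");

    private final ByteBuffer data;

    BoundsCheckSketch(int capacity) {
        this.data = ByteBuffer.allocate(capacity);
    }

    // Returns true so it can be used inside an assert; throws when out of range.
    static boolean checkIndex(int capacity, int offset, int length) {
        if (offset < 0 || length < 0 || offset > capacity - length) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + ", length=" + length + ", capacity=" + capacity);
        }
        return true;
    }

    // Style 1: the check runs only when the JVM is started with -ea.
    void putIntAsserted(int offset, int value) {
        assert checkIndex(data.capacity(), offset, 4);
        data.putInt(offset, value);
    }

    // Style 2: the check runs only when the flag was set at startup.
    void putIntFlagged(int offset, int value) {
        if (BOUNDS_CHECKING_ENABLED) {
            checkIndex(data.capacity(), offset, 4);
        }
        data.putInt(offset, value);
    }

    int getInt(int offset) {
        return data.getInt(offset);
    }

    public static void main(String[] args) {
        BoundsCheckSketch buf = new BoundsCheckSketch(8);
        buf.putIntAsserted(0, 7);
        buf.putIntFlagged(4, 42);
        System.out.println(buf.getInt(0) + " " + buf.getInt(4)); // prints "7 42"
    }
}
```

Both styles skip the check in a default production run, which is the sense in which the comment calls them equivalent. They differ in how the check is switched on (`-ea` versus a property), which is why settling on one mechanism matters.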

> Sort incorrectly limits batch size to 65535 records rather than 65536
> ---------------------------------------------------------------------
>                 Key: DRILL-6080
>                 URL: https://issues.apache.org/jira/browse/DRILL-6080
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.13.0
> Drill places an upper limit of 64K rows on a batch, that is, 65,536 decimal.
> When we index records, the indexes run from 0 to 64K-1, or 0 to 65,535.
> The sort code incorrectly uses {{Character.MAX_VALUE}} (65,535) as the maximum row count.
> So, if an incoming batch uses the full 64K size, sort ends up splitting batches unnecessarily.
> The fix is to instead use the correct constant `ValueVector.MAX_ROW_COUNT`.
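
The off-by-one described above can be checked with a little arithmetic. The sketch below is illustrative only: `BatchLimitSketch` and `batchesNeeded` are hypothetical names, and the local `MAX_ROW_COUNT` is an assumed stand-in for `ValueVector.MAX_ROW_COUNT`, set to the 65,536 value the description gives.

```java
// Hypothetical sketch, not Drill's sort code: shows how a 65,535 limit
// splits a full 64K batch in two, while a 65,536 limit does not.
final class BatchLimitSketch {
    // Assumed stand-in for ValueVector.MAX_ROW_COUNT (64K rows).
    static final int MAX_ROW_COUNT = 65536;

    // Number of output batches needed to hold rowCount rows
    // when each batch may contain at most limit rows (ceiling division).
    static int batchesNeeded(int rowCount, int limit) {
        return (rowCount + limit - 1) / limit;
    }

    public static void main(String[] args) {
        int fullBatch = MAX_ROW_COUNT;

        // Character.MAX_VALUE is '\uffff', which widens to the int 65535.
        System.out.println((int) Character.MAX_VALUE);                      // prints 65535
        System.out.println(batchesNeeded(fullBatch, Character.MAX_VALUE)); // prints 2
        System.out.println(batchesNeeded(fullBatch, MAX_ROW_COUNT));       // prints 1
    }
}
```

With the smaller limit, a full incoming batch produces one batch of 65,535 rows plus a second batch holding a single row, which is the unnecessary split the issue reports.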

This message was sent by Atlassian JIRA
