drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
Date Wed, 24 Jan 2018 05:41:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336949#comment-16336949 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r163455571
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
    @@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
     
       public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
         timer.reset();
    -    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
    -    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
    +    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
    +    DataValidator validator = new DataValidator(rowCount, ValueVector.MAX_ROW_COUNT);
         runLargeSortTest(fixture, dataGen, validator);
    -    System.out.println(timer.elapsed(TimeUnit.MILLISECONDS));
    +//    System.out.println(timer.elapsed(TimeUnit.MILLISECONDS));
    --- End diff --
    
    Removed all the timing & debugging code to avoid the need for commented-out lines.
    
    Logging in tests is a no-op; we simply discard the logs.


> Sort incorrectly limits batch size to 65535 records rather than 65536
> ---------------------------------------------------------------------
>
>                 Key: DRILL-6080
>                 URL: https://issues.apache.org/jira/browse/DRILL-6080
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> Drill limits the number of rows in a batch to 64K, that is, 65,536 decimal. When we index records, the indexes run from 0 to 64K-1, or 0 to 65,535.
> The sort code incorrectly uses {{Character.MAX_VALUE}} (65,535, one less than the limit) as the maximum row count. So, if an incoming batch uses the full 64K size, sort ends up splitting batches unnecessarily.
> The fix is to instead use the correct constant {{ValueVector.MAX_ROW_COUNT}}.
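
A minimal Java sketch of the off-by-one described above. It assumes that `ValueVector.MAX_ROW_COUNT` equals 65,536 (written here as `Character.MAX_VALUE + 1`), as the description implies; `BatchLimitDemo` and `batchesNeeded` are hypothetical names for illustration, not Drill's sort code. The helper only computes how many output batches are needed for a given per-batch row cap.

```java
public class BatchLimitDemo {
    // Assumption: mirrors Drill's ValueVector.MAX_ROW_COUNT, the 64K (65,536) row limit.
    static final int MAX_ROW_COUNT = Character.MAX_VALUE + 1;

    // Hypothetical helper: number of batches needed to hold rowCount rows
    // when each batch holds at most maxRowsPerBatch rows (ceiling division).
    static int batchesNeeded(int rowCount, int maxRowsPerBatch) {
        return (rowCount + maxRowsPerBatch - 1) / maxRowsPerBatch;
    }

    public static void main(String[] args) {
        int fullBatch = MAX_ROW_COUNT;       // an incoming batch at the full 64K size
        int wrongCap = Character.MAX_VALUE;  // 65,535: the cap used before the fix

        // Cast needed: concatenating a raw char would print the character, not its value.
        System.out.println("Character.MAX_VALUE = " + (int) Character.MAX_VALUE);
        System.out.println("MAX_ROW_COUNT       = " + MAX_ROW_COUNT);
        System.out.println("Largest row index   = " + (MAX_ROW_COUNT - 1));
        System.out.println("Batches with wrong cap:   " + batchesNeeded(fullBatch, wrongCap));
        System.out.println("Batches with correct cap: " + batchesNeeded(fullBatch, MAX_ROW_COUNT));
    }
}
```

With the old cap, a full 65,536-row batch needs two output batches (65,535 rows plus a single leftover row); with the 64K cap it fits in one.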



--
This message was sent by Atlassian JIRA (v7.6.3#76005)
