drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
Date Fri, 26 Jan 2018 02:20:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340469#comment-16340469 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r164021818
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java ---
    @@ -31,8 +31,9 @@
       private int length;
     
       public SelectionVector4(ByteBuf vector, int recordCount, int batchRecordCount) throws SchemaChangeException {
    -    if (recordCount > Integer.MAX_VALUE /4) {
    -      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size.  You requested an allocation of %d bytes.", recordCount * 4));
    +    if (recordCount > Integer.MAX_VALUE / 4) {
    +      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size. " +
    +          "You requested an allocation of %d bytes.", recordCount * 4));
    --- End diff --
    
    Sounds like two issues.
    
    First, while I pointed out opportunities for improvement in the code to be consistent with work elsewhere, the code as it is has worked for the last two years.
    
    Second, if it helps to move this PR ahead for @ilooner, I can back out the formatting changes to this file so that it drops out of the PR. That said, our general policy has been to include code cleanup within other commits rather than incurring the cost and delay of doing two commits for each bit of work (one for code cleanup, another for substantive changes).
    
    Besides this issue, is anything else needed?


> Sort incorrectly limits batch size to 65535 records rather than 65536
> ---------------------------------------------------------------------
>
>                 Key: DRILL-6080
>                 URL: https://issues.apache.org/jira/browse/DRILL-6080
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> Drill places an upper limit on the number of rows in a batch of 64K. That is 65,536 decimal. When we index records, the indexes run from 0 to 64K-1 or 0 to 65,535.
> The sort code incorrectly uses {{Character.MAX_VALUE}} as the maximum row count. So, if an incoming batch uses the full 64K size, sort ends up splitting batches unnecessarily.
> The fix is to instead use the correct constant `ValueVector.MAX_ROW_COUNT`.
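
The off-by-one described in the issue can be illustrated with a small standalone sketch. This is not Drill code: the class name, the batchesNeeded helper, and the local MAX_ROW_COUNT constant are hypothetical stand-ins, with the constant set to the 65,536 value the issue text gives for the batch row limit.

```java
// Illustrative sketch only; none of these names come from the Drill code base.
class BatchLimitSketch {

    // Assumption: stands in for the 64K (65,536) row limit described in the issue.
    static final int MAX_ROW_COUNT = 65536;

    // Number of output batches needed to hold rowCount rows
    // when each batch may hold at most limit rows (ceiling division).
    static int batchesNeeded(int rowCount, int limit) {
        return (rowCount + limit - 1) / limit;
    }

    public static void main(String[] args) {
        int incoming = MAX_ROW_COUNT; // a completely full incoming batch

        // Character.MAX_VALUE is '\uffff', which widens to the int 65535:
        // the largest valid row index, not the largest row count.
        System.out.println((int) Character.MAX_VALUE);

        // With 65535 as the limit, the full batch no longer fits in one batch.
        System.out.println(batchesNeeded(incoming, Character.MAX_VALUE));

        // With the 65536 limit, the full batch passes through unsplit.
        System.out.println(batchesNeeded(incoming, MAX_ROW_COUNT));
    }
}
```

Under these assumptions, a full incoming batch needs two batches when the limit is Character.MAX_VALUE and one when the limit is 65,536, which matches the unnecessary splitting the issue describes.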



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
