drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory
Date Tue, 06 Feb 2018 17:55:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354229#comment-16354229 ]

ASF GitHub Bot commented on DRILL-6123:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1107#discussion_r166384715
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java ---
    @@ -101,8 +101,12 @@ public final void resetOutputPos() {
       }
     
       public final boolean isOutgoingBatchFull() {
    -    Preconditions.checkArgument(outputPosition <= OUTPUT_BATCH_SIZE);
    -    return outputPosition == OUTPUT_BATCH_SIZE;
    +    Preconditions.checkArgument(outputPosition <= outputRowCount);
    +    return outputPosition == outputRowCount;
    --- End diff --
    
    Maybe be just a bit more paranoid? `outputPosition >= outputRowCount`?
    
    And, while we're at it, maybe rename `outputRowCount` -> `targetOutputRowCount`, to make clear that the value is our target, not the actual, current row count?
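
    A minimal sketch of what the two suggestions above might look like together. This is an illustration, not the actual `JoinStatus` class: the class name `JoinStatusSketch`, the constructor, and `incOutputPos()` are hypothetical stand-ins, and only `resetOutputPos()`, `isOutgoingBatchFull()`, and `outputPosition` come from the diff.

```java
// Hypothetical stand-in for JoinStatus; only the batch-full check is modeled.
public class JoinStatusSketch {
    // Renamed from outputRowCount: this is the row count we aim for,
    // not the number of rows currently in the outgoing batch.
    private final int targetOutputRowCount;
    private int outputPosition;

    public JoinStatusSketch(int targetOutputRowCount) {
        this.targetOutputRowCount = targetOutputRowCount;
    }

    // Hypothetical helper: advance the write position by one row.
    public void incOutputPos() {
        outputPosition++;
    }

    public void resetOutputPos() {
        outputPosition = 0;
    }

    // ">=" rather than "==": the batch still reports full if the
    // position ever overshoots the target.
    public boolean isOutgoingBatchFull() {
        return outputPosition >= targetOutputRowCount;
    }
}
```

    With `==`, an overshoot would make the check return false forever; with `>=`, the caller still sees a full batch and can flush it.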


> Limit batch size for Merge Join based on memory
> -----------------------------------------------
>
>                 Key: DRILL-6123
>                 URL: https://issues.apache.org/jira/browse/DRILL-6123
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Merge join limits the output batch size to 32K rows irrespective of row size. This can create very large or very small batches (in terms of memory), depending on the average row width. Change this to compute the output row count from the memory specified with the new outputBatchSize option and the average row width of the incoming left and right batches. The output row count will be at least 1 and at most 64K.
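
The calculation described above can be sketched roughly as follows. This is an assumption-laden illustration, not Drill's implementation: the class and method names are hypothetical, the output row width is assumed to be the sum of the left and right average row widths, and "64K" is taken to mean 65536.

```java
// Hypothetical sketch of deriving a target row count from a memory budget.
public final class OutputRowCountSketch {
    static final int MIN_ROWS = 1;
    static final int MAX_ROWS = 64 * 1024; // assumption: "64K" means 65536

    private OutputRowCountSketch() {
    }

    // outputBatchSizeBytes: memory budget for one output batch (assumed to be in bytes).
    // avgLeftRowWidth / avgRightRowWidth: average row widths of the incoming batches, in bytes.
    static int targetOutputRowCount(long outputBatchSizeBytes,
                                    int avgLeftRowWidth,
                                    int avgRightRowWidth) {
        // Assumption: a joined row is about as wide as both input rows combined.
        long rowWidth = (long) avgLeftRowWidth + avgRightRowWidth;
        if (rowWidth <= 0) {
            // No width information yet; fall back to the upper bound.
            return MAX_ROWS;
        }
        long rows = outputBatchSizeBytes / rowWidth;
        // Clamp to the range [MIN_ROWS, MAX_ROWS].
        return (int) Math.max(MIN_ROWS, Math.min(MAX_ROWS, rows));
    }
}
```

Under these assumptions, a 1 MiB budget with 256-byte joined rows gives 4096 rows per batch, while very wide rows fall back to the minimum of one row.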



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
