drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory
Date Sat, 10 Feb 2018 00:53:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359157#comment-16359157 ]

ASF GitHub Bot commented on DRILL-6123:
---------------------------------------

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1107#discussion_r167380761
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java ---
    @@ -311,8 +311,8 @@ public static ColumnSize getColumn(ValueVector v, String prefix) {
     
       public RecordBatchSizer(RecordBatch batch) {
         this(batch,
    -         (batch.getSchema().getSelectionVectorMode() == BatchSchema.SelectionVectorMode.TWO_BYTE) ?
    -         batch.getSelectionVector2() : null);
    +      (batch.getSchema() == null ? null : (batch.getSchema().getSelectionVectorMode() == BatchSchema.SelectionVectorMode.TWO_BYTE ?
    --- End diff --
    
    Yes, we can get empty batches with an empty schema, and I think it makes sense to add the
    check here instead of in the calling code. That way, it will be handled transparently underneath.
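
    For readers following the thread, here is a minimal sketch of the null-schema guard
    discussed above. It is not the actual RecordBatchSizer code: the Schema and Batch
    classes, the enum, and the helper name selectionVectorOrNull are hypothetical
    stand-ins so the example compiles on its own. It only illustrates the idea of
    checking for a missing schema inside the helper rather than in every caller.

```java
// Illustrative only: hypothetical stand-ins, not Drill's real classes.
final class NullSchemaGuardSketch {

    enum SelectionVectorMode { NONE, TWO_BYTE, FOUR_BYTE }

    /** Hypothetical stand-in for a batch schema. */
    static final class Schema {
        private final SelectionVectorMode mode;
        Schema(SelectionVectorMode mode) { this.mode = mode; }
        SelectionVectorMode getSelectionVectorMode() { return mode; }
    }

    /** Hypothetical stand-in for a record batch; the schema may be null for an empty batch. */
    static final class Batch {
        private final Schema schema;
        private final Object selectionVector2;
        Batch(Schema schema, Object selectionVector2) {
            this.schema = schema;
            this.selectionVector2 = selectionVector2;
        }
        Schema getSchema() { return schema; }
        Object getSelectionVector2() { return selectionVector2; }
    }

    /**
     * Returns the two-byte selection vector only when the batch has a schema
     * and that schema is in TWO_BYTE mode; otherwise returns null.
     * The null check lives here so callers do not need to repeat it.
     */
    static Object selectionVectorOrNull(Batch batch) {
        Schema schema = batch.getSchema();
        if (schema == null) {
            return null; // empty batch with no schema: nothing to select from
        }
        return schema.getSelectionVectorMode() == SelectionVectorMode.TWO_BYTE
                ? batch.getSelectionVector2()
                : null;
    }

    public static void main(String[] args) {
        Batch empty = new Batch(null, null);
        Batch withSv2 = new Batch(new Schema(SelectionVectorMode.TWO_BYTE), "sv2");
        System.out.println(selectionVectorOrNull(empty));
        System.out.println(selectionVectorOrNull(withSv2));
    }
}
```

    With the guard in one place, a caller can pass an empty batch without checking the
    schema first.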


> Limit batch size for Merge Join based on memory
> -----------------------------------------------
>
>                 Key: DRILL-6123
>                 URL: https://issues.apache.org/jira/browse/DRILL-6123
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Merge join limits the output batch size to 32K rows irrespective of row size. This can create
> very large or very small batches (in terms of memory), depending on the average row width. Change
> this to compute the output row count from the memory specified with the new outputBatchSize
> option and the average row width of the incoming left and right batches. The output row count
> will be a minimum of 1 and a maximum of 64K.
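
The row-count calculation described above might look roughly like the following sketch. It
is not the merge join implementation. It assumes that the width of an output row is the sum
of the average left and right row widths, that "64k" means 65,536 rows, and that a missing
width estimate falls back to the maximum. The class, method, and parameter names are all
hypothetical.

```java
// Illustrative only: names and the width model are assumptions, not Drill code.
final class MergeJoinRowCountSketch {

    static final int MIN_ROWS = 1;
    static final int MAX_ROWS = 64 * 1024; // assumes "64k" means 65,536 rows

    /**
     * Estimates how many output rows fit in the configured batch memory,
     * clamped to the range [MIN_ROWS, MAX_ROWS].
     *
     * @param outputBatchSizeBytes memory budget for one output batch, in bytes
     * @param avgLeftRowWidth      average row width of the left input, in bytes
     * @param avgRightRowWidth     average row width of the right input, in bytes
     */
    static int outputRowCount(long outputBatchSizeBytes,
                              int avgLeftRowWidth,
                              int avgRightRowWidth) {
        // Assumption: a joined row is as wide as a left row plus a right row.
        long rowWidth = (long) avgLeftRowWidth + avgRightRowWidth;
        if (rowWidth <= 0) {
            // Assumption: with no usable width estimate, fall back to the maximum.
            return MAX_ROWS;
        }
        long rows = outputBatchSizeBytes / rowWidth;
        return (int) Math.max(MIN_ROWS, Math.min(MAX_ROWS, rows));
    }

    public static void main(String[] args) {
        long budget = 16L * 1024 * 1024; // a 16 MiB budget, chosen only for illustration
        System.out.println(outputRowCount(budget, 512, 512));
        System.out.println(outputRowCount(1024, 4096, 4096));
    }
}
```

Under these assumptions, narrow rows hit the 64K cap, wide rows produce proportionally fewer
rows per batch, and a row wider than the whole budget still yields one row per batch.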



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
