drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4411) HashJoin should not only depend on number of records, but also on size
Date Wed, 24 Feb 2016 22:25:18 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163923#comment-15163923 ]

ASF GitHub Bot commented on DRILL-4411:

Github user jaltekruse commented on a diff in the pull request:

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinProbeTemplate.java
    @@ -47,7 +47,11 @@
       private HashJoinBatch outgoingJoinBatch = null;
    -  private static final int TARGET_RECORDS_PER_BATCH = 4000;
    +  private int targetRecordsPerBatch = 4000;
    +  private boolean adjustTargetRecordsPerBatch = true;
    --- End diff --
    It looks like this flag is designed to allow the adjustment to happen only once. Is that
actually what we want? If the row size is growing, it would seem like a good idea to allow
for several batch size adjustments. Doing so would also remove one more piece of boolean state to manage.
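
To illustrate the suggestion, here is a minimal sketch of recomputing the record target for every batch from the observed row width, so that no one-shot flag is needed. It is not the code from the pull request: the class `BatchSizer`, the method `targetRecords`, and the 1 MiB byte budget are hypothetical names and values chosen for illustration. Only the 4000-record cap comes from the diff.

```java
/** Hypothetical helper, not part of Drill: derives a per-batch record target from row width. */
final class BatchSizer {
    // Upper bound on records per batch; 4000 is the value of the original constant in the diff.
    static final int MAX_RECORDS_PER_BATCH = 4000;
    // Assumed byte budget per outgoing batch (1 MiB), chosen only for this example.
    static final long TARGET_BATCH_BYTES = 1L << 20;

    /** Returns the record target for a batch whose rows average avgRowBytes bytes. */
    static int targetRecords(long avgRowBytes) {
        if (avgRowBytes <= 0) {
            return MAX_RECORDS_PER_BATCH; // no size information yet, fall back to the count limit
        }
        long bySize = TARGET_BATCH_BYTES / avgRowBytes;
        // Clamp to [1, MAX_RECORDS_PER_BATCH] so that a batch always makes progress.
        return (int) Math.max(1L, Math.min((long) MAX_RECORDS_PER_BATCH, bySize));
    }

    public static void main(String[] args) {
        // Simulated row widths that grow over time; the target is recomputed for each batch.
        long[] observedRowBytes = {100, 1_000, 10_000};
        for (long rowBytes : observedRowBytes) {
            System.out.println(rowBytes + " bytes/row -> " + targetRecords(rowBytes) + " records");
        }
    }
}
```

Because the target is a pure function of the latest row-width estimate, it can shrink or grow on every batch, and there is no `adjustTargetRecordsPerBatch` state to keep in sync.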

> HashJoin should not only depend on number of records, but also on size
> ----------------------------------------------------------------------
>                 Key: DRILL-4411
>                 URL: https://issues.apache.org/jira/browse/DRILL-4411
>             Project: Apache Drill
>          Issue Type: Bug
>          Components:  Server
>            Reporter: MinJi Kim
>            Assignee: MinJi Kim
> In HashJoinProbeTemplate, each batch is limited to TARGET_RECORDS_PER_BATCH (4000) records.
The limit should depend not only on the number of records but also on their size (in case of
extremely large records).

This message was sent by Atlassian JIRA
