drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6123) Limit batch size for Merge Join based on memory
Date Fri, 09 Feb 2018 16:55:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358658#comment-16358658 ]

ASF GitHub Bot commented on DRILL-6123:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1107#discussion_r167281940
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/AbstractRecordBatchMemoryManager.java ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.exec.record;
    +
    +import org.apache.drill.exec.physical.impl.spill.RecordBatchSizer;
    +import org.apache.drill.exec.vector.ValueVector;
    +
    +public abstract class AbstractRecordBatchMemoryManager {
    +  private int outgoingRowWidth;
    +  private int outputRowCount = ValueVector.MAX_ROW_COUNT;
    +  protected static final int OFFSET_VECTOR_WIDTH = 4;
    +  protected static final int WORST_CASE_FRAGMENTATION_FACTOR = 2;
    +  protected static final int MAX_NUM_ROWS = ValueVector.MAX_ROW_COUNT;
    --- End diff --
    
    Maybe put the constants at the top? Then, set `outputRowCount = MAX_NUM_ROWS`.
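
    A minimal sketch of that reordering, for illustration only. The class name `AbstractRecordBatchMemoryManagerSketch`, the local `MAX_ROW_COUNT` stand-in for `ValueVector.MAX_ROW_COUNT` (assumed here to be 64K, the upper bound given in the issue description), and the getter are assumptions added so the snippet compiles on its own; they are not part of the pull request.

```java
// Illustrative sketch only: constants declared first, then instance state.
abstract class AbstractRecordBatchMemoryManagerSketch {
  // Hypothetical stand-in for ValueVector.MAX_ROW_COUNT (assumed 64K rows).
  static final int MAX_ROW_COUNT = 64 * 1024;

  // Constants grouped at the top of the class, as the comment suggests.
  protected static final int OFFSET_VECTOR_WIDTH = 4;
  protected static final int WORST_CASE_FRAGMENTATION_FACTOR = 2;
  protected static final int MAX_NUM_ROWS = MAX_ROW_COUNT;

  // Instance fields follow; the default now refers to the class's own constant.
  private int outgoingRowWidth;
  private int outputRowCount = MAX_NUM_ROWS;

  // Hypothetical accessors, added only so the defaults can be inspected.
  int getOutgoingRowWidth() {
    return outgoingRowWidth;
  }

  int getOutputRowCount() {
    return outputRowCount;
  }
}
```

    Declaring `MAX_NUM_ROWS` before the fields lets the initializer use a single name for the limit instead of repeating the `ValueVector` reference.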


> Limit batch size for Merge Join based on memory
> -----------------------------------------------
>
>                 Key: DRILL-6123
>                 URL: https://issues.apache.org/jira/browse/DRILL-6123
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Merge join limits output batch size to 32K rows irrespective of row size. This can create
> very large or very small batches (in terms of memory), depending upon the average row width.
> Change this to figure out the output row count based on the memory specified with the new
> outputBatchSize option and the average row width of the incoming left and right batches.
> The output row count will be a minimum of 1 and a maximum of 64K.
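
The sizing rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the pull request: the class and method names are hypothetical, and the outgoing row width is assumed to be the sum of the average left and right row widths, since a joined row carries columns from both sides.

```java
// Illustrative sketch only; names and the width formula are assumptions.
final class MergeJoinBatchSizingSketch {
  static final int MIN_NUM_ROWS = 1;
  static final int MAX_NUM_ROWS = 64 * 1024; // upper bound from the issue description

  private MergeJoinBatchSizingSketch() {
  }

  /**
   * Returns how many rows fit in the configured output batch size,
   * clamped to the range [MIN_NUM_ROWS, MAX_NUM_ROWS].
   */
  static int outputRowCount(long outputBatchSizeBytes, int avgLeftRowWidth, int avgRightRowWidth) {
    // Assumption: an output row is as wide as a left row plus a right row.
    long outgoingRowWidth = (long) avgLeftRowWidth + avgRightRowWidth;
    if (outgoingRowWidth <= 0) {
      // No width information yet: fall back to the row-count limit alone.
      return MAX_NUM_ROWS;
    }
    long rows = outputBatchSizeBytes / outgoingRowWidth;
    return (int) Math.max(MIN_NUM_ROWS, Math.min(MAX_NUM_ROWS, rows));
  }
}
```

With a 16 MiB budget and average widths of 1024 bytes on each side, `outputRowCount(16L * 1024 * 1024, 1024, 1024)` yields 8192 rows. Very narrow rows are capped at 64K rows, and rows wider than the whole budget still produce a batch of 1 row.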



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
