drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg
Date Thu, 01 Feb 2018 23:08:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349467#comment-16349467 ]

ASF GitHub Bot commented on DRILL-6032:

Github user ilooner commented on a diff in the pull request:

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
    @@ -215,6 +206,7 @@ public BatchHolder() {
               MaterializedField outputField = materializedValueFields[i];
               // Create a type-specific ValueVector for this value
               vector = TypeHelper.getNewVector(outputField, allocator);
    +          int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
    --- End diff --
    Thanks for the info @paul-rogers . Since we want the RecordBatchSizer to be a one-stop
shop for size-related information, I propose the following to address @ppadma 's comments:
     - Remove the changes to estSize in the RecordBatchSizer that I introduced.
     - Add a field called knownSize to the RecordBatchSizer. This field will contain the known
column width for fixed-width vectors and will be -1 for things like varchars and repeated columns.
Then we can add a getKnownSize() method for a column size. This method will return the known
size if available and will throw an exception if the column is not fixed width. So this method
should only be used when the caller knows they are dealing with fixed-width columns,
as is the case with HashAgg for now (this will probably change when strings are stored
in varchars instead of objects). The rest of RecordBatchSizer will remain as is.
     - Add a constant called FRAGMENTATION_FACTOR (1.33) that people can multiply their
custom-crafted batch sizes by when estimating batch sizes from scratch. In HashAgg
I would multiply the custom batch size I compute by this factor.
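The proposed contract could be sketched roughly as below. This is a hypothetical illustration, not the actual Drill API: the class name ColumnSizeSketch and the constructor are invented for the example, while knownSize, getKnownSize(), and FRAGMENTATION_FACTOR follow the names proposed above.

```java
// Hypothetical sketch of the proposed getKnownSize() contract.
public class ColumnSizeSketch {
  // Multiplier applied to hand-computed batch sizes to allow for
  // allocator fragmentation, per the proposal above.
  public static final double FRAGMENTATION_FACTOR = 1.33;

  // Known column width in bytes; -1 for variable-width (e.g. varchar)
  // or repeated columns.
  private final int knownSize;

  public ColumnSizeSketch(int knownSize) {
    this.knownSize = knownSize;
  }

  // Returns the fixed column width. Throws for variable-width columns,
  // so callers should only use this when they already know they hold a
  // fixed-width vector, as HashAgg does today.
  public int getKnownSize() {
    if (knownSize < 0) {
      throw new IllegalStateException("Column is not fixed width");
    }
    return knownSize;
  }
}
```

Under this sketch, HashAgg would call getKnownSize() on each fixed-width value column, sum the widths into a custom batch-size estimate, and multiply by FRAGMENTATION_FACTOR.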

> Use RecordBatchSizer to estimate size of columns in HashAgg
> -----------------------------------------------------------
>                 Key: DRILL-6032
>                 URL: https://issues.apache.org/jira/browse/DRILL-6032
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
> We need to use the RecordBatchSizer to estimate the size of columns in the partition batches
> created by HashAgg.

This message was sent by Atlassian JIRA
