drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg
Date Tue, 06 Feb 2018 19:14:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354394#comment-16354394
] 

ASF GitHub Bot commented on DRILL-6032:
---------------------------------------

Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r166411194
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
---
    @@ -733,28 +780,32 @@ private void restoreReservedMemory() {
        * @param records
        */
       private void allocateOutgoing(int records) {
    -    // Skip the keys and only allocate for outputting the workspace values
    -    // (keys will be output through splitAndTransfer)
    -    Iterator<VectorWrapper<?>> outgoingIter = outContainer.iterator();
    -    for (int i = 0; i < numGroupByOutFields; i++) {
    -      outgoingIter.next();
    -    }
    -
         // try to preempt an OOM by using the reserved memory
         useReservedOutgoingMemory();
         long allocatedBefore = allocator.getAllocatedMemory();
     
    -    while (outgoingIter.hasNext()) {
    +    for (int columnIndex = numGroupByOutFields; columnIndex < outContainer.getNumberOfColumns();
columnIndex++) {
    +      final VectorWrapper wrapper = outContainer.getValueVector(columnIndex);
           @SuppressWarnings("resource")
    -      ValueVector vv = outgoingIter.next().getValueVector();
    +      final ValueVector vv = wrapper.getValueVector();
     
    -      AllocationHelper.allocatePrecomputedChildCount(vv, records, maxColumnWidth, 0);
    +      final RecordBatchSizer.ColumnSize columnSizer = new RecordBatchSizer.ColumnSize(wrapper.getValueVector());
    +      int columnSize;
    +
    +      if (columnSizer.hasKnownSize()) {
    +        // For fixed width vectors we know the size of each record
    +        columnSize = columnSizer.getKnownSize();
    +      } else {
    +        // For var chars we need to use the input estimate
    +        columnSize = varcharValueSizes.get(columnIndex);
    +      }
    +
    +      AllocationHelper.allocatePrecomputedChildCount(vv, records, columnSize, 0);
    --- End diff --
    
    Hmm. The element count from the sizer would tell us the number of rows in the incoming
batch. But that will not give us an accurate prediction for the number of keys we have aggregations
for. Maybe we should use the number of keys in the hashtable instead.


> Use RecordBatchSizer to estimate size of columns in HashAgg
> -----------------------------------------------------------
>
>                 Key: DRILL-6032
>                 URL: https://issues.apache.org/jira/browse/DRILL-6032
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the Partition batches
created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message