[ https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027219#comment-16027219
]
ASF GitHub Bot commented on DRILL-5457:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/822#discussion_r118812261
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -266,17 +508,138 @@ public void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, Fragme
}
}
- ChainedHashTable ht =
+ spillSet = new SpillSet(context,hashAggrConfig, UserBitShared.CoreOperatorType.HASH_AGGREGATE);
+ baseHashTable =
new ChainedHashTable(htConfig, context, allocator, incoming, null /* no incoming probe */, outgoing);
- this.htable = ht.createAndSetupHashTable(groupByOutFieldIds);
-
+ this.groupByOutFieldIds = groupByOutFieldIds; // retain these for delayedSetup, and to allow recreating hash tables (after a spill)
numGroupByOutFields = groupByOutFieldIds.length;
- batchHolders = new ArrayList<BatchHolder>();
- // First BatchHolder is created when the first put request is received.
doSetup(incoming);
}
+ /**
+ * Delayed setup are the parts from setup() that can only be set after actual data arrives in incoming
+ * This data is used to compute the number of partitions.
+ */
+ private void delayedSetup() {
+
+ // Set the number of partitions from the configuration (raise to a power of two, if needed)
+ numPartitions = context.getConfig().getInt(ExecConstants.HASHAGG_NUM_PARTITIONS_KEY);
+ if ( numPartitions == 1 ) {
+ canSpill = false;
+ logger.warn("Spilling was disabled");
+ }
+ while (Integer.bitCount(numPartitions) > 1) { // in case not a power of 2
+ numPartitions++;
+ }
+ if ( schema == null ) { estMaxBatchSize = 0; } // incoming was an empty batch
--- End diff --
Not sure a null schema is even legal here. A record batch must have a schema, even if the
schema is an empty set of columns.
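
Separately, on the power-of-two rounding loop in the hunk above (the `while (Integer.bitCount(numPartitions) > 1)` increment): the same result can be computed without looping. The following is only a minimal sketch, not code from the pull request; the class name `PartitionCount` and the method `roundUpToPowerOfTwo` are hypothetical, and the accepted range (1 through 2^30) is an assumption chosen so the shift cannot overflow an `int`.

```java
// Hypothetical helper, not part of Drill: rounds a configured partition
// count up to the next power of two without an incrementing loop.
final class PartitionCount {
    private PartitionCount() {}

    /** Returns the smallest power of two that is >= n, for 1 <= n <= 2^30. */
    public static int roundUpToPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        // highestOneBit keeps only the top set bit, i.e. the largest
        // power of two that is <= n.
        int highest = Integer.highestOneBit(n);
        // If n already is a power of two keep it, otherwise double.
        return (highest == n) ? n : highest << 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 5, 32, 33}) {
            System.out.println(n + " -> " + roundUpToPowerOfTwo(n));
        }
    }
}
```

Unlike the loop, this rejects zero and negative values explicitly instead of relying on the caller to have validated the configuration.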
> Support Spill to Disk for the Hash Aggregate Operator
> -----------------------------------------------------
>
> Key: DRILL-5457
> URL: https://issues.apache.org/jira/browse/DRILL-5457
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators
> Affects Versions: 1.10.0
> Reporter: Boaz Ben-Zvi
> Assignee: Boaz Ben-Zvi
> Fix For: 1.11.0
>
>
> Support gradual spilling memory to disk as the available memory gets too small to allow in memory work for the Hash Aggregate Operator.