From: paul-rogers
To: dev@drill.apache.org
Subject: [GitHub] drill pull request #822: DRILL-5457: Spill implementation for Hash Aggregate
Date: Sat, 27 May 2017 05:38:12 +0000 (UTC)

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/822#discussion_r118812251

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -266,17 +508,138 @@ public void setup(HashAggregate hashAggrConfig, HashTableConfig htConfig, Fragme
           }
         }

    -    ChainedHashTable ht =
    +    spillSet = new SpillSet(context,hashAggrConfig, UserBitShared.CoreOperatorType.HASH_AGGREGATE);
    +    baseHashTable =
             new ChainedHashTable(htConfig, context, allocator, incoming, null /* no incoming probe */, outgoing);
    -    this.htable = ht.createAndSetupHashTable(groupByOutFieldIds);
    -
    +    this.groupByOutFieldIds = groupByOutFieldIds; // retain these for delayedSetup, and to allow recreating hash tables (after a spill)
         numGroupByOutFields = groupByOutFieldIds.length;
    -    batchHolders = new ArrayList();
    -    // First BatchHolder is created when the first put request is received.

         doSetup(incoming);
       }

    +  /**
    +   * Delayed setup are the parts from setup() that can only be set after actual data arrives in incoming
    +   * This data is used to compute the number of partitions.
    +   */
    +  private void delayedSetup() {
    +
    +    // Set the number of partitions from the configuration (raise to a power of two, if needed)
    +    numPartitions = context.getConfig().getInt(ExecConstants.HASHAGG_NUM_PARTITIONS_KEY);
    +    if ( numPartitions == 1 ) {
    +      canSpill = false;
    +      logger.warn("Spilling was disabled");
    +    }
    +    while (Integer.bitCount(numPartitions) > 1) { // in case not a power of 2
    +      numPartitions++;
    +    }
    --- End diff --

    `BaseAllocator.nextPowerOfTwo()`. I've seen other implementations as well. Maybe pick one and put it in a utilities class somewhere so we don't have to reinvent it multiple times?
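
A rough sketch of what such a shared utility might look like, purely for illustration. The class name `PowerOfTwoUtil` and the method `roundUpToPowerOfTwo` are hypothetical and are not existing Drill APIs; this is also not a copy of `BaseAllocator.nextPowerOfTwo()`, whose exact behavior should be checked before relying on it. The sketch uses `Integer.highestOneBit` from the JDK instead of the increment loop in the diff:

```java
// Hypothetical helper, not part of Drill: rounds an int up to a power of two.
final class PowerOfTwoUtil {

  /** Largest power of two that fits in a signed 32-bit int. */
  static final int MAX_POWER_OF_TWO = 1 << 30;

  private PowerOfTwoUtil() {
    // static utility, no instances
  }

  /**
   * Returns the smallest power of two that is greater than or equal to value.
   * Values of 1 or less are mapped to 1 (an assumption made for this sketch).
   */
  static int roundUpToPowerOfTwo(int value) {
    if (value <= 1) {
      return 1;
    }
    if (value > MAX_POWER_OF_TWO) {
      throw new IllegalArgumentException("No int power of two >= " + value);
    }
    // highestOneBit keeps only the top set bit, i.e. the largest power of two <= value.
    int highest = Integer.highestOneBit(value);
    return (highest == value) ? value : highest << 1;
  }
}
```

With a helper along these lines, the loop in `delayedSetup()` could presumably become a single call such as `numPartitions = PowerOfTwoUtil.roundUpToPowerOfTwo(numPartitions);`, which runs in constant time rather than stepping through every intermediate value.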