From: Ben-Zvi
To: dev@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [GitHub] drill pull request #822: DRILL-5457: Spill implementation for Hash Aggregate
Date: Fri, 2 Jun 2017 21:17:50 +0000 (UTC)

Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/822#discussion_r119954158

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -400,114 +782,411 @@ public IterOutcome getOutcome() {

       @Override
       public int getOutputCount() {
    -    // return outputCount;
         return lastBatchOutputCount;
       }

       @Override
       public void cleanup() {
    -    if (htable != null) {
    -      htable.clear();
    -      htable = null;
    -    }
    +    if ( schema == null ) { return; } // not set up; nothing to clean
    +    for ( int i = 0; i < numPartitions; i++) {
    +      if (htables[i] != null) {
    +        htables[i].clear();
    +        htables[i] = null;
    +      }
    +      if ( batchHolders[i] != null) {
    +        for (BatchHolder bh : batchHolders[i]) {
    +          bh.clear();
    +        }
    +        batchHolders[i].clear();
    +        batchHolders[i] = null;
    +      }
    +
    +      // delete any (still active) output spill file
    +      if ( outputStream[i] != null && spillFiles[i] != null) {
    +        try {
    +          spillSet.delete(spillFiles[i]);
    --- End diff --

    Concurrent open files: while spilling, there is one open file per (non-pristine) spilling partition (yes, this can be as high as 16, or even 32). Afterwards they are all closed; then, for reading, each one is opened again. And although we process one partition at a time, closing all of them is postponed to the end, because the processing code is unaware that the "incoming" actually comes from a spill file.

    About the limits: the current defaults (e.g. 64K open files per process) should serve us well for the foreseeable future. Intel just announced the i9, whose top-of-the-line CPU has 18 cores, so thousands of concurrently active same-process threads are not feasible anytime soon (think of the context switching).
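    To make the lifecycle described above concrete, here is a minimal, self-contained Java sketch of the per-partition spill-file handling: one writer per spilling partition, all writers closed before any reading starts, and readers closed only at final cleanup. The class and method names (SpillLifecycleSketch, spill, finishSpilling, readPartition) are illustrative only and are not Drill's actual APIs; the real code in HashAggTemplate uses spillSet, spillFiles[] and outputStream[] as shown in the diff.

    import java.io.Closeable;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    /** Illustrative sketch only; not Drill's actual classes. */
    public class SpillLifecycleSketch implements Closeable {
      private final int numPartitions;
      private final Path[] spillFiles;       // one spill file per spilled partition
      private final OutputStream[] writers;  // open only while that partition is spilling
      private final InputStream[] readers;   // open while a spilled partition is re-read

      public SpillLifecycleSketch(int numPartitions, Path spillDir) {
        this.numPartitions = numPartitions;
        this.spillFiles = new Path[numPartitions];
        this.writers = new OutputStream[numPartitions];
        this.readers = new InputStream[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
          spillFiles[i] = spillDir.resolve("partition_" + i + ".spill");
        }
      }

      /** Phase 1: while spilling, at most one open writer per (non-pristine) partition. */
      public void spill(int partition, byte[] batch) throws IOException {
        if (writers[partition] == null) {
          writers[partition] = Files.newOutputStream(spillFiles[partition]);
        }
        writers[partition].write(batch);
      }

      /** Phase 2: once input is exhausted, all writers are closed before reading starts. */
      public void finishSpilling() throws IOException {
        for (int i = 0; i < numPartitions; i++) {
          if (writers[i] != null) {
            writers[i].close();
            writers[i] = null;
          }
        }
      }

      /**
       * Phase 3: each spilled partition is reopened for reading. The reader is NOT closed
       * here, because downstream code treats the spill file as a regular "incoming" source;
       * closing is postponed until cleanup at the very end.
       */
      public InputStream readPartition(int partition) throws IOException {
        readers[partition] = Files.newInputStream(spillFiles[partition]);
        return readers[partition];
      }

      /** Phase 4: cleanup closes whatever is still open and deletes the spill files. */
      @Override
      public void close() throws IOException {
        for (int i = 0; i < numPartitions; i++) {
          if (writers[i] != null) { writers[i].close(); writers[i] = null; }
          if (readers[i] != null) { readers[i].close(); readers[i] = null; }
          Files.deleteIfExists(spillFiles[i]);
        }
      }
    }

    The point the sketch mirrors is that readPartition hands the stream to the caller without closing it; that is why the cleanup path in the real code must still close any remaining output stream and delete the spill files, as the diff above does.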