From: amansinha100
To: dev@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [GitHub] drill pull request #932: DRILL-5758: Fix for repeated columns; enable manage...
Date: Tue, 26 Sep 2017 21:35:02 +0000 (UTC)

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/932#discussion_r141191139

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java ---
    @@ -74,53 +74,52 @@
         public final int estSize;

         /**
    -     * Number of times the value here (possibly repeated) appears in
    -     * the record batch.
    +     * Number of occurrences of the value in the batch. This is trivial
    +     * for top-level scalars: it is the record count. For a top-level
    +     * repeated vector, this is the number of arrays, also the record
    +     * count. For a value nested inside a repeated map, it is the
    +     * total number of values across all maps, and may be less than,
    +     * greater than (but unlikely) same as the row count.
          */
         public final int valueCount;

         /**
    -     * The number of elements in the value vector. Consider two cases.
    -     * A required or nullable vector has one element per row, so the
    -     * entryCount is the same as the valueCount (which,
    -     * in turn, is the same as the row count.) But, if this vector is an
    -     * array, then the valueCount is the number of columns, while
    -     * entryCount is the total number of elements in all the arrays
    -     * that make up the columns, so entryCount will be different than
    -     * the valueCount (normally larger, but possibly smaller if most
    -     * arrays are empty.
    -     *
    -     * Finally, the column may be part of another list. In this case, the above
    -     * logic still applies, but the valueCount is the number of entries
    -     * in the outer array, not the row count.
    +     * Total number of elements for a repeated type, or 1 if this is
    +     * a non-repeated type. That is, a batch of 100 rows may have an
    +     * array with 10 elements per row. In this case, the element count
    +     * is 1000.
          */
    -    public int entryCount;
    +    public final int elementCount;
    --- End diff --

    Not related to elementCount per se, but I see that netBatchSize and accountedMemorySize are integers. These could overflow depending on the number of columns. Should they be longs?

---
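
To illustrate the overflow concern raised above, here is a minimal sketch. It assumes the totals are built by summing per-column sizes, which the quoted diff does not show; the class name, helper methods, and the 1 GiB column size are hypothetical and not taken from RecordBatchSizer.

```java
import java.util.Arrays;

// Hypothetical sketch, not Drill code: shows how an int running total
// wraps around while a long total does not.
public class BatchSizeOverflowSketch {

    // Sums into an int, as an int field such as netBatchSize would.
    static int sumAsInt(int[] columnSizes) {
        int total = 0;
        for (int size : columnSizes) {
            total += size; // silently wraps past Integer.MAX_VALUE
        }
        return total;
    }

    // Sums into a long; each int is widened before the addition.
    static long sumAsLong(int[] columnSizes) {
        long total = 0L;
        for (int size : columnSizes) {
            total += size;
        }
        return total;
    }

    // Alternative that keeps int but fails loudly instead of wrapping.
    static int sumChecked(int[] columnSizes) {
        int total = 0;
        for (int size : columnSizes) {
            total = Math.addExact(total, size); // throws ArithmeticException on overflow
        }
        return total;
    }

    public static void main(String[] args) {
        int[] columnSizes = new int[3];
        Arrays.fill(columnSizes, 1 << 30); // three columns of 1 GiB each (illustrative)

        System.out.println(sumAsInt(columnSizes));  // prints -1073741824
        System.out.println(sumAsLong(columnSizes)); // prints 3221225472
    }
}
```

Whether such totals are reachable in practice depends on the batch size limits enforced elsewhere, so this only shows the arithmetic, not a confirmed bug.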