From: alexlevenson@apache.org
To: commits@parquet.incubator.apache.org
Reply-To: dev@parquet.incubator.apache.org
Message-Id: <39f2d6ccf2cd49b8b8f0a226f38b68fd@git.apache.org>
Subject: incubator-parquet-mr git commit: PARQUET-217 Use simpler heuristic in MemoryManager
Date: Fri, 13 Mar 2015 19:55:02 +0000 (UTC)

Repository: incubator-parquet-mr
Updated Branches:
  refs/heads/master 77826fda8 -> 9ee3a1617


PARQUET-217 Use simpler heuristic in MemoryManager

We found that the heuristic of throwing when:
```
minMemoryAllocation > 0 && newSize/maxColCount < minMemoryAllocation
```
in MemoryManager is not really valid when you have many (3K+) columns, because of the division by the number of columns. This check throws immediately when writing a single file with a 3GB heap and more than 3K columns.

This PR introduces a simpler heuristic, a minimum scale: we throw when the MemoryManager's scale gets too small. By default I chose 25%, but I'm happy to change that to something else. For backwards compatibility I've left the original check in, but it is no longer executed by default; to get that behavior, the min chunk size has to be set in the Hadoop configuration. I'm also open to removing it entirely if we don't think we need it anymore. What do you think?
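
To make the arithmetic behind the problem concrete, here is a minimal sketch comparing the old per-column check with the check that appears in the diff of this commit. The class name, method names, and the 128MB row group and 1MB minimum are assumptions chosen for illustration; they are not read from MemoryManager or from any real configuration.

```java
// Hypothetical standalone sketch; it does not use or modify parquet-mr classes.
final class MemoryCheckSketch {
    // Assumed illustrative values.
    static final long MIN_ALLOCATION = 1L * 1024 * 1024;   // 1MB
    static final long ROW_GROUP_SIZE = 128L * 1024 * 1024; // 128MB

    /** Old heuristic: the per-column share of the new size is below the minimum. */
    static boolean oldCheckThrows(long newSize, int maxColCount, long minAlloc) {
        return minAlloc > 0 && newSize / maxColCount < minAlloc;
    }

    /** Check from the diff: only when scaled down, compare the whole new size. */
    static boolean newCheckThrows(double scale, long newSize, long minAlloc) {
        return scale < 1.0 && minAlloc > 0 && newSize < minAlloc;
    }

    public static void main(String[] args) {
        int columns = 3000;

        // A single writer that fits in the pool is not scaled down at all.
        double scale = 1.0;
        long newSize = (long) Math.floor(ROW_GROUP_SIZE * scale);

        // 128MB spread over 3000 columns is far below 1MB per column,
        // so the old check fires even though nothing was scaled.
        System.out.println("per-column bytes: " + newSize / columns);
        System.out.println("old check throws: "
                + oldCheckThrows(newSize, columns, MIN_ALLOCATION));
        System.out.println("new check throws: "
                + newCheckThrows(scale, newSize, MIN_ALLOCATION));

        // With heavy scaling the row group itself drops below the minimum,
        // which is the case the new check is meant to catch.
        double smallScale = 0.005;
        long scaledSize = (long) Math.floor(ROW_GROUP_SIZE * smallScale);
        System.out.println("new check throws when scaled: "
                + newCheckThrows(smallScale, scaledSize, MIN_ALLOCATION));
    }
}
```

Under these assumed values the old check fires for an unscaled writer purely because of the column count, while the new check fires only once scaling pushes the whole row group below the minimum.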
@danielcweeks @rdblue @dongche @julienledem

Author: Alex Levenson

Closes #143 from isnotinvain/alexlevenson/mem-manager-heuristic and squashes the following commits:

acda66f [Alex Levenson] Add units to exception
10237c6 [Alex Levenson] Decouple DEFAULT_MIN_MEMORY_ALLOCATION from DEFAULT_PAGE_SIZE
29c9881 [Alex Levenson] Use an absolute minimum on rowgroup size, only apply when scale < 1
8877125 [Alex Levenson] Merge branch 'master' into alexlevenson/mem-manager-heuristic
e5117a0 [Alex Levenson] Merge branch 'master' into alexlevenson/mem-manager-heuristic
6ee5f46 [Alex Levenson] Use simpler heuristic in MemoryManager


Project: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/commit/9ee3a161
Tree: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/tree/9ee3a161
Diff: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/diff/9ee3a161

Branch: refs/heads/master
Commit: 9ee3a16179cb65f5fe4170257ab7cde558f1dbeb
Parents: 77826fd
Author: Alex Levenson
Authored: Fri Mar 13 12:54:58 2015 -0700
Committer: Alex Levenson
Committed: Fri Mar 13 12:54:58 2015 -0700

----------------------------------------------------------------------
 .../src/main/java/parquet/hadoop/MemoryManager.java | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/9ee3a161/parquet-hadoop/src/main/java/parquet/hadoop/MemoryManager.java
----------------------------------------------------------------------
diff --git a/parquet-hadoop/src/main/java/parquet/hadoop/MemoryManager.java b/parquet-hadoop/src/main/java/parquet/hadoop/MemoryManager.java
index 7bb0665..9724868 100644
--- a/parquet-hadoop/src/main/java/parquet/hadoop/MemoryManager.java
+++ b/parquet-hadoop/src/main/java/parquet/hadoop/MemoryManager.java
@@ -40,7 +40,7 @@ import java.util.Map;
 public class MemoryManager {
   private static final Log LOG = Log.getLog(MemoryManager.class);
   static final float DEFAULT_MEMORY_POOL_RATIO = 0.95f;
-  static final long DEFAULT_MIN_MEMORY_ALLOCATION = ParquetWriter.DEFAULT_PAGE_SIZE;
+  static final long DEFAULT_MIN_MEMORY_ALLOCATION = 1 * 1024 * 1024; // 1MB

   private final float memoryPoolRatio;
   private final long totalMemoryPool;
@@ -121,10 +121,10 @@ public class MemoryManager {

     for (Map.Entry entry : writerList.entrySet()) {
       long newSize = (long) Math.floor(entry.getValue() * scale);
-      if(minMemoryAllocation > 0 && newSize/maxColCount < minMemoryAllocation) {
-          throw new ParquetRuntimeException(String.format("New Memory allocation %d"+
-          " exceeds minimum allocation size %d with largest schema having %d columns",
-              newSize, minMemoryAllocation, maxColCount)){};
+      if(scale < 1.0 && minMemoryAllocation > 0 && newSize < minMemoryAllocation) {
+          throw new ParquetRuntimeException(String.format("New Memory allocation %d bytes" +
+          " is smaller than the minimum allocation size of %d bytes.",
+              newSize, minMemoryAllocation)){};
       }
       entry.getKey().setRowGroupSizeThreshold(newSize);
       LOG.debug(String.format("Adjust block size from %,d to %,d for writer: %s",