Date: Wed, 2 Dec 2015 03:47:10 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@drill.apache.org
Subject: [jira] [Commented] (DRILL-4053) Reduce metadata cache file size

    [ https://issues.apache.org/jira/browse/DRILL-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035215#comment-15035215 ]

ASF GitHub Bot commented on DRILL-4053:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/254

> Reduce metadata cache file size
> -------------------------------
>
>                 Key: DRILL-4053
>                 URL: https://issues.apache.org/jira/browse/DRILL-4053
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Metadata
>    Affects Versions: 1.3.0
>            Reporter: Parth Chandra
>            Assignee: Parth Chandra
>             Fix For: 1.4.0
>
>
> The Parquet metadata cache file contains a fair amount of redundant metadata, which bloats the size of the cache file. Two things we can reduce are:
> 1) The schema is repeated for every row group. We can keep a single merged schema instead (similar to what was discussed for the INSERT INTO functionality).
> 2) The max and min values in the stats are used for partition pruning only when the two values are the same. We can therefore keep only the maxValue, and only when it is the same as the minValue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
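The two reductions proposed in the issue description can be illustrated with a small sketch. It assumes a hypothetical, simplified JSON-like layout (`path`, `schema`, `stats`, `rowGroups`, `maxValue` keys as plain dicts); this is not Drill's actual cache format or code, and the helper names `merge_schemas` and `compact_cache` are invented for illustration. It shows a schema stored once instead of per row group, and per-column stats kept as a single `maxValue` only when min equals max.

```python
import json
from typing import Any, Dict, List

# Hypothetical, simplified row-group entry used only for illustration; this is
# NOT Drill's actual metadata cache layout. Each entry repeats the full schema
# and stores both min and max for every column, mirroring the redundancy the
# issue describes.
RowGroup = Dict[str, Any]

SAMPLE_ROW_GROUPS: List[RowGroup] = [
    {"path": "part-0.parquet",
     "schema": {"region": "VARCHAR", "amount": "DOUBLE"},
     "stats": {"region": {"min": "EU", "max": "EU"},
               "amount": {"min": 1.5, "max": 99.0}}},
    {"path": "part-1.parquet",
     "schema": {"region": "VARCHAR", "amount": "DOUBLE"},
     "stats": {"region": {"min": "US", "max": "US"},
               "amount": {"min": 0.5, "max": 10.0}}},
]


def merge_schemas(row_groups: List[RowGroup]) -> Dict[str, str]:
    """Reduction 1: build one merged schema instead of one per row group."""
    merged: Dict[str, str] = {}
    for rg in row_groups:
        for column, col_type in rg["schema"].items():
            existing = merged.setdefault(column, col_type)
            # Assumption: a type conflict is simply rejected here; a real
            # merge would need a type-promotion policy.
            if existing != col_type:
                raise ValueError(
                    f"type conflict for {column!r}: {existing} vs {col_type}")
    return merged


def compact_cache(row_groups: List[RowGroup]) -> Dict[str, Any]:
    """Apply both reductions and return the compacted cache structure."""
    compact_groups = []
    for rg in row_groups:
        columns = {}
        for column, stats in rg["stats"].items():
            # Reduction 2: keep only maxValue, and only when min == max,
            # since that single value is what partition pruning needs.
            if stats["min"] == stats["max"]:
                columns[column] = {"maxValue": stats["max"]}
        compact_groups.append({"path": rg["path"], "columns": columns})
    return {"schema": merge_schemas(row_groups), "rowGroups": compact_groups}


if __name__ == "__main__":
    before = json.dumps(SAMPLE_ROW_GROUPS)
    after = json.dumps(compact_cache(SAMPLE_ROW_GROUPS))
    print(f"original: {len(before)} chars, compacted: {len(after)} chars")
```

In this sketch the `amount` column loses its stats entirely because its min and max differ. Per the description, such stats are not used for partition pruning, so dropping them should not change pruning behavior. The conflict handling in `merge_schemas` is a placeholder choice; how a merged schema should reconcile differing column types is left open by the issue.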