From: "Rahul Challapalli (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Date: Fri, 11 Dec 2015 20:47:46 +0000 (UTC)
Subject: [jira] [Closed] (DRILL-4070) Files written with versions of Drill before v1.3 record metadata that is indistinguishable from bad metadata from other Parquet creators

     [ https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli closed DRILL-4070.
------------------------------------

> Files written with versions of Drill before v1.3 record metadata that is indistinguishable from bad metadata from other Parquet creators
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4070
>                 URL: https://issues.apache.org/jira/browse/DRILL-4070
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.3.0
>            Reporter: Rahul Challapalli
>            Assignee: Parth Chandra
>            Priority: Blocker
>             Fix For: 1.3.0
>
>         Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
>
>
> Drill uses the parquet-mr library to write Parquet files. The metadata signature that Drill 1.2 and earlier versions produced is indistinguishable from older footers written by other tools (such as Pig and Hive). Those tools had a known bug when writing metadata that caused the statistics to be incorrect. To correct this, the parquet-mr library adopted a behavior of ignoring statistics from the old form of the Parquet footer.
> With 1.3, Drill upgraded to the latest version of parquet-mr and has now started ignoring these statistics as well. This ensures correct results but produces performance regressions (compared to Drill v1 and v2) when querying against partitioned Parquet files generated in Drill 1.1 and 1.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
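
The trade-off described in the issue can be illustrated with a toy model. This is only a sketch under stated assumptions, not parquet-mr or Drill code: the `Footer` class, the `OLD_FORM` signature string, and the `trusted_stats` and `can_skip` helpers are all hypothetical names invented for illustration. It shows why a reader that cannot tell a pre-1.3 Drill file from a file with bad statistics has to treat both the same way, and why that costs the ability to skip files.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical writer signature standing in for "the old form of the footer".
# Real Parquet footers carry different strings; this value is an assumption.
OLD_FORM = "old-form-footer"


@dataclass
class Footer:
    """Simplified stand-in for a Parquet footer (not the real layout)."""
    signature: str
    min_max: Optional[Tuple[int, int]]


def trusted_stats(footer: Footer) -> Optional[Tuple[int, int]]:
    """Return statistics only when the signature is not the suspect old form."""
    if footer.signature == OLD_FORM:
        return None  # cannot tell good statistics from bad ones, so ignore them
    return footer.min_max


def can_skip(footer: Footer, wanted: int) -> bool:
    """A file can be skipped only if trusted statistics exclude the value."""
    stats = trusted_stats(footer)
    if stats is None:
        return False  # no usable statistics: the file must be scanned
    low, high = stats
    return wanted < low or wanted > high


# Same signature, so the reader treats both files identically.
drill_1_2_file = Footer(OLD_FORM, (10, 20))   # statistics happen to be correct
other_tool_file = Footer(OLD_FORM, (0, 0))    # statistics may be wrong
new_file = Footer("new-form-footer", (10, 20))

print(can_skip(drill_1_2_file, 99))   # False: correct, but the file is scanned
print(can_skip(other_tool_file, 99))  # False: statistics rightly ignored
print(can_skip(new_file, 99))         # True: trusted statistics allow skipping
```

In this model the first file carries correct statistics, yet it is scanned anyway because its signature matches the suspect one. That mirrors the regression reported above for partitioned files generated in Drill 1.1 and 1.2.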