Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CC86D18DAC for ; Thu, 9 Jul 2015 20:51:04 +0000 (UTC) Received: (qmail 57731 invoked by uid 500); 9 Jul 2015 20:51:04 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 57702 invoked by uid 500); 9 Jul 2015 20:51:04 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 57692 invoked by uid 99); 9 Jul 2015 20:51:04 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 09 Jul 2015 20:51:04 +0000 Date: Thu, 9 Jul 2015 20:51:04 +0000 (UTC) From: "Mehant Baid (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (DRILL-3121) Hive partition pruning is not happening MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3121: ------------------------------- Issue Type: Improvement (was: Bug) > Hive partition pruning is not happening > --------------------------------------- > > Key: DRILL-3121 > URL: https://issues.apache.org/jira/browse/DRILL-3121 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow > Affects Versions: 1.0.0 > Reporter: Hao Zhu > Assignee: Mehant Baid > Fix For: 1.2.0 > > > Tested on 1.0.0 with below commit id, and hive 0.13. > {code} > > select * from sys.version; > +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+ > | commit_id | commit_message | commit_time | build_email | build_time | > +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+ > | d8b19759657698581cc0d01d7038797952888123 | DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows | 15.05.2015 @ 01:18:03 EDT | Unknown | 15.05.2015 @ 03:07:10 EDT | > +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+ > 1 row selected (0.083 seconds) > {code} > How to reproduce: > 1. Use hive to create below partition table: > {code} > CREATE TABLE partition_table(id INT, username string) > PARTITIONED BY(year STRING, month STRING) > ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; > insert into table partition_table PARTITION(year='2014',month='11') select 1,'u' from passwords limit 1; > insert into table partition_table PARTITION(year='2014',month='12') select 2,'s' from passwords limit 1; > insert into table partition_table PARTITION(year='2015',month='01') select 3,'e' from passwords limit 1; > insert into table partition_table PARTITION(year='2015',month='02') select 4,'r' from passwords limit 1; > insert into table partition_table PARTITION(year='2015',month='03') select 5,'n' from passwords limit 1; > {code} > 2. Hive query can do partition pruning for below 2 queries: > {code} > hive> explain EXTENDED select * from partition_table where year='2015' and month in ( '02','03') ; > partition values: > month 02 > year 2015 > partition values: > month 03 > year 2015 > explain EXTENDED select * from partition_table where year='2015' and (month >= '02' and month <= '03') ; > partition values: > month 02 > year 2015 > partition values: > month 03 > year 2015 > {code} > Hive only scans 2 partitions -- 2015/02 and 2015/03. > 3. Drill can not do partition pruning for below 2 queries: > {code} > > explain plan for select * from hive.partition_table where `year`='2015' and `month` in ('02','03'); > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) > 00-02 SelectionVectorRemover > 00-03 Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, '03')))]) > 00-04 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015, 03])]]]) > > explain plan for select * from hive.partition_table where `year`='2015' and (`month` >= '02' and `month` <= '03' ); > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) > 00-02 SelectionVectorRemover > 00-03 Filter(condition=[AND(=($2, '2015'), >=($3, '02'), <=($3, '03'))]) > 00-04 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015, 03])]]]) > {code} > Drill scans 3 partitions -- 2015/01, 2015/02 and 2015/03. > Note: if the inlist only has 1 value, Drill can do partition pruning well: > {code} > > explain plan for select * from hive.partition_table where `year`='2015' and `month` in ('02'); > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) > 00-02 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 02])]]]) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)