drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hao Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3121) Hive partition pruning is not happening
Date Sun, 17 May 2015 19:41:59 GMT
Hao Zhu created DRILL-3121:
------------------------------

             Summary: Hive partition pruning is not happening
                 Key: DRILL-3121
                 URL: https://issues.apache.org/jira/browse/DRILL-3121
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.0.0
            Reporter: Hao Zhu
            Assignee: Chris Westin


Tested on 1.0.0 with below commit id, and hive 0.13.
{code}
>  select * from sys.version;
+-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
|                 commit_id                 |                           commit_message   
                       |        commit_time         | build_email  |         build_time  
      |
+-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| d8b19759657698581cc0d01d7038797952888123  | DRILL-3100: TestImpersonationDisabledWithMiniDFS
fails on Windows  | 15.05.2015 @ 01:18:03 EDT  | Unknown      | 15.05.2015 @ 03:07:10 EDT
 |
+-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
1 row selected (0.083 seconds)
{code}

How to reproduce:
1. Use hive to create below partition table:
{code}
CREATE TABLE partition_table(id INT, username string)
 PARTITIONED BY(year STRING, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

insert into table partition_table PARTITION(year='2014',month='11') select 1,'u' from passwords
limit 1;
insert into table partition_table PARTITION(year='2014',month='12') select 2,'s' from passwords
limit 1;
insert into table partition_table PARTITION(year='2015',month='01') select 3,'e' from passwords
limit 1;
insert into table partition_table PARTITION(year='2015',month='02') select 4,'r' from passwords
limit 1;
insert into table partition_table PARTITION(year='2015',month='03') select 5,'n' from passwords
limit 1;
{code}

2. Hive query can do partition pruning for below 2 queries:
{code}
hive>  explain EXTENDED select * from partition_table where year='2015' and month in (
'02','03') ;
            partition values:
              month 02
              year 2015

            partition values:
              month 03
              year 2015              

explain EXTENDED select * from partition_table where year='2015' and (month >= '02' and
month <= '03') ;
            partition values:
              month 02
              year 2015

            partition values:
              month 03
              year 2015
{code}
Hive only scans 2 partitions -- 2015/02 and 2015/03.

3. Drill can not do partition pruning for below 2 queries:
{code}
> explain plan for select * from hive.partition_table where `year`='2015' and `month` in
('02','03');
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(id=[$0], username=[$1], year=[$2], month=[$3])
00-02        SelectionVectorRemover
00-03          Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, '03')))])
00-04            Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table),
inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4,
maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4],
columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015,
03])]]])

> explain plan for select * from hive.partition_table where `year`='2015' and (`month`
>= '02' and `month` <= '03' );
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(id=[$0], username=[$1], year=[$2], month=[$3])
00-02        SelectionVectorRemover
00-03          Filter(condition=[AND(=($2, '2015'), >=($3, '02'), <=($3, '03'))])
00-04            Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table),
inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4,
maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4],
columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015,
03])]]])
{code}
Drill scans 3 partitions -- 2015/01, 2015/02 and 2015/03.

Note: if the inlist only has 1 value, Drill can do partition pruning well:
{code}
>  explain plan for select * from hive.partition_table where `year`='2015' and `month`
in ('02');
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(id=[$0], username=[$1], year=[$2], month=[$3])
00-02        Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table),
inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4],
columns=[`*`], partitions= [Partition(values:[2015, 02])]]])
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message