hive-issues mailing list archives

From "Hari Sankar Sivarama Subramaniyan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
Date Tue, 25 Aug 2015 19:09:46 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11634:
-----------------------------------------------------
    Description: 
Currently, we do not support partition pruning for the following scenario:
{code}
create table pcr_t1 (key int, value string) partitioned by (ds string);
insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key;
explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
{code}

If we run the above query, all partitions of table pcr_t1 are present in the
filter predicate, whereas partition (ds='2000-04-10') could be pruned.

The optimization is to rewrite the above query by adding two IN clauses, one containing the
partition columns and the other containing the non-partition columns, while keeping the
original predicate, as follows.
{code}
explain extended select ds from pcr_t1 where (struct(key) IN (struct(1), struct(2))) and (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
{code}
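
To illustrate the idea, here is a minimal Python sketch of how the two derived IN lists could be obtained from the struct tuples, and why the original struct predicate still has to be evaluated afterwards. This is not Hive's implementation; the helper names (split_struct_in, passes_derived, passes_original) are hypothetical and only model the example above.

```python
# Illustrative model only, not Hive code. All helper names are hypothetical.

def split_struct_in(columns, tuples, partition_cols):
    """Derive one IN list over partition columns and one over the remaining columns."""
    part_idx = [i for i, c in enumerate(columns) if c in partition_cols]
    rest_idx = [i for i, c in enumerate(columns) if c not in partition_cols]
    part_in = sorted({tuple(t[i] for i in part_idx) for t in tuples})
    rest_in = sorted({tuple(t[i] for i in rest_idx) for t in tuples})
    return part_in, rest_in


# struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2))
columns = ["ds", "key"]
tuples = [("2000-04-08", 1), ("2000-04-09", 2)]
part_in, rest_in = split_struct_in(columns, tuples, {"ds"})

# The partition-only IN list can be checked against partition values alone.
all_partitions = ["2000-04-08", "2000-04-09", "2000-04-10"]
pruned = [p for p in all_partitions if (p,) in part_in]


def passes_derived(row):
    """True if the row satisfies both derived single-group IN clauses."""
    ds, key = row
    return (ds,) in part_in and (key,) in rest_in


def passes_original(row):
    """True if the row satisfies the original struct IN predicate."""
    return row in tuples


print(part_in)  # [('2000-04-08',), ('2000-04-09',)]
print(rest_in)  # [(1,), (2,)]
print(pruned)   # ['2000-04-08', '2000-04-09']

# The derived clauses are weaker than the original: this row passes both of
# them but does not match any of the original struct tuples.
print(passes_derived(("2000-04-08", 2)), passes_original(("2000-04-08", 2)))  # True False
```

In this model, partition ds='2000-04-10' drops out as soon as the partition-only list is checked, while a row such as (ds='2000-04-08', key=2) shows that the two derived clauses alone admit more rows than the original struct predicate does.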

This is an extension of the idea presented in HIVE-11573.

  was:
Currently, we do not support partition pruning for the following scenario
{code}
create table pcr_t1 (key int, value string) partitioned by (ds string);
insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key;
explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
{code}

If we run the above query, all partitions of table pcr_t1 are present in the
filter predicate, whereas partition (ds='2000-04-10') could be pruned.

The optimization is to rewrite the above query into two IN clauses, one containing the
partition columns and the other containing the non-partition columns, as follows.
{code}
explain extended select ds from pcr_t1 where (struct(key) IN (struct(1), struct(2))) and (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09'));
{code}

This is an extension of the idea presented in HIVE-11573.


> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> ------------------------------------------------------------------
>
>                 Key: HIVE-11634
>                 URL: https://issues.apache.org/jira/browse/HIVE-11634
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-11634.1.patch
>
>
> Currently, we do not support partition pruning for the following scenario:
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, all partitions of table pcr_t1 are present in the filter predicate, whereas partition (ds='2000-04-10') could be pruned.
> The optimization is to rewrite the above query by adding two IN clauses, one containing the partition columns and the other containing the non-partition columns, while keeping the original predicate, as follows.
> {code}
> explain extended select ds from pcr_t1 where (struct(key) IN (struct(1), struct(2))) and (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> This is an extension of the idea presented in HIVE-11573.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
