hive-issues mailing list archives

From "Hari Sankar Sivarama Subramaniyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
Date Thu, 01 Oct 2015 00:22:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939100#comment-14939100
] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11634:
----------------------------------------------------------

[~jcamachorodriguez]  Thanks for the feedback.
1. Changes to groupby_cube1.q do not seem part of this patch?
That's true; I reverted the change in the new patch.

2. In pcs.q.out, query in line 666:
explain extended select a.ds, b.key from pcs_t1 a, pcs_t1 b where struct(a.ds, a.key, b.ds)
in (struct('2000-04-08',1, '2000-04-09'), struct('2000-04-09',2, '2000-04-08'))
Additional predicate is not derived, and thus partition pruning is not happening: we read
partitions '2000-04-08', '2000-04-09', and '2000-04-10'. Any idea why this is happening? Could
you check that case?
I checked this, and it seems to happen in the case of a shuffle join; I am still investigating.
For map join, this works fine, and I have modified the test case accordingly.

3. We still do not seem to be removing the predicates that are used for partition pruning
properly from the Filter predicates e.g. pointlookup2.q.out or pointlookup3.q.out. I think
this patch should take care of that too?

That's true. I debugged this, and it does go through the change I introduced in
PcrExprProcFactory.java, which should have removed the extra filter predicates. I am surprised
that this does not happen for this particular scenario. Would it be OK to cover this in a
follow-up JIRA, since this is not a regression from the baseline?

4.  we were prepending a new conjunction to the original predicate for non-partition columns
if we were reducing the NDV in the IN clause. Do you think it would be easy to extend your
patch to cover this case too? 

I think this might require more changes than the initial work, for two reasons: 1. In the
current patch I don't necessarily separate each and every column; I club the partition columns
into the same struct when possible. 2. I need to let the PCR know that this additional predicate
should not be removed if it is on a partition column and contributed to reducing the NDV.

Thanks
Hari

> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> ------------------------------------------------------------------
>
>                 Key: HIVE-11634
>                 URL: https://issues.apache.org/jira/browse/HIVE-11634
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, HIVE-11634.3.patch, HIVE-11634.4.patch,
>                      HIVE-11634.5.patch, HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch,
>                      HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, HIVE-11634.93.patch,
>                      HIVE-11634.94.patch, HIVE-11634.95.patch, HIVE-11634.96.patch
>
>
> Currently, we do not support partition pruning for the following scenario
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, we see that all the partitions of table pcr_t1 are present
> in the filter predicate, whereas we could prune partition (ds='2000-04-10').
> The optimization is to rewrite the above query into the following.
> {code}
> explain extended select ds from pcr_t1 where (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) is used by
> the partition pruner to prune the partitions that would otherwise not be pruned.
> This is an extension of the idea presented in HIVE-11573.
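
The rewrite described in the issue can be illustrated with a small sketch. It is not Hive's implementation: the function name derive_partition_predicate and its parameters are hypothetical, and it only shows the idea of projecting each IN-list struct onto the partition columns to obtain the extra predicate that a partition pruner could use.

```python
from typing import List, Sequence, Tuple


def derive_partition_predicate(
    columns: Sequence[str],
    in_list: Sequence[Tuple],
    partition_columns: Sequence[str],
) -> str:
    """Hypothetical helper: project each IN-list tuple onto the partition
    columns and render the extra predicate as text. Returns "" when the
    struct contains no partition column."""
    # Positions of the partition columns inside the struct, in struct order.
    positions = [i for i, col in enumerate(columns) if col in partition_columns]
    if not positions:
        return ""

    # Project every tuple, dropping duplicates but keeping first-seen order.
    projected: List[Tuple] = []
    for row in in_list:
        key = tuple(row[i] for i in positions)
        if key not in projected:
            projected.append(key)

    def literal(value) -> str:
        # Quote strings; render other constants as-is.
        return f"'{value}'" if isinstance(value, str) else str(value)

    lhs = "struct(" + ", ".join(columns[i] for i in positions) + ")"
    rhs = ", ".join(
        "struct(" + ", ".join(literal(v) for v in key) + ")" for key in projected
    )
    return f"{lhs} IN ({rhs})"


# The example from the issue description: ds is the only partition column.
extra = derive_partition_predicate(
    columns=["ds", "key"],
    in_list=[("2000-04-08", 1), ("2000-04-09", 2)],
    partition_columns=["ds"],
)
print(extra)
```

The derived predicate would be ANDed with the original struct(ds, key) IN (...) predicate, so the query result is unchanged while the pruner sees a condition on partition columns only.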



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
