hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14652) incorrect results for not in on partition columns
Date Fri, 26 Aug 2016 06:40:21 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438552#comment-15438552 ]

Jesus Camacho Rodriguez commented on HIVE-14652:
------------------------------------------------

Thanks for looking into this [~sershe].

The problem seems to have existed for IN clauses before HIVE-11424 went in; that change just
added the single-column case. In fact, as you said, the logic for multi-column (struct) IN
clauses is expected to be broken too.

I think the source of the problem is the assumption the IN logic makes about the WalkState: it
treats TRUE as meaning that the condition can be removed (see the comment at line 423 in the
original code, line 359 after applying your patch). WalkState seems to be a global summary of
the results of the child expressions, so that assumption does not hold.
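To illustrate why a summary over the child expressions is not enough, here is a minimal Python sketch. It is not Hive's pcr code: the `classify` and `not_in` helpers and the dict-per-partition model are hypothetical, and SQL NULL handling is ignored. The idea is to evaluate the filter separately for each partition and treat it as removable only when it is TRUE for every partition. Applied to the `s not in ('bar')` example from the issue, the per-partition results differ, so the filter must be kept (or the FALSE partition pruned), never simply dropped.

```python
from typing import Callable, Dict, List

# Hypothetical model: each partition is a dict of partition-column values.
Partition = Dict[str, str]
Predicate = Callable[[Partition], bool]


def not_in(col: str, values: List[str]) -> Predicate:
    """Build a predicate `col NOT IN (values)` over a partition column (no NULLs)."""
    return lambda part: part[col] not in values


def classify(pred: Predicate, partitions: List[Partition]) -> str:
    """Decide what a condition remover may do with `pred` (hypothetical helper).

    "remove"       -> pred is TRUE for every partition, the filter can be dropped
    "always_false" -> pred is FALSE for every partition, no rows can qualify
    "keep"         -> results are mixed, the filter must stay in the plan
    """
    results = [pred(p) for p in partitions]
    if all(results):
        return "remove"
    if not any(results):
        return "always_false"
    return "keep"


partitions = [{"s": "foo"}, {"s": "bar"}]
pred = not_in("s", ["bar"])

# Per-partition results: foo -> True, bar -> False, so the filter is not
# unconditionally TRUE and must not be removed.
print([pred(p) for p in partitions])  # [True, False]
print(classify(pred, partitions))     # keep

# Once the 'bar' partition has been pruned, the condition holds everywhere
# and can safely be dropped.
print(classify(pred, [{"s": "foo"}]))  # remove
```

The key design point is that the decision is made from the full vector of per-partition results rather than from a single aggregated flag.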

I checked the patch and the changes look good to me, but I have a couple of questions.
1. Does the patch still take into account the synthetic predicates generated by the dynamic
partition pruner for single-column IN clauses? Previously there was some special handling for
this case, but it no longer seems to be there. Maybe it is handled generically, like any other predicate?
2. I would extend the patch to cover multi-column IN clauses so that we fix all the issues. That
would mean extending the logic in lines 359-364 of the patched code (which seems straightforward)
and adding another test case.
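For question 2, one way to reason about a multi-column (struct) IN clause over columns (a, b) with tuples (5, 1) and (3, 2), where only _b_ is a partition column, is to give it a three-valued result per partition: FALSE if no tuple matches the partition values, TRUE if a tuple matches and every struct field is a partition column, and UNKNOWN otherwise (the outcome then depends on row data). The Python sketch below assumes that model; `eval_struct_in` is a hypothetical name, NULL semantics are ignored, and this is not the logic in the patch.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Three-valued result: True, False, or None for UNKNOWN.
TriBool = Optional[bool]


def eval_struct_in(columns: Sequence[str],
                   tuples: List[Tuple],
                   partition: Dict[str, object]) -> TriBool:
    """Hypothetical: evaluate `(columns) IN (tuples)` against one partition.

    Only columns present in `partition` (partition columns) can be checked;
    any other column makes a matching tuple's outcome UNKNOWN (None).
    """
    part_idx = [i for i, c in enumerate(columns) if c in partition]
    all_partition_cols = len(part_idx) == len(columns)
    for t in tuples:
        if all(t[i] == partition[columns[i]] for i in part_idx):
            # Partition values match this tuple; the rest depends on row data
            # unless every struct field is a partition column.
            return True if all_partition_cols else None
    # No tuple can match in this partition, so it can be pruned.
    return False


cols, tuples = ("a", "b"), [(5, 1), (3, 2)]
print(eval_struct_in(cols, tuples, {"b": 1}))  # None: keep the filter
print(eval_struct_in(cols, tuples, {"b": 4}))  # False: partition can be pruned
```

Under this model, a partition evaluating to UNKNOWN keeps the condition, and only a TRUE result for every remaining partition would allow removing it.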

--

Concerning the logic behind pcr (the partition condition remover): if I understand your question
correctly, the answer is that we need to evaluate the conditions on partition columns because the
set of partitions selected by pruning does not by itself tell us which parts of the filter condition
can be dropped. For instance, consider a table with partition column _b_ and the predicate
_(a = 5 and b = 1) or (a = 3 and b = 2)_. We can infer that we only need partitions _b=1_ and
_b=2_. However, we cannot remove any part of the predicate if both partitions exist. On the other
hand, if only _b=1_ exists, the final predicate would be _a = 5_.
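The example above can be made concrete with a simplified Python sketch. It assumes the predicate is already in disjunctive normal form, with each disjunct split into equality conditions on partition columns and opaque conditions on other columns; this representation and the `simplify` function are hypothetical and do not reflect how Hive's pcr actually walks expressions. A disjunct whose partition conditions are FALSE for every existing partition is dropped, and a partition condition that is TRUE for every existing partition is removed from its disjunct.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of one disjunct:
# (partition-column equality conditions, other conditions as opaque strings).
Disjunct = Tuple[Dict[str, int], List[str]]


def simplify(dnf: List[Disjunct], partitions: List[Dict[str, int]]) -> List[str]:
    """Return the remaining disjuncts after pcr-style simplification (sketch)."""
    result = []
    for part_conds, others in dnf:
        # Does this disjunct's partition part hold for each existing partition?
        matches = [all(p[c] == v for c, v in part_conds.items()) for p in partitions]
        if not any(matches):
            continue  # FALSE for every partition: the disjunct can never hold
        kept = list(others)
        for col, val in part_conds.items():
            # Keep a partition condition unless it is TRUE for every partition.
            if not all(p[col] == val for p in partitions):
                kept.append(f"{col}={val}")
        result.append(" and ".join(kept) if kept else "true")
    return result


# (a = 5 and b = 1) or (a = 3 and b = 2), with partition column b
pred = [({"b": 1}, ["a=5"]), ({"b": 2}, ["a=3"])]

print(simplify(pred, [{"b": 1}, {"b": 2}]))  # ['a=5 and b=1', 'a=3 and b=2']
print(simplify(pred, [{"b": 1}]))            # ['a=5']
```

With both partitions present nothing can be removed, while with only _b=1_ the second disjunct disappears and the first collapses to _a=5_, matching the reasoning above.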

Btw, we had some discussion with [~ashutoshc] about moving pcr to the logical optimization
phase (Calcite), but until the return path is in place, we cannot complete this task.

> incorrect results for not in on partition columns
> -------------------------------------------------
>
>                 Key: HIVE-14652
>                 URL: https://issues.apache.org/jira/browse/HIVE-14652
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: stephen sprague
>            Assignee: Sergey Shelukhin
>            Priority: Blocker
>         Attachments: HIVE-14652.patch
>
>
> {noformat}
> create table foo (i int) partitioned by (s string);
> insert overwrite table foo partition(s='foo') select cint from alltypesorc limit 10;
> insert overwrite table foo partition(s='bar') select cint from alltypesorc limit 10;
> select * from foo where s not in ('bar');
> {noformat}
> No results. IN ... works correctly



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
