impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Date Wed, 02 Nov 2016 05:30:23 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(11 comments)

Still working on a minor test fix, sending out to continue CR.

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 31:  * Extracts common conjuncts from disjunctive CompoundPredicates.
> Might be worth commenting on how this handles > 2 disjuncts. I.e. you need 
Expanded comments on ExprRewriter and modified comment here. 

(I don't think we should document how the framework works in every rule.)


Line 37: public class ExtractCommonConjunctRule implements ExprRewriteRule {
> where is this being applied?
At the end of analysis, made those changes (had forgotten this after rebasing/cleaning)


Line 42:     if (!Expr.IS_OR_PREDICATE.apply(expr)) return expr;
> Isn't this check a little weak in the sense that it only checks the root's 
Having separate rules for normalization certainly makes sense, but I see them as being independent
of this change. I don't think we'd change the code/behavior of the current rule even if we
had normalization.

In your example, we would not return 'a' because the single conjunct of child0 would be (a
OR b) which is not common to any of the child1 conjuncts "a" and "c".

Maybe you can come up with another example that breaks?


Line 49:       if (child1Conjuncts.contains(conjunct)) {
> This is n^2 so will get very slow if we have long lists of disjuncts, e.g. 
Agree that this is problematic. Unfortunately, also non-trivial to fix and make sure the hashCode()
and equals() functions are correct.

I added an arbitrary threshold on the maximum number of Expr.equals() comparisons to bail
on these pathological cases.

It's also not clear whether this rule would be much worse than the codegen in the BE. Not
a good argument to avoid the improvement, but the improvement here alone might not make a
material impact for the overall query.


Line 72:     if (!child0Conjuncts.isEmpty()) {
> At this point both child0Conjuncts and child1Conjuncts are non-empty becaus
Right, forgot to correct after some cleanup. Done.


PS1, Line 84: newDisjuncts.size() > 1
> See above - this should always be true.
You are right, thanks. Done.


http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 146:         "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " +
> Slightly tangential, but should we apply De Morgan's first to normalise the
Completely agree about having additional rules for normalization. I view those as somewhat
independent of the current rule. I don't think the code/approach of the current rule would
change much even if we had normalization. Of course, normalizing would make the current rule
useful in more cases.

Let's continue this train of thought and come up with useful normalization rules in a separate
change set. Definitely needed.


Line 153:         "(int_col < 10 and id between 5 and 6) or " +
> Does common conjunct extraction work if you have a BETWEEN expression and t
Good point, there was a bug. Fixed bug and added tests.


http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test:

Line 3384:       p_brand = 'Brand#12'
> i agree, i think we need normalization as a separate transformation step. t
Agree.


Line 3384:       p_brand = 'Brand#12'
> I'm not really sure how this patch handles a bushy predicate. Do we flatten
Agree we need normalization as a separate set of rules.

I'm not sure normalization is required for your specific example though. The interesting conjuncts
are those that bubble up all the way to the top so that, e.g., we can assign them as scan
predicates. I think the current implementation does that, although some opportunities for
simplifying disjunctive subexpressions are missed.


Line 3393:       p_brand = 'Brand#23'
> additional idea for transformation rule: derive additional, non-essential d
Agree. Sounds like IMPALA-2108.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message