impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Date Fri, 28 Oct 2016 15:59:56 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 31:  * Extracts common conjuncts from disjunctive CompoundPredicates.
Might be worth commenting on how this handles > 2 disjuncts. I.e. you need to apply the
rule from the bottom-up.


Line 49:       if (child1Conjuncts.contains(conjunct)) {
This is n^2 so will get very slow if we have long lists of disjuncts, e.g. machine-generated
queries. This is fine for small n but I think we should seriously consider switching to HashMaps
for child1Conjuncts and commonConjuncts for n > some threshold. The core logic I think
is the same if you factor it out into a function that operates on Collection<Expr>.


Line 72:     if (!child0Conjuncts.isEmpty()) {
At this point both child0Conjuncts and child1Conjuncts are non-empty because we would have
returned early otherwise. So I think these branches are unnecessary.


PS1, Line 84: newDisjuncts.size() > 1
See above - this should always be true.


http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 146:         "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " +
Slightly tangential, but should we apply De Morgan's first to normalise the disjunct into
a conjunct? That would help if we had something like:

  (!(int_col=5 or tinyint_col > 9) and double_col = 7) or (!(int_col = 5) and !(tinyint_col
> 9) and double_col = 8)

I guess in general it would be nice to do some basic normalisation of expressions in other
cases, e.g. convert !(tinyint_col > 9) to tinyint_col <= 9, convert 9 < tinyint_col
to tinyint_col <= 9, etc.

I think that would make the common conjunct extraction a bit more robust. You can rewrite
the query but avoiding that is the motivation for this in the first place, so it would be
nice if it worked in some more simple cases.


Line 153:         "(int_col < 10 and id between 5 and 6) or " +
Does common conjunct extraction work if you have a BETWEEN expression and the equivalent expression
after the rewrite? Should work if we apply the extraction after the BETWEEN rewrite, but would
be nice to test.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message