impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Date Tue, 14 Nov 2017 01:26:29 GMT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 4:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> I think If the predicate transfer is the starting point to consider whether
Fair point. I was thinking of a definition like this:

Slot A has a value transfer to slot B if for all rows containing both slots slot B has the
same value as slot A or the tuple containing slot B is NULL.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@102
PS4, Line 102:  * Slot value transfer:
Slot value transfers:


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@105
PS4, Line 105:  * Its symmetric closure is a equivalence relation. Its equivalence class is
called slot
What does "Its" refer to here?

I think it's easier to state less formally:

Each slot is contained in exactly one equivalence class. A slot equivalence class is a set
of slots where each pair of slots has a mutual value transfer. Equivalence classes are assigned
an arbitrary id to distinguish them from another.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1527
PS4, Line 1527:       // select * from (select A.a, B.b from A left join B on A.a=B.b) v where
b is null
to drive it home even further use "B.col is null" as the predicate to show that the NULL-checking
predicate is unrelated to the slots participating in the equivalence


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1648
PS4, Line 1648:     // A map from equivalence class IDs to equivalence classese. The equivalence
classes
typo: classes


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1813
PS4, Line 1813:    * Returns a map of slot equivalence classes on the set of slots in given
tuples.
the given tuples


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1838
PS4, Line 1838:    * propagation. Each mapping assigns every slot in srcSids to slot in destTid
which has
Each mapping assigns every slot in srcSids to a slot in destTid which has a value transfer
from srcSid.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989
PS4, Line 1989:             + "\n" + tc + "Condensed Graph:\n" + condensedTc);
move the first "\n" to previous line


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1998
PS4, Line 1998:     // transform equality predicates into a transfer graph
remove (pretty clear from function comment)


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@691
PS4, Line 691:   interface SlotRefComparator {
Move this into SlotRef since it's SlotRef specific?


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@696
PS4, Line 696:    * Returns if this expr matches 'that'. Two exprs match if:
Returns true if this expr matches 'that'


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@699
PS4, Line 699:    * 2. For every pair of corresponding SlotRef, slotRefCmp.matches returns
true.
For every pair of corresponding SlotRefs, slotRefCmp.matches() returns true.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS4, Line 700:    * 3. For every pair of corresponding non-SlotRef exprs, localEquals returns
true.
localEquals()

(we generally use () for function names to make it clear we are referring to a function)


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@719
PS4, Line 719:     if (fn_ == null && that.fn_ == null) return true;
I think we should separate matches() and localEquals() more cleanly. I think localEquals()
should be non-abstract and have a default implementation that checks the type and fn_ of this
and that. Subclasses can override and call super.localEquals() first.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@727
PS4, Line 727:    * children and fn_.
Not checking fn_ seems a little strange. The function semantics would be cleaner if localEquals()
compared this and that without children. Following my above suggestion will also avoid the
mysterious "return true" implementations of localEquals() in a few places.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@962
PS4, Line 962:    * Return a new list without duplicate exprs (according to cmp).
according to matches() using 'cmp'


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@198
PS4, Line 198:    * A wrapper around localEquals checking type equality, used for
A wrapper around localEquals().

(no need to explain further, the "checking type equality" is more confusing than helpful)


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN ||
> This is the output partitioning of this fragment. If it is a right outer jo
Makes sense, please add a comment to explain.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52
PS3, Line 52:    */
> Yes, reflexiveTransitiveClosure.dsts() is expensive. But I think this is mo
This addresses the performance issue, but it does not address the testing/readability issue.
If we expose a generic API like this on every subclass of Graph then this signals to the reader
that it is meaningful to call this function on every subclass and therefore we need to test
that all variants actually work.

We "could" add tests that show SccCondensedGraph.reflexiveTransitiveClosure() works correctly.
But that is wasted effort and unnecessary code because we never intend/require reflexiveTransitiveClosure()
to be called on an SccCondensedGraph. So the bigger issue to me is the "unnecessary abstraction"
which adds cognitive burden to readers.

Here's a proposal to solve this issue.
* I believe we need reflexiveTransitiveClosure() on WritableGraph for validate() and and RandomAccessibleGraph
for condensedReflexiveTransitiveClosure()
* How about we make reflexiveTransitiveClosure() a member of RandomAccessibleGraph or we make
reflexiveTransitiveClosure() accept a RandomAccessibleGraph or something that makes the call
specific RandomAccessibleGraph
* To address the validate() use case we can add a simple function that converts a WritableGraph
to a RandomAccessibleGraph

This way we don't expose unnecessary generality and our existing tests cover all the code.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@91
PS4, Line 91:     public int numVertices() { return adjList_.length; }
@Override


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@106
PS4, Line 106:     // Adjacency list. Each in list the adjacency list is sorted.
Don't understand the second sentence


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@134
PS4, Line 134:    * An strongly-connected-component(SCC)-condensed graph. Vertices are mapped
to their
A graph condensed by its strongly-connected components (SCC). Vertices are mapped to their
SCCs and an inner graph on the SCCs is stored.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@152
PS4, Line 152:     public int numVertices() { return sccIds_.length; }
@Override


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/util/Graph.java@199
PS4, Line 199:     public static SccCondensedGraph condensedReflexiveTransitiveClosure(WritableGraph
g) {
Very clear now! Thanks.



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <twang@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:26:29 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message