impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tianyi Wang (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Date Wed, 25 Oct 2017 01:43:48 GMT
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 2:

(46 comments)

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-5976: Remove equivalence class computation in FE
> Remove equivalence classes from FE
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2890
PS1, Line 2890: 
> This function was nice for debugging. Does your patch add a similar functio
Added a function Graph.print, called when graph validation fails in testing.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1253
PS1, Line 1253:     e.getIds(tids, null);
> This new formatting is extremely hard to read, please reformat for readabil
Reverted to the original.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1496
PS1, Line 1496:       if (allDestSids.isEmpty()) continue;
> Unless we are absolutely convinced that the change is correct, I think we s
It is changed to checking whether src has value transfer to an an outer joined tuple, as discussed.
The changed tests are also reverted.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1608
PS1, Line 1608:       }
> Refers to "equivalence class" again which is now an undefined term. If you 
Changed the definition of equivalence class at the top of this file.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1626
PS1, Line 1626:    * to cover the known slot equivalences. This function should be called
for join
> What does this mean? Is the Integer key the SCC id?
Yes. Added a comment.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1802
PS1, Line 1802:    * Only contains equivalence classes with more than one member.
> I've seen this check in a couple of places, would be nice if that could be 
The handling is not exactly the same among call sites. For example getValueTransferTargets()
returns a list of a single element, while here it is skipped. I think the graph data structure
shouldn't be aware that there could be out of range slot ids and should fail on out of range
parameters instead of handling them.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1952
PS1, Line 1952:     return false;
> computeValueTransferGraph
Done.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1972
PS1, Line 1972:         String tc = reference.print();
> valueTransfers (plural)
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2047
PS1, Line 2047:           || tblRef.getJoinOp() == JoinOperator.LEFT_ANTI_JOIN
> Comment need updating. The concept of "equivalence class" does not exist an
Equivalence class is now defined at the top of this file.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2055
PS1, Line 2055:   }
> In terms of the g API I think callers should have to care about the existen
Removed the getSccId -> getSccMembers indirection. Do you suggest removing the SCC terminology
from the graph interfaces?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2070
PS1, Line 2070:   }
> Shouldn't the destinations already be sorted?
Not in current implementation since CondensedTransitiveClosure.getDests() assembles its result
on the fly and the result is not sorted there. I haven't figured out why the call site of
this function rely on the ordering. The current state is that removing this sort fails some
tests (Not different plan. Some runtime filters disappeared.)


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java@702
PS1, Line 702:     if (this instanceof CastExpr && ((CastExpr)this).isImplicit())
{
> It's very unfortunate that we have to create a new pair object to do a Slot
Done. An interface ExprBinaryPredicate is added.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@1
PS1, Line 1: // Licensed to the Apache Software Foundation (ASF) under one
> Apache header
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@18
PS1, Line 18: package org.apache.impala.util;
> Comment on the ordering of elements (they are unordered correct?)
It was unordered. Since in the new patch binary search is used, it diverges into ordered and
unordered types. Comments are added accordingly.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@19
PS1, Line 19: 
> I feel like this should really be int[][] for maximum efficiency. It genera
Implemented util.IntArrayList.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@23
PS1, Line 23: 
> style: ++i
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@36
PS1, Line 36:    */
> Not sure how useful these Preconditions checks are. We'll even get a more m
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@45
PS1, Line 45:     BitSet visited = new BitSet(numVertices());
> Does this really need to be public?
No but should the visibility be as strict as possible?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@46
PS1, Line 46:     for (int srcVid = 0; srcVid < numVertices(); ++srcVid) {
> Don't think this is needed.
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@73
PS1, Line 73:       sb.append(i).append(" => ");
> I'm worried this might blow the stack for huge graphs - this would be a reg
Reimplemented in BFS.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@86
PS1, Line 86:   public static boolean validate(Graph a, Graph b) {
> Any idea how this compares to the previous transitive closure implementatio
In general this is O(V(V+E)) while the old one is O(V^3). The old one might be faster on a
really dense graph because of the locality of looping/matrix representation. For sparse graphs
this should be faster.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@89
PS1, Line 89:       if (!Ordering.natural().sortedCopy(a.dsts(i)).equals(
> Create outside of loop and clear() within loop to avoid generating objects 
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@107
PS1, Line 107:     }
> remove "belonging" the "its" already implies that
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@109
PS1, Line 109:     @Override
> (a list of original vertex ids)
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@110
PS1, Line 110:     public int numVertices() {
> Can this be an int[][] for the same reasons stated above?
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@114
PS1, Line 114:     @Override
> Does the HashSet really buy us much versus doing a binary search on the sor
The newest patch uses binary search. I did some experiments. At 100,000 scale HastSet<Integer>
is 1.6X faster than binary search but uses 13X memory. If neither is satisfying we may consider
primitive int hastset as well.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@134
PS1, Line 134:     private RandomAccessibleGraph(IntArrayList[] adjList) {
> Remove these Preconditions checks - they don't seem useful to me.
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@144
PS1, Line 144:     public IntArrayList dsts(int srcVid) {
> Remove
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@156
PS1, Line 156:           dstVid);
> Remove
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@180
PS1, Line 180: 
> Please avoid chaining multiple non-trivial function calls because that make
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@195
PS1, Line 195:       return result;
> ... using Tarjan's algorithm.
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@201
PS1, Line 201:      */
> sccIds
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@204
PS1, Line 204:     }
> Needs a better variable name. Not clear what "index" means; "indexes" is be
It is the order in which vertices are visited. It's called "NUMBER" in Tarjan's original paper.
"DFS preordering index" should be more appropriate. The comments are updated and the variable
is renamed into "indexes".


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@205
PS1, Line 205: 
> lowLinks
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@209
PS1, Line 209:      */
> Needs a better var name, what is this counting? Is this something like visi
Renamed into dfsOrdinal.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS1, Line 223:       SccCondensedGraph condensed = fromGraph(g);
> I think we should use vertexId or vid or similar consistently.
All the variable naming is changed to .*vid.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@227
PS1, Line 227: 
> Brief description should also be above L205
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@237
PS1, Line 237:      * Get an array of vertices ids in the scc. The caller shouldn't modify
the returned
> int[] sccId
Done.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@239
PS1, Line 239:      * Time complexity: O(1)
> index[vertex] = lowLink[vertex] = indexCounter;
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@243
PS1, Line 243:     }
> A few simple comments here and there would make this code easier to follow.
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@244
PS1, Line 244: 
> I'm worried that recursion might blow the stack and lead to a regression in
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@252
PS1, Line 252:     }
> Brief comment about this case/code, e.g. "Create an SCC from all elements o
Done


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@253
PS1, Line 253: 
> use while()
The for loop no longer exists.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@265
PS1, Line 265:      * @param g The input graph.
> Project the original graph 'g' to a new graph in SCC space.
Done. (The term "project" is removed since it basically means the same thing as "condense").


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/util/Graph.java@274
PS1, Line 274:       int[] indexes = new int[g.numVertices()];
> Instead of having a separate normalize(), can't you remove duplicates right
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <twang@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Oct 2017 01:43:48 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message