impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tianyi Wang (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Date Wed, 25 Oct 2017 01:43:48 GMT
Tianyi Wang has posted comments on this change. ( )

Change subject: IMPALA-5976: Remove equivalence class computation in FE

Patch Set 2:

Commit Message:
PS1, Line 7: IMPALA-5976: Remove equivalence class computation in FE
> Remove equivalence classes from FE
File fe/src/main/java/org/apache/impala/analysis/
PS1, Line 2890: 
> This function was nice for debugging. Does your patch add a similar functio
Added a function Graph.print, called when graph validation fails in testing.
PS1, Line 1253:     e.getIds(tids, null);
> This new formatting is extremely hard to read, please reformat for readabil
Reverted to the original.
PS1, Line 1496:       if (allDestSids.isEmpty()) continue;
> Unless we are absolutely convinced that the change is correct, I think we s
It is changed to checking whether src has value transfer to an an outer joined tuple, as discussed.
The changed tests are also reverted.
PS1, Line 1608:       }
> Refers to "equivalence class" again which is now an undefined term. If you 
Changed the definition of equivalence class at the top of this file.
PS1, Line 1626:    * to cover the known slot equivalences. This function should be called
for join
> What does this mean? Is the Integer key the SCC id?
Yes. Added a comment.
PS1, Line 1802:    * Only contains equivalence classes with more than one member.
> I've seen this check in a couple of places, would be nice if that could be 
The handling is not exactly the same among call sites. For example getValueTransferTargets()
returns a list of a single element, while here it is skipped. I think the graph data structure
shouldn't be aware that there could be out of range slot ids and should fail on out of range
parameters instead of handling them.
PS1, Line 1952:     return false;
> computeValueTransferGraph
PS1, Line 1972:         String tc = reference.print();
> valueTransfers (plural)
PS1, Line 2047:           || tblRef.getJoinOp() == JoinOperator.LEFT_ANTI_JOIN
> Comment need updating. The concept of "equivalence class" does not exist an
Equivalence class is now defined at the top of this file.
PS1, Line 2055:   }
> In terms of the g API I think callers should have to care about the existen
Removed the getSccId -> getSccMembers indirection. Do you suggest removing the SCC terminology
from the graph interfaces?
PS1, Line 2070:   }
> Shouldn't the destinations already be sorted?
Not in current implementation since CondensedTransitiveClosure.getDests() assembles its result
on the fly and the result is not sorted there. I haven't figured out why the call site of
this function rely on the ordering. The current state is that removing this sort fails some
tests (Not different plan. Some runtime filters disappeared.)
File fe/src/main/java/org/apache/impala/analysis/
PS1, Line 702:     if (this instanceof CastExpr && ((CastExpr)this).isImplicit())
> It's very unfortunate that we have to create a new pair object to do a Slot
Done. An interface ExprBinaryPredicate is added.
File fe/src/main/java/org/apache/impala/util/
PS1, Line 1: // Licensed to the Apache Software Foundation (ASF) under one
> Apache header
PS1, Line 18: package org.apache.impala.util;
> Comment on the ordering of elements (they are unordered correct?)
It was unordered. Since in the new patch binary search is used, it diverges into ordered and
unordered types. Comments are added accordingly.
PS1, Line 19: 
> I feel like this should really be int[][] for maximum efficiency. It genera
Implemented util.IntArrayList.
PS1, Line 23: 
> style: ++i
PS1, Line 36:    */
> Not sure how useful these Preconditions checks are. We'll even get a more m
PS1, Line 45:     BitSet visited = new BitSet(numVertices());
> Does this really need to be public?
No but should the visibility be as strict as possible?
PS1, Line 46:     for (int srcVid = 0; srcVid < numVertices(); ++srcVid) {
> Don't think this is needed.
PS1, Line 73:       sb.append(i).append(" => ");
> I'm worried this might blow the stack for huge graphs - this would be a reg
Reimplemented in BFS.
PS1, Line 86:   public static boolean validate(Graph a, Graph b) {
> Any idea how this compares to the previous transitive closure implementatio
In general this is O(V(V+E)) while the old one is O(V^3). The old one might be faster on a
really dense graph because of the locality of looping/matrix representation. For sparse graphs
this should be faster.
PS1, Line 89:       if (!Ordering.natural().sortedCopy(a.dsts(i)).equals(
> Create outside of loop and clear() within loop to avoid generating objects 
PS1, Line 107:     }
> remove "belonging" the "its" already implies that
PS1, Line 109:     @Override
> (a list of original vertex ids)
PS1, Line 110:     public int numVertices() {
> Can this be an int[][] for the same reasons stated above?
PS1, Line 114:     @Override
> Does the HashSet really buy us much versus doing a binary search on the sor
The newest patch uses binary search. I did some experiments. At 100,000 scale HastSet<Integer>
is 1.6X faster than binary search but uses 13X memory. If neither is satisfying we may consider
primitive int hastset as well.
PS1, Line 134:     private RandomAccessibleGraph(IntArrayList[] adjList) {
> Remove these Preconditions checks - they don't seem useful to me.
PS1, Line 144:     public IntArrayList dsts(int srcVid) {
> Remove
PS1, Line 156:           dstVid);
> Remove
PS1, Line 180: 
> Please avoid chaining multiple non-trivial function calls because that make
PS1, Line 195:       return result;
> ... using Tarjan's algorithm.
PS1, Line 201:      */
> sccIds
PS1, Line 204:     }
> Needs a better variable name. Not clear what "index" means; "indexes" is be
It is the order in which vertices are visited. It's called "NUMBER" in Tarjan's original paper.
"DFS preordering index" should be more appropriate. The comments are updated and the variable
is renamed into "indexes".
PS1, Line 205: 
> lowLinks
PS1, Line 209:      */
> Needs a better var name, what is this counting? Is this something like visi
Renamed into dfsOrdinal.
PS1, Line 223:       SccCondensedGraph condensed = fromGraph(g);
> I think we should use vertexId or vid or similar consistently.
All the variable naming is changed to .*vid.
PS1, Line 227: 
> Brief description should also be above L205
PS1, Line 237:      * Get an array of vertices ids in the scc. The caller shouldn't modify
the returned
> int[] sccId
PS1, Line 239:      * Time complexity: O(1)
> index[vertex] = lowLink[vertex] = indexCounter;
PS1, Line 243:     }
> A few simple comments here and there would make this code easier to follow.
PS1, Line 244: 
> I'm worried that recursion might blow the stack and lead to a regression in
PS1, Line 252:     }
> Brief comment about this case/code, e.g. "Create an SCC from all elements o
PS1, Line 253: 
> use while()
The for loop no longer exists.
PS1, Line 265:      * @param g The input graph.
> Project the original graph 'g' to a new graph in SCC space.
Done. (The term "project" is removed since it basically means the same thing as "condense").
PS1, Line 274:       int[] indexes = new int[g.numVertices()];
> Instead of having a separate normalize(), can't you remove duplicates right

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Tianyi Wang <>
Gerrit-Comment-Date: Wed, 25 Oct 2017 01:43:48 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message