impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tianyi Wang (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Date Fri, 03 Nov 2017 01:36:08 GMT
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
......................................................................


Patch Set 4:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281
PS3, Line 281: 
> Let's please avoid code changes unrelated to the purpose of this patch as m
Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> We need a precise definition. The original definition was more precise.
I think If the predicate transfer is the starting point to consider whether there is a value
transfer between slots, it should be defined as so. The original sentence "is computed based
on..." is not really a definition to me.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93
PS3, Line 93:  *
> Not sure this part adds value. What's the significance of these statements 
Removed. I intended to use these statements to show that the symmetric closure is an equivalence
relation.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277
PS3, Line 277:     // Tracks all warnings (e.g. non-fatal errors) that were generated during
analysis.
> The SCC-condensed graph representation of all slot value transfers.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278
PS3, Line 278:     // These are passed to the backend and eventually propagated to the shell.
Maps from
> public/private?
Done. private.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146
PS3, Line 1146:           conjunctIds.add(e.getId());
> a mutual value transfer
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467
PS3, Line 1467:     List<TupleId> tids = Lists.newArrayList();
> how about: if every source slot has a value transfer to a slot in destTid
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513
PS3, Line 1513:       boolean hasOuterJoinedTuple = false;
> Let's simplify the example and make it as clear as possible:
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618
PS3, Line 1618:         // check if we already created this predicate
> Need a definition of equivalence class in the Analyzer class comment. You d
I think they are the same.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636
PS3, Line 1636:    * equivalent slot sets of lhsTids and rhsTids.
> Garbled sentence, please clean up
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956
PS3, Line 1956:   public boolean isVisible(TupleId tid) {
> the registered value transfers
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973
PS3, Line 1973:         new WritableGraph(globalState_.descTbl.getMaxSlotId().asInt() + 1);
> missing space in string
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982
PS3, Line 1982:       RandomAccessibleGraph reference =
> Add value-transfer edges to 'g' based on the registered equi-join conjuncts
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059
PS3, Line 2059:       if (tblRef.getJoinOp() == JoinOperator.LEFT_OUTER_JOIN
> of the given slot id
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062
PS3, Line 2062:         g.addEdge(outerSlot.asInt(), innerSlot.asInt());
> slot -> sid
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2066
PS3, Line 2066:       }
> Unfortunate that we have to do this. Should probably clean this up later by
Ack


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2073
PS3, Line 2073:    * Time complexity: O(V) where V = number of slots
> Simpler to avoid "image set" and just use your second sentence to describe.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2087
PS3, Line 2087:    * Time complexity: O(V) where V = number of slots
> Second sentence should be part of the equivalence class definition in the A
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2089
PS3, Line 2089:   public List<SlotId> getValueTransferTargets(SlotId srcSid) {
> We generally avoid these @param for brevity. I concede they might make sens
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@136
PS3, Line 136:   public boolean localEquals(Expr that) { return true; }
> Missing implementation?
localEquals only checks members defined in this subclass. ArithmeticExprs' node-local information
is fn_, which is defined and compared in Expr.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@687
PS3, Line 687:     return Joiner.on(", ").join(strings);
> * Confusing name because we have a BinaryPredicate Expr and it's not clear 
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS3, Line 700:    * 3. For every pair of corresponding non-SlotRef exprs, localEquals returns
true.
> I think similar() is confusing, how about matches() or equivalent()
Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@720
PS3, Line 720:     if (fn_ == null || that.fn_ == null) return false; // One null, one not
> It returns whether this expr is equal to that ignoring children.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@721
PS3, Line 721:     // Both fn_'s are not null
> Fix comment: Only one expr given to this function
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@723
PS3, Line 723:   }
> protected? Doesn't seem like a good public interface to force callers to ch
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@728
PS3, Line 728:    * Caller should guarantee that 'this' and 'that' have the same type.
> public? Move to the other predicates above. Also LOCAL_EQ_PREDICATE
Changed to SlotrefComparator and moved into SlotRef.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@784
PS3, Line 784:     if (id_ == null) {
> update comment, should this function live in Analyzer instead?
Comment is updated and it's moved to Analyzer.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
File fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@121
PS3, Line 121:     if (v == 0 && 1.0 / v == Double.NEGATIVE_INFINITY) return false;
> Let's keep this patch focused on minimize unrelated changes like this one.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@182
PS3, Line 182:       if (type_ == null) {
> Let's keep this patch focused on minimize unrelated changes like this one.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@522
PS3, Line 522:     public boolean isCompatible(AnalyticExpr analyticExpr) {
> Please revert. This reformatted condition is pretty much incomprehensible t
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@399
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN ||
> I don't understand this change. What does the join op have to do with how t
This is the output partitioning of this fragment. If it is a right outer join then the output
is partitioned by the rhs join exprs, not lhs because the nulls in lhs can be in any partition.
The partitioning compatibility check doesn't use equivalence class anymore and is wrong without
this.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1377
PS3, Line 1377:       TableRef rhsTblRef = analyzer.getTableRef(rhsId);
> simplify with analyzer.getTupleId(lhsSid)
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java
File fe/src/main/java/org/apache/impala/util/Graph.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@31
PS3, Line 31: 
> The flow of calls/code is somewhat hard to follow for our specific use case
Specialized most of them except for reflexiveTransitiveClosure(). It is called with different
types for validation purpose.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@40
PS3, Line 40:       sb.append(i).append(" => ");
> mention that BFS is implemented iteratively to avoid unbounded stack usage
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@43
PS3, Line 43:       }
> This can be called on any type of Graph but we don't test that it actually 
It is called on RandomAccessibleGraph and WritableGraph. The former is called in condensedReflexiveTransitiveClosure
and the latter is called in Analyzer.computeValueTransferGraph to generate a reference for
graph validation. Maybe use a precondition check to exclude SccCondensedGraph?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@49
PS3, Line 49:   /**
> How about moving this out of the loop and preallocating it to numVertices()
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@52
PS3, Line 52:    */
> This line in particular is confusing because dsts() is abstract and it's no
Yes, reflexiveTransitiveClosure.dsts() is expensive. But I think this is more of a problem
with the dsts interface. dsts() is not a good API because:
1. SccCondensedGraph.dsts() constructs a new array every time it's called.
2. The return value of SccCondensedGraph.dsts is owned by the caller. On the other hand the
return value of RandomAccessibleGraph.dsts() and WritableGraph.dsts() is read only. There
isn't a good way to enforce the read-only property.
3. RandomAccessibleGraph just needs an int[] rather than an IntArrayList. But dsts() forces
it to use the latter.

I changed dsts() interface to returning an iterator. In this way SccCondensedGraph.dsts()
doesn't need to "decompress" the result and the caller cannot modify the data in Graphs. Does
this solve this problem?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@53
PS3, Line 53:   public RandomAccessibleGraph reflexiveTransitiveClosure() {
> style: ++j
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@98
PS3, Line 98:     public void addEdge(int srcVid, int dstVid) {
> Duplicate edges are allowed.
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@128
PS3, Line 128:       int idx = Arrays.binarySearch(adjList_[srcVid], dstVid);
> lists
I just checked Wikipedia and a lecture note. An adjacency list is a list of lists. According
to this, "sorted adjacency list" might be vague. I added some comments there.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@132
PS3, Line 132: 
> Construct from a list of sorted adjacency lists.
Reworded a little. But an adjacency list is a list of lists.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@150
PS3, Line 150:     }
> for the typical -> because the typical
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@210
PS3, Line 210:      * Get the ID of the strongly connected component containing 'vid'.
> What types of input Graphs are really supported/expected?
This function is merged into condensedReflexiveTransitiveClosure(). See below.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@222
PS3, Line 222:     /**
> This takes a Graph as input. Does it really work on any type of Graph? I do
Changed to WritableGraph.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@223
PS3, Line 223:      * Get an array of vertices IDs in the scc. The caller shouldn't modify
the returned
> Flow here is confusing because we create several intermediate SccCondensedG
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@257
PS3, Line 257:         // SCC as vid.
> Imo this doesn't really give the intuition for why the algorithm works.
Changed to a link.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@258
PS3, Line 258:         int unAssignedStackPos;
> lowest -> smallest?
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@268
PS3, Line 268:       int nextDfsIndex = 0;
> What kind of graph is expected here?
WritableGraph. Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@271
PS3, Line 271: u
> vertex ids
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@273
PS3, Line 273:           DfsContext ctx = dfsStack.peek();
> A mapping from vertex id to its DFS preordering index. -1 means not visited
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@282
PS3, Line 282:             // All successors have been searched. Check if this is the root
of an SCC.
> * final
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@299
PS3, Line 299:               // Tree edge. DFS this successor. ctx.dstIt is not advanced until
the DFS
> int nextDfsIndex = 0;
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@313
PS3, Line 313:               ctx.dstIt.next();
> typo: been
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/Graph.java@359
PS3, Line 359:         while (refIt.hasNext() || condensedIt.hasNext()) {
> What kind of graph is expected here?
WritableGraph. Done.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java
File fe/src/main/java/org/apache/impala/util/IntArrayList.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@26
PS3, Line 26: public class IntArrayList {
> private, you can add a data() getter
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@29
PS3, Line 29: 
> public
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@33
PS3, Line 33:     data_ = new int[capacity];
> public
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@44
PS3, Line 44:       int[] newData = new int[Math.max(data_.length * 2, capacity)];
> style: remove spaces after and before paranthesis
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@58
PS3, Line 58:   /**
> remove "some"
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@67
PS3, Line 67: 
> single line where it fits; apply everywhere in this file
Done


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/util/IntArrayList.java@115
PS3, Line 115: 
> Do we really need this? If not, please remove
Removed. It was used to enable Collections.sort() in Graph.validate().


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java
File fe/src/test/java/org/apache/impala/util/IntArrayListTest.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/test/java/org/apache/impala/util/IntArrayListTest.java@49
PS3, Line 49:       fail();
> We should be sure to test all public functionality. As far as I can tell th
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <twang@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2017 01:36:08 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message