impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE
Date Wed, 01 Nov 2017 22:08:22 GMT
Alex Behm has posted comments on this change. ( )

Change subject: IMPALA-5976: Remove equivalence class computation in FE

Patch Set 3:

File fe/src/main/java/org/apache/impala/analysis/
PS3, Line 281:     return new FunctionCallExpr("if", ifParams);
Let's please avoid code changes unrelated to the purpose of this patch as much as reasonable.
Cleanup is nice in general, but this patch is already complex and huge so let's not add anything

Such changes can also make backporting more difficult due to conflicts, so cleanup should
be done in a separate change.
File fe/src/main/java/org/apache/impala/analysis/
PS3, Line 91:  * Slot A has value transfer to slot B if a predicate on A can be applied to
B in most
We need a precise definition. The original definition was more precise.
PS3, Line 93:  * It is a reflexive, transitive binary relation on the set of slots.
Not sure this part adds value. What's the significance of these statements with respect to
our use of value transfers for planning purposes? If we don't make use of these terms/properties
anywhere else in the code, then we should remove these statements.
PS3, Line 277:     // The SCC-condensed graph representation of value transfer relationship.
The SCC-condensed graph representation of all slot value transfers.
PS3, Line 278:     SccCondensedGraph valueTransferGraph;
PS3, Line 1146:    * Create and register an auxiliary predicate to express an mutual value
a mutual value transfer
PS3, Line 1467:    * by replacing the slots of a source predicate with slots of the destTid,
if for each
how about: if every source slot has a value transfer to a slot in destTid
PS3, Line 1513:       // its referenced tuples are NULL. For example:
Let's simplify the example and make it as clear as possible:

select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null
PS3, Line 1618:    * For each slot equivalence class, adds/removes predicates from conjuncts
such that it
Need a definition of equivalence class in the Analyzer class comment. You do mention the term
"equivalence class" but I don't think it has the same meaning of the "equivalence class" terminology
used here.
PS3, Line 1636:     // A map from equivalence class ids to slot equivalence classes containing
on slots
Garbled sentence, please clean up
PS3, Line 1956:    * Compute the value transfer graph based on the registered value transfer
the registered value transfers
PS3, Line 1973:         String condensedTc = globalState_.valueTransferGraph.print();
missing space in string
PS3, Line 1982:    * Populate the value transfer relation from eq conjuncts to graph 'g'.
Add value-transfer edges to 'g' based on the registered equi-join conjuncts.
PS3, Line 2059:    * Returns the equivalence class of slotID.
of the given slot id
PS3, Line 2062:   public List<SlotId> getEquivClass(SlotId slot) {
slot -> sid
PS3, Line 2066:     for (int dst: g.sccMembersByVid(slot.asInt())) {
Unfortunate that we have to do this. Should probably clean this up later by avoiding SlotIds
in favor of ints or alternative having a pre-populated SlotId[] so we don't have to create
new SlotId objects all over the place.

Let's leave this for now and keep thinking how to address this.
PS3, Line 2073:    * Returns the src slot's value transfer image set. In other words it returns
ids of all
Simpler to avoid "image set" and just use your second sentence to describe. In general, introducing
a term that needs to be explained in the next sentence is not useful unless the term is really
used very frequently.
PS3, Line 2087:    * Get the id of the equivalence class of a slot. An equivalence class ID
should be used
Second sentence should be part of the equivalence class definition in the Analyzer class comment
PS3, Line 2089:    * @param slotId The slot of interest.
We generally avoid these @param for brevity. I concede they might make sense for public functions,
but this function is private.
File fe/src/main/java/org/apache/impala/analysis/
PS3, Line 136:   public boolean localEquals(Expr that) { return true; }
Missing implementation?
File fe/src/main/java/org/apache/impala/analysis/
PS3, Line 687:   interface ExprBinaryPredicate {
* Confusing name because we have a BinaryPredicate Expr and it's not clear what apply() returns.
* We should make this take two SlotRef arguments and not just any two Exprs. The reason is
that we currently have no use for the general Expr case, and so we should make this as specific
as possible to make the code easier to follow. In general, we should avoid unnecessary generality
for this reason. If we end up needing the generality later we can still refactor (easy change).
* How about calling this SlotRefComparator, seems much clearer to me
* How about calling the function "matches()"
PS3, Line 700:   public boolean similar(Expr that, ExprBinaryPredicate localPredicate) {
I think similar() is confusing, how about matches() or equivalent()
PS3, Line 720:    * Local eq comparator. It returns whether two exprs are equal ignoring their
It returns whether this expr is equal to that ignoring children.
PS3, Line 721:    * It is guaranteed that the two exprs given to this function share the same
Fix comment: Only one expr given to this function
PS3, Line 723:   abstract public boolean localEquals(Expr that);
protected? Doesn't seem like a good public interface to force callers to check that the types
PS3, Line 728:   static final ExprBinaryPredicate localEqPredicate = new ExprBinaryPredicate()
public? Move to the other predicates above. Also LOCAL_EQ_PREDICATE
PS3, Line 784:    * Compute the intersection of l1 and l2, given the smap, and
update comment, should this function live in Analyzer instead?
File fe/src/main/java/org/apache/impala/analysis/
PS3, Line 121:     return !(v == 0) || !(1.0 / v == Double.NEGATIVE_INFINITY);
Let's keep this patch focused on minimize unrelated changes like this one.
PS3, Line 182:         Double d = value_.doubleValue();
Let's keep this patch focused on minimize unrelated changes like this one.

(Please also apply elsewhere)
File fe/src/main/java/org/apache/impala/planner/
PS3, Line 522:       return !requiresIndependentEval(analyticExprs.get(0)) &&
Please revert. This reformatted condition is pretty much incomprehensible to me.
File fe/src/main/java/org/apache/impala/planner/
PS3, Line 399:     boolean partitionedByRight = node.getJoinOp() == RIGHT_OUTER_JOIN ||
I don't understand this change. What does the join op have to do with how this fragment is

If this change is not required, please revert.
File fe/src/main/java/org/apache/impala/planner/
PS3, Line 1377:           if (lhsTblRefIdsHs.contains(analyzer.getSlotDesc(lhsSid).getParent().getId()))
simplify with analyzer.getTupleId(lhsSid)
File fe/src/main/java/org/apache/impala/util/
PS3, Line 31: public abstract class Graph {
The flow of calls/code is somewhat hard to follow for our specific use case. It's not obvious
who calls what and where the main entry point is. Some of the exposed APIs are "generic" in
the sense that they can work with an arbitrary Graph, but in reality we only expect some Graph
types to be used. I think we should try to be more specific about the intended/tested use
of these classes and APIs.

I'll point out specific examples below.
PS3, Line 40:    * Compute the reflexive transitive closure of the graph by BFS from every
mention that BFS is implemented iteratively to avoid unbounded stack usage
PS3, Line 43:   public RandomAccessibleGraph reflexiveTransitiveClosure() {
This can be called on any type of Graph but we don't test that it actually works on all the
Graph types. Instead of testing the irrelevant combinations we should move this call to the
specific Graph type that we use/test in our code.
PS3, Line 49:       IntArrayList queue = new IntArrayList();
How about moving this out of the loop and preallocating it to numVertices(). This way we can
reset() inside the loop and we avoid allocating memory and creating objects inside the loop.
PS3, Line 52:         IntArrayList dstVids = dsts(queue.get(queueFront));
This line in particular is confusing because dsts() is abstract and it's not clear on which
Graph subclass this is intended to be called on. We really never want to call reflexiveTransitiveClosure()
on a SccCondensedGraph because that would be extremely expensive.

One way to resolve this might be to change this function to take an IntArrayList[] adjtList
as an argument and return a new IntArrayList[] which is the tc. I think in general, several
places could be cleaned up by operating on a IntArrayList[] instead of a Graph. That way it's
clear what callers can pass in, and callers can decide what to do with the returned tc (e.g.
create a RandomAccessibleGraph or whatever they want)

Just an idea, I'm certainly open to alternatives.
PS3, Line 53:         for (int j = 0; j < dstVids.size(); j ++) {
style: ++j

Please address this everywhere, there are many instances of this particular style issue. It
is easy to miss these while coding. For me, it often helps to review my own code on gerrit
to flush out the simple issues before passing on to reviewers.
PS3, Line 98:    * Graph supporting adding edges.
Duplicate edges are allowed.
PS3, Line 128:     // Sorted adjacency list.
PS3, Line 132:      * Construct from an sorted adjList.
Construct from a list of sorted adjacency lists.
PS3, Line 150:      * instead of a standalone hash table for the typical use case has less
than 10,000
for the typical -> because the typical
PS3, Line 210:     static SccCondensedGraph fromGraph(Graph g) {
What types of input Graphs are really supported/expected?
PS3, Line 222:     public static SccCondensedGraph condensedReflexiveTransitiveClosure(Graph
This takes a Graph as input. Does it really work on any type of Graph? I don't think we have
tests and I don't think we should have tests. Instead, how about accepting only a WritableGraph.
PS3, Line 223:       SccCondensedGraph condensed = fromGraph(g);
Flow here is confusing because we create several intermediate SccCondensedGraphs. Seems simpler
to inline fromGraph() and avoid the unnecessary intermediate SccCondensedGraph.
PS3, Line 257:      * - DFS preordering indexes. It's the ordering in which vertices are visited.
Imo this doesn't really give the intuition for why the algorithm works.

I really like the "stack invariant" and "bookeeping" paragraphs from here:

Maybe we should just add a link.
PS3, Line 258:      * - Lowlinks. The lowlink of of a vertex is the lowest DFS preordering
lowest -> smallest?

duplicate "of"
PS3, Line 268:     static private Pair<int[], int[][]> tarjanScc(final Graph g) {
What kind of graph is expected here?
PS3, Line 271: v
vertex ids
PS3, Line 273:       // DFS preordering index. -1 indicates unvisited vertex.
A mapping from vertex id to its DFS preordering index. -1 means not visited.

Call this dfsIndexes or dfsIdxs to make the meaning clear
PS3, Line 282:         private int vid;
* final
* remove "private" from these members? seems confusing
PS3, Line 299:       int dfsOrdinal = 0;
int nextDfsIndex = 0;
PS3, Line 313:             // All children have ben searched. Check if this is the root of
an SCC.
typo: been
PS3, Line 359:     static private RandomAccessibleGraph condenseGraphOnScc(Graph g, int[]
What kind of graph is expected here?
File fe/src/main/java/org/apache/impala/util/
PS3, Line 26:   int[] data_;
private, you can add a data() getter
PS3, Line 29:   IntArrayList() {
PS3, Line 33:   IntArrayList(int capacity) {
PS3, Line 44:       System.arraycopy( data_, 0, newData, 0, data_.length );
style: remove spaces after and before paranthesis
PS3, Line 58:    * Remove some elements from the end the of list.
remove "some"
PS3, Line 67:   int size() {
single line where it fits; apply everywhere in this file
PS3, Line 115:   class IntegerIterator implements Iterator<Integer> {
Do we really need this? If not, please remove
File fe/src/test/java/org/apache/impala/util/
PS3, Line 49:   public void testIntArrayList() {
We should be sure to test all public functionality. As far as I can tell these are not tested:

* IntIterator, and IntegerIterator
* set()

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Tianyi Wang <>
Gerrit-Comment-Date: Wed, 01 Nov 2017 22:08:22 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message