impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Date Thu, 09 Mar 2017 01:03:26 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Patch Set 1:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 212:   // True if this Expr is known to have a deterministic value, false if it is
True if this exprs always returns the same value given the same input. False if ...


Line 213:   // non-determinstic or unknown (eg. UDFs). Set at the end of analyze() and valid
if
e.g.


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 292:     // We currently don't have a way to indicate if a UDF is deterministic, so just
always
Ideally we should not conflate the UDF and deterministic concepts. For example, one might
reasonably think that FunctionCallExpr.isNondeterministicBuiltinFn() should return the opposite
of as Expr.isDeterministic() - but they don't and that could lead to confusion.

I suggest having two separate members.

We can revisit the treatment of UDFs with respect to determinism sometime later.


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

Line 253:       if (!(smap.getLhs().get(i) instanceof SlotRef)
use {} for multi-line if


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 39:   // TODO: run performance tests to determine this value.
Remove TODO, instead comment where this value came from, maybe with some examples of exprs
that cost 5, also mention what the effect of this tuning parameter is (i.e. all ordering exprs
with cost > this will be materialized)


Line 46:   private List<Expr> materializedOrderingExprs_;
needs a brief comment, in particular why we need to store them


Line 172:     TreeNode.collect(orderingExprs_, Predicates.instanceOf(SlotRef.class), sourceSlots);
We should collect the SlotRefs only for non-materialized ordering exprs.


Line 180:     // substOrderBy is the mapping from slot refs in the sort node's input to slot
refs in
update comment to reflect the new contents of the substOrderBy smap


Line 196:     // The ordering exprs still point to the old slot refs and need to be replaced
with
update comment, we are not only substituting slot refs


Line 208:    * SlotRefs to 'sortTupleExprs'.
Say something about the args of this function. Also mention side-effects (populates materializedOrderingExprs_)


Line 210:    * Materialize ordering exprs that are non-deterministic, are more expensive than
a cost
make this the first sentence in this comment


Line 211:    * threshold, or don't have a cost set. Also obey the 'materialize_sort' query
option,
no more hint, right?


Line 214:    * Returns an ExprSubstitutionMap from the new SlotRefs to the original exprs.
from the materialized sort exprs to the new SlotRefs


Line 216:   public ExprSubstitutionMap createMaterializedOrderExprs(
Instead of returning a new smap and combining with the other one, why not pass in the existing
smap and directly add entries to it? The 'sortTupleExprs' param is just the lhs of the existing
smap, so I think this interface can be made a little simpler / more direct.

Then you could use the return value to return the exprs that you chose to materialize. Fewer
side-effects are easier to reason about.


Line 217:       TupleDescriptor sortTupleDesc, List<Expr> sortTupleExprs, Analyzer analyzer)
{
analyzer needed?


Line 219:     //List<Expr> materializedOrderingExprs = Lists.newArrayList();
remove


Line 228:         // LiteralExprs don't affect the sort order. TODO: consider removing the
sort node
Why TODO? What happens when there are no sort exprs at all left?


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 374:   public void testSortMaterialization() {
testSortExprMaterialization


http://gerrit.cloudera.org:8080/#/c/6322/1/testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test
File testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test:

Line 19: # literal order expr and cheap order expr don't get materialized
I think these tests need more examples cheap and expensive exprs to make sure we mostly do
the right thing. For example, there is no test with a naked SlotRef. I suggest cramming more
exprs into each individual test case.

We also need to test more than 2 sort exprs.

Also: Needs to test UDFs and ideally also UDAs (inside the ORDER BY of an analytic function).
Let's talk if this is tricky to do with the current test infra.


Line 37: select last_value(id) over (order by if(int_col = 0, id, int_col), bool_col) from
functional.alltypes
Can you ask Greg about some specific expensive ordering exprs that have come up recently?
We should make sure that we do the right thing for them.


Line 106: |  materialized: log2(id)
It doesn't seem right to print this here because the merging exchange does not actually materialize
the expr. It relies on the preceding Top-N/Sort whatever to do that.


http://gerrit.cloudera.org:8080/#/c/6322/1/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

Line 165:     # "order by random()" should return results in a different order from unordered.
Confusing sentence, what is 'unordered'?
I think you can remove this and the next two lines


Line 168:     results = self.execute_query("select id from functional.alltypestiny")
remove


-- 
To view, visit http://gerrit.cloudera.org:8080/6322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message