impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Date Mon, 03 Apr 2017 22:10:37 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Patch Set 2:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 292:     // We currently don't have a way to indicate if a UDF is deterministic, so just
always
> Not sure what you mean by 'two separate members". Maybe Expr.containsNonDet
Sorry for the confusion, let me try to clarify:

You added a new member Expr.isDeterministic_ which is set to false if the Expr is not deterministic
or a UDF. I'm saying we should not mix the UDF and deterministic concepts. One way to do that
is to have separate members like containsNonDeterministicBuiltin_ and containsUDF_. It might
be even easier to add a Predicate in Expr like IS_EQ_BINARY_PREDICATE and then use Expr.contains()
in the appropriate places.

I agree that FunctionCallExpr.isNondeterministicBuiltinFn() and Expr.isDeterministic() return
consistent values, but that seems confusing to be (they return consistent values for non-builtins
because we consider non-builtins to be non-deterministic)


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 228:    * exprs that get materialized.
> There's already a lot in this review, and I wasn't sure how easy this would
Good to know that the sort will work. I agree we should tackle this later, probably easiest
to just leave the literals alone for now.


http://gerrit.cloudera.org:8080/#/c/6322/2/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 51:   // Original Exprs for any ordering exprs that are materialized. Populated in
Subset of ordering exprs that are materialized. Populated ...


Line 166:    * output by the sort node. Done by materializing slot refs in the order-by and
given
update comment


Line 170:    * TODO: We could do something more sophisticated than simply copying input slot
refs -
TODO is addressed


Line 174:       List<Expr> resultExprs, Analyzer analyzer) {
do these fit into the previous line?


Line 180:     // substOrderBy is a mapping from slot refs in the sort node's input and ordering
Comment is confusing because of the original 'sort node's input' which is not clear.

I suggest rephrasing to be more explicit about what the left-hand side of the smap contains:
1. materialized ordering Exprs
2. SlotRefs of non-materialized ordering Exprs
3. SlotRefs of resultExprs after substituting materialized ordering exprs

I think point (3) is not yet reflected in the code, but we should do it imo.

Example:

select concat(a, b, c, d) as x from t order by x

The current code will materialize concat(), but also the SlotRefs a, b, c, d.


Line 192:     List<Expr> nonmaterializedOrderingExprs = Lists.newArrayList(orderingExprs_);
nonMaterializedOrderingExprs


Line 204:     // The ordering exprs are still the original exprs and still point to the old
slot
Suggest simplifying comment to something like:

The ordering exprs are evaluated against the sort tuple, so must reflect our materialization
decision above.


Line 218:    * threshold, or don't have a cost set, by creating SlotRefs for them with
shorter: by creating slots for them in 'sortTupleDesc'

(we are adding slots to the tuple, not SlotRefs)


Line 230:   public void createMaterializedOrderExprs(
Considering all my comments here and elsewhere I think the simplest interface is:

public ExprSubstitutionMap createMaterializedOrderExprs(TupleDescriptor sortTupleDesc, Analyzer
analyzer)


Line 240:       if (origOrderingExpr instanceof LiteralExpr) {
Thinking about this again: This behavior complicates the code and function comment above,
so I think it would be better to leave literals alone for now.

This is only a partial solution to the literals problem anyway. Consider something like:

select * from (select 1 as col from t) x order by col

We'll ultimately still sort on a literal, so maybe this is not the right place for this check.

Even further, I think this solution will not work through inline views at all, e.g.:

select * from (select uuid() as col from t) x order by col

Let's think about how to address this problem and whether we need to do it now or defer.


http://gerrit.cloudera.org:8080/#/c/6322/2/testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
File testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test:

Line 1: # expensive order exprs get materialized, cheap ones don't
Tests should ideally cover the 3 cases:
1. non-deterministic exprs
2. exprs above/below the cost threshold
3. exprs with an unknown cost (do we have these or are they a "bug"?)


Line 102: ---- DISTRIBUTEDPLAN
no need for this


http://gerrit.cloudera.org:8080/#/c/6322/1/testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test
File testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test:

Line 19
> I'm not sure how to make that work with PlannerTest, but I added a test to 
I think you should be able to add a function using FrontendTestBase.addTestFunction() for
a planner test.

You are right about the UDA, not sure what I was thinking.


-- 
To view, visit http://gerrit.cloudera.org:8080/6322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message