impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Date Mon, 03 Apr 2017 22:10:37 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Patch Set 2:

File fe/src/main/java/org/apache/impala/analysis/

Line 292:     // We currently don't have a way to indicate if a UDF is deterministic, so just
> Not sure what you mean by 'two separate members". Maybe Expr.containsNonDet
Sorry for the confusion, let me try to clarify:

You added a new member Expr.isDeterministic_ which is set to false if the Expr is not deterministic
or a UDF. I'm saying we should not mix the UDF and deterministic concepts. One way to do that
is to have separate members like containsNonDeterministicBuiltin_ and containsUDF_. It might
be even easier to add a Predicate in Expr like IS_EQ_BINARY_PREDICATE and then use Expr.contains()
in the appropriate places.

I agree that FunctionCallExpr.isNondeterministicBuiltinFn() and Expr.isDeterministic() return
consistent values, but that seems confusing to be (they return consistent values for non-builtins
because we consider non-builtins to be non-deterministic)
File fe/src/main/java/org/apache/impala/analysis/

Line 228:    * exprs that get materialized.
> There's already a lot in this review, and I wasn't sure how easy this would
Good to know that the sort will work. I agree we should tackle this later, probably easiest
to just leave the literals alone for now.
File fe/src/main/java/org/apache/impala/analysis/

Line 51:   // Original Exprs for any ordering exprs that are materialized. Populated in
Subset of ordering exprs that are materialized. Populated ...

Line 166:    * output by the sort node. Done by materializing slot refs in the order-by and
update comment

Line 170:    * TODO: We could do something more sophisticated than simply copying input slot
refs -
TODO is addressed

Line 174:       List<Expr> resultExprs, Analyzer analyzer) {
do these fit into the previous line?

Line 180:     // substOrderBy is a mapping from slot refs in the sort node's input and ordering
Comment is confusing because of the original 'sort node's input' which is not clear.

I suggest rephrasing to be more explicit about what the left-hand side of the smap contains:
1. materialized ordering Exprs
2. SlotRefs of non-materialized ordering Exprs
3. SlotRefs of resultExprs after substituting materialized ordering exprs

I think point (3) is not yet reflected in the code, but we should do it imo.


select concat(a, b, c, d) as x from t order by x

The current code will materialize concat(), but also the SlotRefs a, b, c, d.

Line 192:     List<Expr> nonmaterializedOrderingExprs = Lists.newArrayList(orderingExprs_);

Line 204:     // The ordering exprs are still the original exprs and still point to the old
Suggest simplifying comment to something like:

The ordering exprs are evaluated against the sort tuple, so must reflect our materialization
decision above.

Line 218:    * threshold, or don't have a cost set, by creating SlotRefs for them with
shorter: by creating slots for them in 'sortTupleDesc'

(we are adding slots to the tuple, not SlotRefs)

Line 230:   public void createMaterializedOrderExprs(
Considering all my comments here and elsewhere I think the simplest interface is:

public ExprSubstitutionMap createMaterializedOrderExprs(TupleDescriptor sortTupleDesc, Analyzer

Line 240:       if (origOrderingExpr instanceof LiteralExpr) {
Thinking about this again: This behavior complicates the code and function comment above,
so I think it would be better to leave literals alone for now.

This is only a partial solution to the literals problem anyway. Consider something like:

select * from (select 1 as col from t) x order by col

We'll ultimately still sort on a literal, so maybe this is not the right place for this check.

Even further, I think this solution will not work through inline views at all, e.g.:

select * from (select uuid() as col from t) x order by col

Let's think about how to address this problem and whether we need to do it now or defer.
File testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test:

Line 1: # expensive order exprs get materialized, cheap ones don't
Tests should ideally cover the 3 cases:
1. non-deterministic exprs
2. exprs above/below the cost threshold
3. exprs with an unknown cost (do we have these or are they a "bug"?)

no need for this
File testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test:

Line 19
> I'm not sure how to make that work with PlannerTest, but I added a test to 
I think you should be able to add a function using FrontendTestBase.addTestFunction() for
a planner test.

You are right about the UDA, not sure what I was thinking.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message