impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Date Tue, 18 Oct 2016 23:00:32 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2521: Add clustered hint to insert statements

Patch Set 2:

Commit Message:

Line 11: distributed plan, just before the table sink. This has the effect that
-distributed (also adds to the single-node plan)
File fe/src/main/java/org/apache/impala/analysis/

Line 238:    * TODO: We could do something more sophisticated than simply copying input
move this TODO into SortInfo.createSortTupleInfo()

Line 253:     Set<SlotRef> sourceSlots = Sets.newHashSet();
these few lines can also be moved into createSortTupleInfo() right?
File fe/src/main/java/org/apache/impala/analysis/

Line 125:    * the ordering exprs refer to the materialized tuples instead of the original
refer to the new tuple materialized by this sort instead of the original input.

Line 145:    * output by the exec node implementing the sort. Done by materializing slot refs
in the
shorter: by the sort node

Line 146:    * order-by and result expressions. Those slot refs in the ordering and result
exprs are
comment about result expressions only makes sense if we pass in the result expressions to
this function, otherwise it's not really clear where those come from

Line 147:    * substituted with slot refs into the new tuple. This simplifies sorting logic
we also do this for top-n

Line 152:   public void createSortTupleInfo(Set<SlotRef> sourceSlots, boolean materializeSlots,
StatementBase stmt, Analyzer analyzer) {
I'd suggest removing the 'stmt' parameter and instead returning the ExprEubstitutionMap. Then
the stmt callers can apply that smap to whatever exprs they need to change.

Line 153:     // The tuple descriptor for the sort input. It will contain the materialized
Isn't this the tuple descriptor for the sort output? The secnd sentence is a little imprecise
because ("it contains") because a tuple descriptor does not contain materialized tuples.

Line 163:     // the internal rows of the sort node.
builds up the tuple operated on and returned by sort node

Line 171:       // TODO: I copied this from and would like to add an explanatory
I think the else case may not be needed/possible, but the explanation is that it's ok to move
predicates outside of this query block, but it's not ok to move predicates in if there is
a limit. Consider this query:

select count(*) from t1 inner join
(select id, name from t2
 where id < 10
 order by limit 10) v
on ( = and =
where = 'test'

In the example above it's ok to generate ' < 10' and put it into the scan of t1, but
it's not ok to generate ' = 'test'' and move it into the scan of t2.

Hope this makes sense.

Line 173:       if (stmt.hasLimit()) {
Can we also move this to the caller? The returned smap can be used to create this. The reason
is that it doesn't really make sense to add these value transfers during plan generation (for
the clustered hint use case).

Line 181:     if (materializeSlots) {
let's let the caller make this decision without passing a flag
File fe/src/main/java/org/apache/impala/planner/

Line 149:       // Add optional clustering node to the plan, based on clustered/noclustered
Let's keep it concrete: we're adding a sort node

Line 497:    * Insert a sort node into the plan, depending on the clustered/noclustered plan
Mention the ordering of the sort

Line 498:    * This will re-order the data produced by 'inputFragment' in a way that partitions
Second sentence is kind of vague. I understand that sorting is just an instance of a more
general clustering concept, but let's stick to what we are actually doing here :)

Line 503:     if (!insertStmt.hasClusteredHint()) {
single line (and elsewhere)

Line 508:     // Ignore constants for the sake of partitioning.
File testdata/workloads/functional-planner/queries/PlannerTest/insert.test:

Line 573: # IMPALA-2521: Test to make sure that clustering adds a sort node into the plan.
Let's test these combinations:
1. Partitioned tables
1.1. With clustered hint
1.2. With clustered and noshuffle hint
2. Unpartitioned tables
2.1. With clustered hint
2.2. With clustered and shuffle hint

Line 575: select * from functional.alltypes
Let's also test Kudu tables because there we want to sort on the PKs. Let's add those tests
to a Kudu-specific .test file

also add ---- PLAN section

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-HasComments: Yes

View raw message