impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Date Thu, 13 Apr 2017 20:27:34 GMT
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables

Patch Set 4:

File be/src/exec/

Line 110: 
> can you add a TODO that this needs to be codegen'd ?
File be/src/exec/kudu-util.h:

PS3, Line 62: ry 
> into memory owned by the row. If false, string data must remain valid while
File be/src/exprs/

PS3, Line 55:   KuduClient* client;
> let's use the query state kudu client

Line 84: 
> comment here and above why we're dchecking s.ok()
File be/src/runtime/data-stream-sender.h:

PS3, Line 109: kudu_
> I think this may be better off avoiding mentioning kudu directly, e.g. call

PS3, Line 127: sink_partition_expr
> this needs explanation as well.
File common/thrift/Exprs.thrift:

Line 134:   2: required list<Types.TPrimitiveType> types
> nit newline after this
File fe/src/main/java/org/apache/impala/planner/

PS3, Line 219:  TODO: consider making a cost based decision for Kudu tables.
             :     if (!insertStmt.hasShuffleHint()
             :         && !(insertStmt.getTargetTable() instanceof KuduTable)
> we should address this before the release. it'd be good to start thinking a
That is already handled because we only call this function from Planner if '!ctx_.isSingleNodeExec()',
which is true if the max # of rows processed is < single_node_exec_rows_threshold, which
defaults to 100.

This doesn't show up in the DISTRIBUTEDPLAN in the planner tests because PlannerTest sets
single_node_exec_rows_threshold to 0.

For non-Kudu tables, this 'if' is about handling things like if the input is already partitioned
correctly, which we're leaving for future work for the Kudu case.
File fe/src/main/java/org/apache/impala/planner/

PS3, Line 540: putFragment.getPartitionExpr() != null) {
> I don't understand this logic about the clustered hint.
Sure, I'll disallow all insert hints for Kudu tables (we support CLUSTERED, SHUFFLE, and SORT
BY, none of which really apply now);

The point of checking isSingleNodeExec here is to avoid sorting if we haven't partitioned,
since then Kudu is just going to resort anyways, but if we're not also handling clustered
here, then its cleaner to just look for the presence of the partition expr.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Thomas Tauber-Marshall <>
Gerrit-HasComments: Yes

View raw message