impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables
Date Fri, 10 Mar 2017 03:10:55 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
......................................................................


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6037/6/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS6, Line 446: TupleRow* current_row = batch->GetRow(i);
             :       uint32_t partition;
             :       RETURN_IF_ERROR(partitioner_->Partition(current_row, &partition));
             :       RETURN_IF_ERROR(channels_[partition % num_channels]->AddRow(current_row));
As we discussed, I think Henry is right that we need to avoid the v-f-calls per row.

I think we can consider two approaches:
1) we can continue branching separately for kudu and for hash partitioning here, and calling
non virtual fns on the partitioner. It's still good to have pulled all the Kudu code into
a separate class.
2) the partitioner class you added can take row batches at a time as well as well as the channels_,
and handle the per-row calls internally.

#2 sounds cleaner but I'm open to suggestions.


http://gerrit.cloudera.org:8080/#/c/6037/6/bin/impala-config.sh
File bin/impala-config.sh:

PS6, Line 124: 4cd7d85
btw this won't be in your change, right? we still need to get the Kudu API in


http://gerrit.cloudera.org:8080/#/c/6037/6/common/thrift/Partitions.thrift
File common/thrift/Partitions.thrift:

PS6, Line 42: TKuduDataPartition
brief comment


PS6, Line 43: target_table_id
mention this is necessary for the kudu Partitioner API


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 683:     // Hdfs folder structure correctly.
, or the indexes to construct rows passed to the Kudu partitioning API.

(Or something similar)


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DataPartition.java
File fe/src/main/java/org/apache/impala/planner/DataPartition.java:

PS6, Line 46: partition
partitioning


PS6, Line 52: number
index


Line 55:   private DataPartition(
nit: comment should be called only by the static factory method for Kudu partitioned tables


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS6, Line 274:     List<Expr> partitionExprs = Lists.newArrayList(modifyStmt.getPartitionExprs());
             :     List<Integer> targetColIdxs = Lists.newArrayList(modifyStmt.getTargetColIdxs());
maybe just create these inline as params to the static constructor method below, since they're
not needed otherwise.


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 153:     } else {
a brief comment here may be helpful


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test:

PS6, Line 64: [KUDU(a.id)]
seems to be printing the alias rather than the base table name as we see in the line above


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test:

PS6, Line 29: 01:EXCHANGE [KUDU(1)]
hm this looks fishy with unions. i'm not sure if the approach doesn't make sense with unions,
at least those with constant operands.

can you add a test case which has unions with selecting from a table, e.g.

upsert into foo
select x,y,z from a union all
select x,y,z from b


PS6, Line 117: upsert into functional_kudu.testtbl /*+ clustered */
             : select * from functional_kudu.testtbl
             : ---- PLAN
             : UPSERT INTO KUDU [functional_kudu.testtbl]
             : |
             : 01:SORT
             : |  order by: id ASC NULLS LAST
             : |
             : 00:SCAN KUDU [functional_kudu.testtbl]
             : ---- DISTRIBUTEDPLAN
             : UPSERT INTO KUDU [functional_kudu.testtbl]
             : |
             : 02:SORT
             : |  order by: id DESC NULLS LAST
             : |
             : 01:EXCHANGE [KUDU(functional_kudu.testtbl.id)]
             : |
             : 00:SCAN KUDU [functional_kudu.testtbl]
we'll wanna do something similar without the clustered hint later, as we discussed


-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message