impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables
Date Mon, 13 Mar 2017 19:24:45 GMT
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
......................................................................


Patch Set 7:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6037/6/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS6, Line 446:   batch, [this, num_channels](TupleRow* row, uint32_t partition) -> Status
{
             :           return this->channels_[partition % num_channels]->AddRow(row);
             :         }));
             :   }
> nice. I assume this generates a function which is then passed directly by f
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/bin/impala-config.sh
File bin/impala-config.sh:

PS6, Line 124: 4cd7d85
> btw this won't be in your change, right? we still need to get the Kudu API 
Yes, this isn't a real version of Kudu, just what resulted when I cherry-picked the Kudu API
onto the latest version. Its just here to mind me that this change can't go in until it goes
in on the Kudu side.


http://gerrit.cloudera.org:8080/#/c/6037/6/common/thrift/Partitions.thrift
File common/thrift/Partitions.thrift:

PS6, Line 42: s partition inform
> brief comment
Done


PS6, Line 43: 
> mention this is necessary for the kudu Partitioner API
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 683:     // create the corresponding Hdfs folder structure correctly, or the indexes
to
> , or the indexes to construct rows passed to the Kudu partitioning API.
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DataPartition.java
File fe/src/main/java/org/apache/impala/planner/DataPartition.java:

PS6, Line 46: partition
> partitioning
Done


PS6, Line 52: index 
> index
Done


Line 55:   // Should be called only by the static factory method for Kudu partitioned tables.
> nit: comment should be called only by the static factory method for Kudu pa
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS6, Line 274:     ExchangeNode exchNode =
             :         new ExchangeNode(ctx_.getNextNodeId(), inputFragment.getPlanRoot());
> maybe just create these inline as params to the static constructor method b
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 153:     } else {
> a brief comment here may be helpful
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test:

PS6, Line 64: [KUDU(a.id)]
> seems to be printing the alias rather than the base table name as we see in
What should the behavior be? There isn't an alias to use in the above queries, but it seems
more convenient to use the alias here, like everywhere else this value is referred to in this
plan.


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test:

PS6, Line 29: 01:EXCHANGE [KUDU(1)]
> hm this looks fishy with unions. i'm not sure if the approach doesn't make 
The '1' here is actually a SlotRef referring to the column the "1" is going into, id, so this
basically works as intended, at least in theory Its printed as "1" because that's what gets
set as the SlotRefs label. Obviously it would be preferable for it to be printed as the underlying
column, but I'm not sure of a non-hacky way to do that (i.e. not just special casing this
specific situation).

In practice, this doesn't have any effect, as 'insert into... values...' will always operate
on one node and DataStreamSender will use broadcast partitioning.

Non-Kudu inserts avoid this with the cost-based decision in DistributedPlanner, which we could
also do here, though as noted elsewhere in this review, this only appears here because PlannerTest
sets exec_single_node_rows_threshold to 0, so in practice small values inserts won't be affected
by this.


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 53: 01:EXCHANGE [KUDU(10)]
> we were talking about trying to figure out the right cost-based logic separ
This turns out to be an artifact of the way PlannerTest works - it always sets exec_single_node_rows_threshold
to 0. If you left it as the default 100 this exchange would not be here, so I think this is
the correct behavior, and small queries should be fine.

If you happen to look in insert.test and notice that similar INSERTs with small VALUEs clauses
in there don't have an exchange in their DISTRIBUTEDPLANs, thats due to a more complicated
cost based decision in DistributedPlanner, and we were planning on punting on doing the same
thing for Kudu for now (there's a TODO).


Line 248: # Kudu planner tests
> i'm assuming the hint will lose its meaning once the default sorting is in 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message