impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Date Wed, 05 Apr 2017 19:36:25 GMT
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6559

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
......................................................................

IMPALA-3742: Partitions and sort INSERTs for Kudu tables

Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu
are currently painful because we just send rows randomly,
which creates a lot of work for Kudu since it partitions
and sorts data before writing, causing writes to be slow
and leading to timeouts.

We can alleviate this by sending the rows to Kudu already
partitioned and sorted. This patch partitions and sorts
rows according to Kudu's partitioning scheme for INSERTs
and UPSERTs. A followup patch will handle UPDATE and DELETE.

It accomplishes this by inserting an exchange node and a sort
node into the plan before the operation. Both the exchange and
the sort are given a KuduPartitionExpr which takes a row and
calls into the Kudu client to return its partition number.

Testing:
- Updated planner tests.
- Ran the Kudu functional tests.
- Ran performance tests demonstrating that we can now handle much
  larger inserts without having timeouts.

Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr-context.h
M be/src/exprs/expr.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/scheduler.cc
M bin/impala-config.sh
M common/thrift/Exprs.thrift
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
21 files changed, 312 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>

Mime
View raw message