impala-reviews mailing list archives

From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables
Date Thu, 02 Mar 2017 23:02:50 GMT
Thomas Tauber-Marshall has uploaded a new patch set (#3).

Change subject: PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables
......................................................................

PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables

Bulk inserts into Kudu are currently slow because we send
rows to Kudu in arbitrary order. Kudu then has to partition
and sort the data itself before writing, which creates a lot
of extra work on the Kudu side.

We can alleviate this by sending the rows to Kudu already
partitioned and sorted. This patch partitions the rows to
insert according to Kudu's partitioning scheme. A follow-up
patch will deal with sorting.

It accomplishes this by inserting an exchange node into the
plan before the insert. The DataStreamSender then uses a new
abstraction, DataStreamPartitioner, that calls into the Kudu
client to determine the partition for each row.

In the future, DataStreamPartitioner can be extended to
other partitioning types.
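
To make the shape of this abstraction concrete, here is a minimal,
self-contained sketch. It is not the code in this patch: the Row
struct, the GetChannel() method, CallbackPartitioner, and
PartitionRows() are all hypothetical names chosen for illustration,
and a caller-supplied function stands in for the call into the Kudu
client. Mapping a partition to a channel with a modulo is likewise
an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type for illustration only; real Impala rows are not
// represented like this.
struct Row {
  int64_t key;
  std::string payload;
};

// Hypothetical interface: maps a row to the index of the channel
// (destination) the sender should use for it.
class DataStreamPartitioner {
 public:
  virtual ~DataStreamPartitioner() = default;
  virtual int GetChannel(const Row& row) const = 0;
};

// Stand-in for a Kudu-backed partitioner. Where real code would ask the
// Kudu client for the row's partition, this sketch calls a function
// supplied by the caller.
class CallbackPartitioner : public DataStreamPartitioner {
 public:
  CallbackPartitioner(int num_channels,
                      std::function<int(const Row&)> partition_fn)
      : num_channels_(num_channels), partition_fn_(std::move(partition_fn)) {
    assert(num_channels_ > 0);
  }

  int GetChannel(const Row& row) const override {
    const int partition = partition_fn_(row);
    assert(partition >= 0);
    // Assumption: partitions are spread over channels round-robin.
    return partition % num_channels_;
  }

 private:
  int num_channels_;
  std::function<int(const Row&)> partition_fn_;
};

// Groups rows by channel, roughly what a sender does before shipping
// one batch per destination.
std::vector<std::vector<Row>> PartitionRows(
    const DataStreamPartitioner& partitioner, int num_channels,
    const std::vector<Row>& rows) {
  std::vector<std::vector<Row>> per_channel(num_channels);
  for (const Row& row : rows) {
    per_channel[partitioner.GetChannel(row)].push_back(row);
  }
  return per_channel;
}
```

Under these assumptions, supporting another partitioning type would
mean adding another subclass of the interface, leaving the sender's
grouping loop unchanged.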

This patch is a PREVIEW so we can decide if we're happy with
the partitioning API Kudu has proposed and get that in on
the Kudu side. It does not have any tests, and has not been
tested for performance.

Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/coordinator.cc
A be/src/runtime/data-stream-partitioner.cc
A be/src/runtime/data-stream-partitioner.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/simple-scheduler.cc
M bin/impala-config.sh
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
17 files changed, 343 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6037/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6037

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
