impala-reviews mailing list archives

From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables
Date Thu, 16 Feb 2017 16:57:55 GMT
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6037

Change subject: PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables
......................................................................

PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables

Bulk inserts into Kudu are currently slow because we send
rows without regard to the target table's partitioning.
Kudu then has to partition and sort the data itself before
writing it, which creates a lot of extra work.

We can alleviate this by sending the rows to Kudu already
partitioned and sorted. This patch partitions the rows to
insert according to Kudu's partitioning scheme. A followup
patch will deal with sorting.

It accomplishes this by inserting an exchange node into the
plan before the insert and then passing down the TableId for
the target table to the DataStreamSender so that it can call
into the Kudu client to determine the partition for each row.
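
As a rough illustration of the routing idea described above
(not the patch's actual code), the sketch below groups rows by
the partition they belong to and maps each partition to an
outgoing channel. Row, LookupKuduPartition and
RouteRowsByKuduPartition are hypothetical names, and the toy
range scheme merely stands in for the Kudu client call, whose
API is still under discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a row; only the key column matters here.
struct Row {
  int64_t key;
};

// Hypothetical stand-in for the Kudu client call that returns the index of
// the partition a row belongs to. A toy range scheme is assumed:
// keys below 100 -> 0, keys below 200 -> 1, everything else -> 2.
int LookupKuduPartition(const Row& row) {
  if (row.key < 100) return 0;
  if (row.key < 200) return 1;
  return 2;
}

// Groups rows by destination channel. All rows of one partition land on the
// same channel, so each receiver only sees the partitions assigned to it.
std::vector<std::vector<Row>> RouteRowsByKuduPartition(
    const std::vector<Row>& rows, std::size_t num_channels) {
  assert(num_channels > 0);
  std::vector<std::vector<Row>> per_channel(num_channels);
  for (const Row& row : rows) {
    const int partition = LookupKuduPartition(row);
    assert(partition >= 0);
    per_channel[static_cast<std::size_t>(partition) % num_channels]
        .push_back(row);
  }
  return per_channel;
}
```

Mapping a partition to a channel by modulo is an assumption of
this sketch. Any mapping works as long as every row of a given
partition goes to the same receiver.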

This patch is a PREVIEW so we can decide if we're happy with
the partitioning API Kudu has proposed and get that in on
the Kudu side. It does not have any tests, and has not been
tested for performance.

It's been suggested that, rather than adding another
special-case partitioning type to DataStreamSender, we could
make it more general by passing in a partitioning function.
I'm currently investigating this.
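
One possible shape for that more general approach, purely as a
sketch: the sender takes a caller-supplied partitioning
function instead of switching on a partitioning type.
InputRow, PartitionFn, SketchSender and both example functions
are hypothetical and are not part of DataStreamSender.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row; only the key column matters here.
struct InputRow {
  int64_t key;
};

// A partitioning function maps a row to a non-negative partition index.
using PartitionFn = std::function<int(const InputRow&)>;

// Hypothetical sender that is agnostic to how partitions are computed.
class SketchSender {
 public:
  SketchSender(std::size_t num_channels, PartitionFn fn)
      : channels_(num_channels), partition_fn_(std::move(fn)) {
    assert(!channels_.empty());
  }

  // Appends the row to the channel chosen by the partitioning function.
  void Send(const InputRow& row) {
    const int partition = partition_fn_(row);
    assert(partition >= 0);
    channels_[static_cast<std::size_t>(partition) % channels_.size()]
        .push_back(row);
  }

  const std::vector<std::vector<InputRow>>& channels() const {
    return channels_;
  }

 private:
  std::vector<std::vector<InputRow>> channels_;
  PartitionFn partition_fn_;
};

// Example function 1: key modulo 1024 (assumed, for illustration only).
int ModuloPartition(const InputRow& row) {
  return static_cast<int>(static_cast<uint64_t>(row.key) % 1024);
}

// Example function 2: toy stand-in for a Kudu partition lookup.
int ToyKuduPartition(const InputRow& row) {
  return row.key < 100 ? 0 : 1;
}
```

With this shape, Kudu partitioning becomes one more function
handed to the sender rather than a new special case inside it.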

Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
---
M be/src/runtime/coordinator.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/simple-scheduler.cc
M bin/impala-config.sh
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
11 files changed, 149 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6037/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
