impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables
Date Wed, 29 Mar 2017 16:10:11 GMT
Thomas Tauber-Marshall has uploaded a new patch set (#10).

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
......................................................................

IMPALA-3742: partitions DMLs for Kudu tables

Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu
are currently painful because we just send rows randomly,
which creates a lot of work for Kudu since it partitions
and sorts data before writing, causing writes to be slow.
We can alleviate this by sending the rows to Kudu already
partitioned and sorted.

This patch accomplishes this by inserting an exchange node into
the plan before the DML operation. The DataStreamSender then uses
a new abstraction, DataStreamPartitioner, that calls into the
Kudu client to determine the partition for each row. The partition
number is then appended to the row that is sent by the
DataStreamSender.

It also adds a sort node after the exchange node, which sorts on
the new tuple containing the partition number and then on the
key columns of the table.

Testing:
- Updated planner tests.
- Manually verified the partitioning works as expected.

Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/coordinator.cc
A be/src/runtime/data-stream-partitioner.cc
A be/src/runtime/data-stream-partitioner.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/scheduler.cc
M bin/impala-config.sh
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
26 files changed, 665 insertions(+), 121 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6037/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>

Mime
View raw message