impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Date Wed, 19 Apr 2017 17:57:32 GMT
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
......................................................................


Patch Set 4:

As requested, I ran some tests with tpcds_1000_text.store_sales:

with patch: 3755.68s
with patch and noshuffle: 3778.62s
without patch: 3571.33s

The first two are averages over three runs, for the third it only passed twice out of 10 runs.
I also tried doing "with patch and noclustered" but it timed out every time I ran it.

So again we see here a ~10% perf cost in exchange for consistently running to completion.

These results also suggest that its the pre-sorting, not the pre-partitioning, that's making
the biggest difference in whether or not Kudu can handle the load. We may be able to get more
impact from the partitioning if we start partitioning the rows to the impalad collocated with
the corresponding tserver, though that's going to take some changes to the scheduler and we're
leaving it as future work.

As far as adding the ability to turn this off: the current version of the patch just disables
insert hints for Kudu tables, as none of the existing hints seem to make sense (would people
need to specify both 'noshuffle' and 'noclustered' to turn this off?).

If anyone feels that its important to have an off switch, I'm happy to include one (possibly
with a new insert hint?), but I'm not sure how much of a benefit that would be vs. the added
complexity of more knobs, given the above perf numbers and the fact that the partitioning
and sorting has to happen either way, its just a question of whether its happing in Impala
or in Kudu.

Its also already the case that the exchange and sort won't happen for single node plans as
determined by exec_single_node_rows_threshold, so for example small VALUES inserts will be
unaffected, and we have as future work to examine and improve this decision, such as by not
adding the exchange if the input is already partitioned correctly.

-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: No

Mime
View raw message