Date: Wed, 19 Apr 2017 17:57:32 +0000
From: "Thomas Tauber-Marshall (Code Review)"
To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Matthew Jacobs, Dimitris Tsirogiannis, Marcel Kornacker, Mostafa Mokhtar
Reply-To: tmarshall@cloudera.com
Subject: [Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
X-Gerrit-MessageType: comment
X-Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
X-Gerrit-Commit: 5cdfce312488fa694d35704d09ef831a1c8a9511

Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
......................................................................

Patch Set 4:

As requested, I ran some tests with tpcds_1000_text.store_sales:

with patch: 3755.68s
with patch and noshuffle: 3778.62s
without patch: 3571.33s

The first two are averages over three runs; for the third, it only passed twice out of 10 runs. I also tried doing "with patch and noclustered", but it timed out every time I ran it.

So again we see here a ~10% perf cost in exchange for consistently running to completion. These results also suggest that it's the pre-sorting, not the pre-partitioning, that's making the biggest difference in whether or not Kudu can handle the load.
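For readers less familiar with the two steps being compared, here is a minimal, hypothetical sketch of what "pre-partitioning" and "pre-sorting" mean for a batch of rows. It is not Impala's or Kudu's implementation: the function name `partition_and_sort`, the CRC32-based hash, and the assumption that the first column is the primary key are all illustrative stand-ins.

```python
import zlib
from collections import defaultdict


def partition_and_sort(rows, num_partitions, key=lambda row: row[0]):
    """Group rows by a stand-in hash partition, then sort each group by key."""
    partitions = defaultdict(list)
    for row in rows:
        # Assumption: a CRC32 of the key's string form, modulo the partition
        # count, stands in for the target table's real partitioning scheme.
        pid = zlib.crc32(str(key(row)).encode("utf-8")) % num_partitions
        partitions[pid].append(row)
    # Pre-sorting: each partition's batch is ordered by key before it is sent.
    return {pid: sorted(batch, key=key) for pid, batch in partitions.items()}


if __name__ == "__main__":
    rows = [(42, "c"), (7, "a"), (19, "b"), (3, "d")]
    batches = partition_and_sort(rows, num_partitions=2)
    for pid in sorted(batches):
        print(pid, batches[pid])
```

In this sketch the partitioning step decides which batch a row belongs to, and the sorting step only reorders rows inside a batch, which is why the two can be enabled or disabled independently in the measurements above.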
We may be able to get more impact from the partitioning if we start partitioning the rows to the impalad collocated with the corresponding tserver, though that's going to take some changes to the scheduler and we're leaving it as future work.

As far as adding the ability to turn this off: the current version of the patch just disables insert hints for Kudu tables, as none of the existing hints seem to make sense (would people need to specify both 'noshuffle' and 'noclustered' to turn this off?). If anyone feels that it's important to have an off switch, I'm happy to include one (possibly with a new insert hint?), but I'm not sure how much of a benefit that would be vs. the added complexity of more knobs, given the above perf numbers and the fact that the partitioning and sorting have to happen either way; it's just a question of whether it's happening in Impala or in Kudu.

It's also already the case that the exchange and sort won't happen for single-node plans, as determined by exec_single_node_rows_threshold, so for example small VALUES inserts will be unaffected. We also have as future work to examine and improve this decision, such as by not adding the exchange if the input is already partitioned correctly.

--
To view, visit http://gerrit.cloudera.org:8080/6559

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-HasComments: No