Message-Id: <201706201902.v5KJ29Kk027860@ip-10-146-233-104.ec2.internal>
Date: Tue, 20 Jun 2017 19:02:09 +0000
From: "anujphadke (Code Review)"
To: anujphadke, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Matthew Jacobs, Dan Hecht, Tim Armstrong
Reply-To: aphadke@cloudera.com
Subject: [Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly
X-Gerrit-MessageType: comment
X-Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
X-Gerrit-Commit: c7f1b84f2ec82b72227b61feab88338fd069013a

anujphadke has posted comments on this change.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6778/2/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 582:     // Try to continue from the current probe side input.
> Maybe we should set the counter at the bottom of GetNext(), instead of in t
Done

Line 642:       // Truncate the row batch if we went over the limit.
> There's a bug if the hash join node is in a subplan - SubplanNode may call
Done

Line 643:       num_rows_added = limit_ - num_rows_returned_;
> Shouldn't we decrement 'num_rows_returned_' if we truncated the batch? Othe
Done

Line 806:     if (matched_null_probe_[null_probe_output_idx_]) continue;
> Nit: we usually write this as:
With the current change, I don't think num_rows_returned should be updated anywhere except at the end of GetNext, before setting the counter.
line#651

If it gets incremented and exceeds the limit, num_rows_returned could end up being a negative value.
line#644
  num_rows_added = limit_ - num_rows_returned_;
  DCHECK_GE(num_rows_added, 0);

Line 944:     TupleRow* out_row = out_batch->GetRow(out_batch->AddRow());
> Nit: we usually write this as:
With the current change, I don't think num_rows_returned should be updated anywhere except at the end of GetNext, before setting the counter.
line#651

If it gets incremented and exceeds the limit, num_rows_returned could end up being a negative value.
line#644
  num_rows_added = limit_ - num_rows_returned_;
  DCHECK_GE(num_rows_added, 0);

http://gerrit.cloudera.org:8080/#/c/6778/2/testdata/workloads/functional-query/queries/QueryTest/single-node-joins-with-limits.test
File testdata/workloads/functional-query/queries/QueryTest/single-node-joins-with-limits.test:

> How long do these take to run? Should they be under exhaustive?
Moved them under exhaustive tests.

--
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
Gerrit-HasComments: Yes
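
For readers of the archive, the limit bookkeeping discussed in the replies for line#644 and line#651 can be sketched roughly as follows. This is only an illustration, not the code from the patch: `LimitedNode`, the `std::vector<int>` batch, and `reached_limit_` are hypothetical stand-ins, and plain `assert` is used in place of Impala's `DCHECK_GE`. The point it shows is that when the running total is updated in exactly one place, at the end of GetNext(), the difference `limit_ - num_rows_returned_` cannot go negative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an exec node with a row limit (not Impala's class).
struct LimitedNode {
  int64_t limit_;                  // -1 means "no limit" in this sketch
  int64_t num_rows_returned_ = 0;  // updated only at the end of GetNext()
  bool reached_limit_ = false;

  explicit LimitedNode(int64_t limit) : limit_(limit) {}

  // 'out_batch' holds the rows produced for this call. The batch is truncated
  // in place so the running total never exceeds limit_. Returns rows kept.
  int64_t GetNext(std::vector<int>* out_batch) {
    int64_t num_rows_added = static_cast<int64_t>(out_batch->size());
    if (limit_ != -1 && num_rows_returned_ + num_rows_added >= limit_) {
      // Truncate the row batch if we went over the limit.
      num_rows_added = limit_ - num_rows_returned_;
      // Holds because num_rows_returned_ is never pushed past limit_ elsewhere.
      assert(num_rows_added >= 0);
      out_batch->resize(static_cast<std::size_t>(num_rows_added));
      reached_limit_ = true;
    }
    // Single update point for the running total.
    num_rows_returned_ += num_rows_added;
    return num_rows_added;
  }
};
```

With a limit of 5, a first batch of 3 rows passes through untouched, a second batch of 4 rows is cut to 2, and any later batch is cut to 0 rows. If some other code path also incremented `num_rows_returned_` before this point, the assertion is where the over-count would surface.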