Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Content-Type: multipart/alternative; boundary="===============6044340823021836831=="
MIME-Version: 1.0
Subject: Re: Review Request 63427: HIVE-17396
From: Jason Dere
To: Ashutosh Chauhan, Jason Dere
Cc: Deepak Jaiswal, hive
Date: Mon, 30 Oct 2017 23:19:04 -0000
Message-ID: <20171030231904.50427.72022@reviews-vm2.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/63427/
References: <20171030190956.50427.24788@reviews-vm2.apache.org>
In-Reply-To: <20171030190956.50427.24788@reviews-vm2.apache.org>
X-ReviewRequest-Repository: hive-git

--===============6044340823021836831==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/#review189657
-----------------------------------------------------------


ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 1417 (original), 1417 (patched)

Trying to think about how to think about this setting if we're going to use this for tuning.
I think a better way to think about this setting is: what kind of selectivity do we want from the semijoin reduction before we decide it is worth keeping? For me such a setting might be a bit more intuitive (basically a float value between 0 and 1) - for example, setting the config to 0.5, compared to what you have now, where I think you would set it to 2.0.


ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1419 (patched)

Some concerns about the long-to-float conversion going on here: nDVs is cast to float and then multiplied by 1.0, and nDVsOfTS is also converted to float during this comparison. This could affect the comparison results. Maybe cast nDVsFactored to long when doing the comparison, to be safe?


ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1421 (patched)

If you are logging here, mention that setShouldRemove is being set. Would be more useful for someone looking at the logs.


- Jason Dere


On Oct. 30, 2017, 7:09 p.m., Deepak Jaiswal wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63427/
> -----------------------------------------------------------
>
> (Updated Oct. 30, 2017, 7:09 p.m.)
>
>
> Review request for hive, Ashutosh Chauhan and Jason Dere.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin branches
>
> In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not have equality as there is a chance that the values are same on both sides and the branch is still marked as good when it shouldn't be.
> Add a configurable factor to see how useful this is if nDVs on smaller side are only slightly less than that on TS side.
>
>
> Diffs
> -----
>
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 1a1a4d9b2d
>
>
> Diff: https://reviews.apache.org/r/63427/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Deepak Jaiswal
>
>

--===============6044340823021836831==--
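
The long-to-float concern raised for line 1419, and the selectivity-style setting suggested for line 1417, can be illustrated with a small standalone Java sketch. This is not the patch's code: the class and method names (NdvComparisonSketch, keepWithFloatCompare, keepWithLongCast, keepBySelectivity) are hypothetical, and the direction of the check (keep the semijoin branch only when the factored nDVs is strictly less than nDVsOfTS) is an assumption drawn from the review description.

```java
// Hypothetical illustration only; none of these names come from TezCompiler.
final class NdvComparisonSketch {

    // Assumed shape of the check under review: the factored value is a float,
    // so nDVsOfTS is implicitly converted to float for the comparison.
    static boolean keepWithFloatCompare(long nDVs, long nDVsOfTS, float factor) {
        float nDVsFactored = (float) nDVs * factor;
        return nDVsFactored < nDVsOfTS;
    }

    // The reviewer's suggestion: cast the factored value back to long so the
    // comparison itself is done on long values.
    static boolean keepWithLongCast(long nDVs, long nDVsOfTS, float factor) {
        float nDVsFactored = (float) nDVs * factor;
        return (long) nDVsFactored < nDVsOfTS;
    }

    // Selectivity-style setting (a value between 0 and 1): keep the branch only
    // if the ratio nDVs / nDVsOfTS is below the configured maximum.
    static boolean keepBySelectivity(long nDVs, long nDVsOfTS, double maxSelectivity) {
        return (double) nDVs / nDVsOfTS < maxSelectivity;
    }

    public static void main(String[] args) {
        long small = 16_777_216L; // 2^24, exactly representable as a float
        long ts = 16_777_217L;    // 2^24 + 1, rounds to 2^24 when converted to float

        // The one-value gap disappears once both sides are floats.
        System.out.println(keepWithFloatCompare(small, ts, 1.0f)); // false
        // Comparing as long values preserves the gap.
        System.out.println(keepWithLongCast(small, ts, 1.0f));     // true

        // A factor of 2.0 and a maximum selectivity of 0.5 agree on this input.
        System.out.println(keepWithLongCast(40L, 100L, 2.0f));     // true
        System.out.println(keepBySelectivity(40L, 100L, 0.5));     // true
    }
}
```

Note that keepWithLongCast still passes nDVs through a float, so values above 2^24 can lose precision before the cast back to long; doing the multiplication in double is one possible way to reduce that risk.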