Subject: Re: Review Request 28889: HIVE-8911 - Enable mapjoin hints [Spark Branch]
From: "Chao Sun"
To: "Szehon Ho", "Xuefu Zhang"
Cc: "hive", "Chao Sun"
Date: Fri, 12 Dec 2014 19:52:40 -0000

> On Dec. 12, 2014, 7:45 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java, line 78
> >
> > nit: grandParentOps.get(0) is repeated in the next line. nice to have a var for it.

Sure. Will fix.

- Chao

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28889/#review64959
-----------------------------------------------------------

On Dec. 11, 2014, 10:36 p.m., Chao Sun wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28889/
> -----------------------------------------------------------
>
> (Updated Dec. 11, 2014, 10:36 p.m.)
>
> Review request for hive, Szehon Ho and Xuefu Zhang.
>
> Bugs: HIVE-8911
>     https://issues.apache.org/jira/browse/HIVE-8911
>
> Repository: hive-git
>
> Description
> -------
>
> Basically the idea is to reuse as much code as possible from MR.
>
> The issue is that, in MR's MapJoinProcessor, after the join op is converted to a mapjoin op, all the parent ReduceSinkOperators are removed. However, for our Spark branch, we need to preserve those, because they serve as boundaries between BaseWorks, and SparkReduceSinkMapJoinProc triggers upon them.
>
> Initially I tried to move this part of the logic to SparkMapJoinOptimizer, which happens at a later stage. However, although this works, I'm worried it may have too much effect on the smb join w/ hint, because we then have to move that part of the logic to SparkMapJoinOptimizer too. In general, I want to minimize the effect on the code path.
>
> This patch makes changes to MapJoinProcessor. I created a separate method, convertMapJoinForSpark, which doesn't remove the ReduceSinkOperators for small tables. Then, in the transform method, it decides which method to call based on the execution engine.
>
> I also have to disable several tests related to smb join w/ hints. They can be activated once HIVE-8640 is resolved.
>
> Diffs
> -----
>
>   data/conf/spark/hive-site.xml 44eac86
>   itests/src/test/resources/testconfiguration.properties 2348e06
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 773c827
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a8a3d86
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java PRE-CREATION
>   ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out f24ae73
>   ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 33e9e8b
>   ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out aaa0151
>   ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 9954b77
>   ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out ad8f0a5
>   ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out aa3e2b6
>   ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 44233f6
>   ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out c4702ef
>   ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 7c31e05
>   ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out a8e892e
>   ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 041ba12
>   ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 54c4be3
>   ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out da9fe1c
>   ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 5a5e3f6
>   ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 5ac3f4c
>   ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out e4ff965
>   ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out fce5566
>   ql/src/test/results/clientpositive/spark/join25.q.out 284c97d
>   ql/src/test/results/clientpositive/spark/join26.q.out e271184
>   ql/src/test/results/clientpositive/spark/join27.q.out d31f29e
>   ql/src/test/results/clientpositive/spark/join30.q.out 7fbbcfa
>   ql/src/test/results/clientpositive/spark/join36.q.out f1317ea
>   ql/src/test/results/clientpositive/spark/join37.q.out 448e983
>   ql/src/test/results/clientpositive/spark/join38.q.out 735d7ea
>   ql/src/test/results/clientpositive/spark/join39.q.out 0734d4b
>   ql/src/test/results/clientpositive/spark/join40.q.out 60ef13d
>   ql/src/test/results/clientpositive/spark/join_map_ppr.q.out 59fdb99
>   ql/src/test/results/clientpositive/spark/mapjoin1.q.out 80e38b9
>   ql/src/test/results/clientpositive/spark/mapjoin_distinct.q.out dc7241c
>   ql/src/test/results/clientpositive/spark/mapjoin_filter_on_outerjoin.q.out 3b80437
>   ql/src/test/results/clientpositive/spark/mapjoin_test_outer.q.out fdf8f24
>   ql/src/test/results/clientpositive/spark/semijoin.q.out 2b8e04b
>   ql/src/test/results/clientpositive/spark/skewjoin.q.out 56b78be
>
> Diff: https://reviews.apache.org/r/28889/diff/
>
> Testing
> -------
>
> bucket_map_join_1.q
> bucket_map_join_2.q
> bucketmapjoin1.q
> bucketmapjoin10.q
> bucketmapjoin11.q
> bucketmapjoin12.q
> bucketmapjoin13.q
> bucketmapjoin2.q
> bucketmapjoin3.q
> bucketmapjoin4.q
> bucketmapjoin5.q
> bucketmapjoin7.q
> bucketmapjoin8.q
> bucketmapjoin9.q
> bucketmapjoin_negative.q
> bucketmapjoin_negative2.q
> column_access_stats.q
> join25.q
> join26.q
> join27.q
> join30.q
> join36.q
> join37.q
> join38.q
> join39.q
> join40.q
> join_empty.q
> join_filters_overlap.q
> join_map_ppr.q
> mapjoin1.q
> mapjoin_distinct.q
> mapjoin_filter_on_outerjoin.q
> mapjoin_hook.q
> mapjoin_test_outer.q
> semijoin.q
> skewjoin.q
> table_access_keys_stats.q
>
> Thanks,
>
> Chao Sun
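
For readers following the Description above, here is a rough sketch of the dispatch it describes: the MR path drops every parent ReduceSink when a join becomes a mapjoin, while the Spark path keeps the ReduceSinks of the small tables. This is a toy model, not the patch itself. The Op class, the "RS" name prefix, the bigTablePos parameter, the "spark"/"mr" engine strings and all method signatures are assumptions made for illustration and do not reflect Hive's real Operator API. Removing the big table's ReduceSink on the Spark path is also only one reading of the Description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapJoinHintSketch {
    // Hypothetical stand-in for a plan operator; NOT Hive's Operator class.
    static final class Op {
        final String name;
        final List<Op> parents = new ArrayList<>();

        Op(String name, Op... parents) {
            this.name = name;
            this.parents.addAll(Arrays.asList(parents));
        }
    }

    // Assumption for this sketch: ReduceSink operators are named "RS...".
    static boolean isReduceSink(Op op) {
        return op.name.startsWith("RS");
    }

    // MR-style conversion: every parent ReduceSink is spliced out of the plan.
    static Op convertMapJoin(Op join) {
        Op mapJoin = new Op("MAPJOIN");
        for (Op parent : join.parents) {
            if (isReduceSink(parent)) {
                mapJoin.parents.addAll(parent.parents);
            } else {
                mapJoin.parents.add(parent);
            }
        }
        return mapJoin;
    }

    // Spark-style conversion: ReduceSinks of the small tables are preserved;
    // only the big-table branch loses its ReduceSink (an assumption here).
    static Op convertMapJoinForSpark(Op join, int bigTablePos) {
        Op mapJoin = new Op("MAPJOIN");
        for (int pos = 0; pos < join.parents.size(); pos++) {
            Op parent = join.parents.get(pos);
            if (pos == bigTablePos && isReduceSink(parent)) {
                mapJoin.parents.addAll(parent.parents);
            } else {
                mapJoin.parents.add(parent);
            }
        }
        return mapJoin;
    }

    // Picks the conversion based on the execution engine name.
    static Op transform(Op join, int bigTablePos, String engine) {
        return "spark".equals(engine)
                ? convertMapJoinForSpark(join, bigTablePos)
                : convertMapJoin(join);
    }

    static List<String> parentNames(Op op) {
        List<String> names = new ArrayList<>();
        for (Op parent : op.parents) {
            names.add(parent.name);
        }
        return names;
    }

    // Two table scans, each feeding a ReduceSink, joined by one JOIN operator.
    static Op samplePlan() {
        Op rsBig = new Op("RS_big", new Op("TS_big"));
        Op rsSmall = new Op("RS_small", new Op("TS_small"));
        return new Op("JOIN", rsBig, rsSmall);
    }

    public static void main(String[] args) {
        System.out.println("mr:    " + parentNames(transform(samplePlan(), 0, "mr")));
        System.out.println("spark: " + parentNames(transform(samplePlan(), 0, "spark")));
    }
}
```

With the big table at position 0, the MR path leaves the mapjoin reading from both table scans directly, while the Spark path still has RS_small as a parent, which is the boundary the Description says SparkReduceSinkMapJoinProc relies on.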